Vol 1 - Drug Discovery

MEDICINAL CHEMISTRY
AND
DRUG DISCOVERY
Sixth Edition
Volume 1: Drug Discovery
Edited by
Donald J. Abraham
Department of Medicinal Chemistry
School of Pharmacy
Virginia Commonwealth University
Burger's Medicinal Chemistry and Drug Discovery

is available Online in full color at
www.mrw.interscience.wiley.com/bmcdd.
A John Wiley and Sons, Inc., Publication

BURGER MEMORIAL EDITION
The Sixth Edition of Burger's Medicinal Chemistry and Drug Discovery is being designated as a Memorial Edition. Professor Alfred Burger was born in Vienna, Austria on September 6, 1905 and died on December 30, 2000. Dr. Burger received his Ph.D. from the University of Vienna in 1928 and joined the Drug Addiction Laboratory in the Department of Chemistry at the University of Virginia in 1929. During his early years at UVA, he synthesized fragments of the morphine molecule in an attempt to find the analgesic pharmacophore. He joined the UVA chemistry faculty in 1938 and served the department until his retirement in 1970. The chemistry department at UVA became the major academic training ground for medicinal chemists because of Professor Burger.

Dr. Burger's research focused on analgesics, antidepressants, and chemotherapeutic agents. He is one of the few academicians to have a drug, designed and synthesized in his laboratories, brought to market [Parnate, which is the brand name for tranylcypromine, a monoamine oxidase (MAO) inhibitor]. Dr. Burger was a visiting Professor at the University of Hawaii and lectured throughout the world. He founded the Journal of Medicinal Chemistry, Medicinal Chemistry Research, and published the first major reference work "Medicinal Chemistry" in two volumes in 1951. His last published work, a book, was written at age 90 (Understanding Medications: What the Label Doesn't Tell You, June 1995). Dr. Burger received the Louis Pasteur Medal of the Pasteur Institute and the American Chemical Society Smissman Award. Dr. Burger played the violin and loved classical music. He was married for 65 years to Frances Page Burger, a genteel Virginia lady who always had a smile and an open house for the Professor's graduate students and postdoctoral fellows.
PREFACE
The Editors, Editorial Board Members, and John Wiley and Sons have worked for three and a half years to update the fifth edition of Burger's Medicinal Chemistry and Drug Discovery. The sixth edition has several new and unique features. For the first time, there will be an online version of this major reference work. The online version will permit updating and easy access. For the first time, all volumes are structured entirely according to content and published simultaneously. Our intention was to provide a spectrum of fields that would provide new or experienced medicinal chemists, biologists, pharmacologists and molecular biologists entry to their subjects of interest as well as provide a current and global perspective of drug design and drug development.

Our hope was to make this edition of Burger the most comprehensive and useful published to date. To accomplish this goal, we expanded the content from 69 chapters (5 volumes) by approximately 50% (to over 100 chapters in 6 volumes). We are greatly in debt to the authors and editorial board members participating in this revision of the major reference work in our field. Several new subject areas have emerged since the fifth edition appeared. Proteomics, genomics, bioinformatics, combinatorial chemistry, high-throughput screening, blood substitutes, allosteric effectors as potential drugs, COX inhibitors, the statins, and high-throughput pharmacology are only a few. In addition to the new areas, we have filled in gaps in the fifth edition by including topics that were not covered. In the sixth edition, we devote an entire subsection of Volume 4 to cancer research; we have also reviewed the major published Medicinal Chemistry and Pharmacology texts to ensure that we did not omit any major therapeutic classes of drugs. An editorial board was constituted for the first time to also review and suggest topics for inclusion. Their help was greatly appreciated. The newest innovation in this series will be the publication of an academic, "textbook-like" version titled "Burger's Fundamentals of Medicinal Chemistry." The academic text is to be published about a year after this reference work appears. It will also appear with soft cover. Appropriate and key information will be extracted from the major reference.

There are numerous colleagues, friends, and associates to thank for their assistance. First and foremost is Assistant Editor Dr. John Andrako, Professor emeritus, Virginia Commonwealth University, School of Pharmacy. John and I met almost every Tuesday for over three years to map out and execute the game plan for the sixth edition. His contribution to the sixth edition cannot be overstated. Ms. Susanne Steitz, Editorial Program Coordinator at Wiley, tirelessly and meticulously kept us on schedule. Her contribution was also key in helping encourage authors to return manuscripts and revisions so we could publish the entire set at once. I would also like to especially thank colleagues who attended the QSAR Gordon Conference in 1999 for very helpful suggestions, especially Roy Vaz, John Mason, Yvonne Martin, John Block, and Hugo Kubinyi. The editors are greatly indebted to Professor Peter Ruenitz for preparing a template chapter as a guide for all authors. My secretary, Michelle Craighead, deserves special thanks for helping contact authors and reading the several thousand e-mails generated during the project. I also thank the computer center at Virginia Commonwealth University for suspending rules on storage and e-mail so that we might safely store all the versions of the authors' manuscripts where they could be backed up daily. Last and not least, I want to thank each and every author, some of whom tackled two chapters. Their contributions have provided our field with a sound foundation of information to build for the future. We thank the many reviewers of manuscripts whose critiques have greatly enhanced the presentation and content for the sixth edition. Special thanks to Professors Richard Glennon, William Soine, Richard Westkaemper, Umesh Desai, Glen Kellogg, Brad Windle, Lemont Kier, Malgorzata Dukat, Martin Safo, Jason Rife, Kevin Reynolds, and John Andrako in our Department of Medicinal Chemistry, School of Pharmacy, Virginia Commonwealth University for suggestions and special assistance in reviewing manuscripts and text. Graduate student Derek Cashman took able charge of our web site, http://www.burgersmedchem.com, another first for this reference work. I would especially like to thank my dean, Victor Yanchick, and Virginia Commonwealth University for their support and encouragement. Finally, I thank my wife Nancy who understood the magnitude of this project and provided insight on how to set up our home office as well as provide John Andrako and me lunchtime menus where we often dreamed of getting chapters completed in all areas we selected. To everyone involved, many, many thanks.

DONALD J. ABRAHAM
Midlothian, Virginia
Dr. Alfred Burger
Photograph of Professor Burger followed by his comments to the American Chemical Society 26th Medicinal
Chemistry Symposium on June 14, 1998. This was his last public appearance at a meeting of medicinal
chemists. As general chair of the 1998 ACS Medicinal Chemistry Symposium, the editor invited Professor
Burger to open the meeting. He was concerned that the young chemists would not know who he was and he
might have an attack due to his battle with Parkinson's disease. These fears never were realized and his
comments to the more than five hundred attendees drew a sustained standing ovation. The Professor was 93,
and it was Mrs. Burger's 91st birthday.
Opening Remarks
ACS 26th Medicinal Chemistry Symposium

June 14, 1998
Alfred Burger
University of Virginia
It has been 46 years since the third Medicinal Chemistry Symposium met at the University of
Virginia in Charlottesville in 1952. Today, the Virginia Commonwealth University welcomes
you and joins all of you in looking forward to an exciting program.
So many aspects of medicinal chemistry have changed in that half century that most of the
new data to be presented this week would have been unexpected and unbelievable had they
been mentioned in 1952. The upsurge in biochemical understandings of drug transport and
drug action has made rational drug design a reality in many therapeutic areas and has made
medicinal chemistry an independent science. We have our own journal, the best in the world,
whose articles comprise all the innovations of medicinal researches. And if you look at the
announcements of job opportunities in the pharmaceutical industry as they appear in
Chemical & Engineering News, you will find in every issue more openings in medicinal
chemistry than in other fields of chemistry. Thus, we can feel the excitement of being part of
this medicinal tidal wave, which has also been fed by the expansion of the needed research
training provided by increasing numbers of universities.
The ultimate beneficiary of scientific advances in discovering new and better therapeutic
agents and understanding their modes of action is the patient. Physicians now can safely look
forward to new methods of treatment of hitherto untreatable conditions. To the medicinal
scientist all this has increased the pride of belonging to a profession which can offer predictable
intellectual rewards. Our symposium will be an integral part of these developments.
CONTENTS

1 HISTORY OF QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIPS
C. D. Selassie
Chemistry Department
Pomona College
Claremont, California

2 RECENT TRENDS IN QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIPS
A. Tropsha
University of North Carolina
Laboratory for Molecular Modeling
School of Pharmacy
Chapel Hill, North Carolina

3 MOLECULAR MODELING IN DRUG DESIGN
Garland R. Marshall
Washington University
Center for Computational Biology
St. Louis, Missouri
Denise D. Beusen
Tripos, Inc.
St. Louis, Missouri

4 DRUG-TARGET BINDING FORCES: ADVANCES IN FORCE FIELD APPROACHES
Peter A. Kollman
University of California
School of Pharmacy
Department of Pharmaceutical Chemistry
San Francisco, California
David A. Case
The Scripps Research Institute
Department of Molecular Biology
La Jolla, California

5 COMBINATORIAL LIBRARY DESIGN, MOLECULAR SIMILARITY, AND DIVERSITY APPLICATIONS
Jonathan S. Mason
Pfizer Global Research & Development
Sandwich, United Kingdom
Stephen D. Pickett
GlaxoSmithKline Research
Stevenage, United Kingdom

6 VIRTUAL SCREENING
Ingo Muegge
Istvan Enyedy
Bayer Research Center
West Haven, Connecticut

7 DOCKING AND SCORING FUNCTIONS/VIRTUAL SCREENING
Christoph Sotriffer
Gerhard Klebe
University of Marburg
Department of Pharmaceutical Chemistry
Marburg, Germany
Martin Stahl
Hans-Joachim Bohm
Discovery Technologies
F. Hoffmann-La Roche AG
Basel, Switzerland

8 BIOINFORMATICS: ITS ROLE IN DRUG DISCOVERY
David J. Parry-Smith
ChiBio Informatics
Cambridge, United Kingdom

9 CHEMICAL INFORMATION COMPUTING SYSTEMS IN DRUG DISCOVERY
Douglas R. Henry
MDL Information Systems, Inc.
San Leandro, California

10 STRUCTURE-BASED DRUG DESIGN
Larry W. Hardy
Aurigene Discovery Technologies
Lexington, Massachusetts
Martin K. Safo
Virginia Commonwealth University
Richmond, Virginia
Donald J. Abraham
Virginia Commonwealth University
Richmond, Virginia

11 X-RAY CRYSTALLOGRAPHY IN DRUG DISCOVERY
Douglas A. Livingston
Sean G. Buchanan
Kevin L. D'Amico
Michael V. Milburn
Thomas S. Peat
J. Michael Sauder
Structural GenomiX
San Diego, California

12 NMR AND DRUG DISCOVERY
David J. Craik
Richard J. Clark
Institute for Molecular Bioscience
Australian Research Council Special Research Centre for Functional and Applied Genomics
University of Queensland
Brisbane, Australia

13 MASS SPECTROMETRY AND DRUG DISCOVERY
Richard B. van Breemen
Department of Medicinal Chemistry and Pharmacognosy
University of Illinois at Chicago
Chicago, Illinois

14 ELECTRON CRYOMICROSCOPY OF BIOLOGICAL MACROMOLECULES
Richard Henderson
Medical Research Council
Laboratory of Molecular Biology
Cambridge, United Kingdom
Timothy S. Baker
Purdue University
Department of Biological Sciences
West Lafayette, Indiana

15 PEPTIDOMIMETICS FOR DRUG DESIGN
M. Angels Estiarte
Daniel H. Rich
School of Pharmacy-Department of Chemistry
University of Wisconsin-Madison
Madison, Wisconsin

16 ANALOG DESIGN
Joseph G. Cannon
The University of Iowa
Iowa City, Iowa

17 APPROACHES TO THE RATIONAL DESIGN OF ENZYME INHIBITORS
Michael J. McLeish
George L. Kenyon
University of Michigan
Ann Arbor, Michigan

18 CHIRALITY AND BIOLOGICAL ACTIVITY
Alistair G. Draffan
Graham R. Evans
James A. Henshilwood
Celltech R&D Ltd.
Granta Park, Great Abington

19 STRUCTURAL CONCEPTS IN THE PREDICTION OF THE TOXICITY OF THERAPEUTICAL AGENTS
Herbert S. Rosenkranz
Department of Biomedical Sciences
Florida Atlantic University
Boca Raton, Florida

20 NATURAL PRODUCTS AS LEADS FOR NEW PHARMACEUTICALS
A. D. Buss
MerLion Pharmaceuticals
Singapore Science Park, Singapore
B. Cox
Medicinal Chemistry
Respiratory Diseases Therapeutic Area
Novartis Pharma Research Centre
Horsham, United Kingdom
R. D. Waigh
Department of Pharmaceutical Sciences
University of Strathclyde
Glasgow, Scotland

INDEX

BURGER'S
MEDICINAL CHEMISTRY
AND
DRUG DISCOVERY
CHAPTER ONE
History of Quantitative
Structure-Activity Relationships
C. D. SELASSIE
Chemistry Department
Pomona College
Claremont, California
Contents
1 Introduction
1.1 Historical Development of QSAR
1.2 Development of Receptor Theory
2 Tools and Techniques of QSAR
2.1 Biological Parameters
2.2 Statistical Methods: Linear Regression Analysis
2.3 Compound Selection
3 Parameters Used in QSAR
3.1 Electronic Parameters
3.2 Hydrophobicity Parameters
3.2.1 Determination of Hydrophobicity by Chromatography
3.2.2 Calculation Methods
3.3 Steric Parameters
3.4 Other Variables and Variable Selection
3.5 Molecular Structure Descriptors
4 Quantitative Models
4.1 Linear Models
4.1.1 Penetration of ROH into Phosphatidylcholine Monolayers (184)
4.1.2 Changes in EPR Signal of Labeled Ghost Membranes by ROH (185)
4.1.3 Induction of Narcosis in Rabbits by ROH (184)
4.1.4 Inhibition of Bacterial Luminescence by ROH (185)
4.1.5 Inhibition of Growth of Tetrahymena pyriformis by ROH (76, 186)
4.2 Nonlinear Models
4.2.1 Narcotic Action of ROH on Tadpoles
4.2.2 Induction of Ataxia in Rats by ROH
4.3 Free-Wilson Approach
4.4 Other QSAR Approaches
5 Applications of QSAR
5.1 Isolated Receptor Interactions
5.1.1 Inhibition of Crude Pigeon Liver DHFR by Triazines (202)
5.1.2 Inhibition of Chicken Liver DHFR by 3-X-Triazines (207)
5.1.3 Inhibition of Human DHFR by 3-X-Triazines (208)
5.1.4 Inhibition of L1210 DHFR by 3-X-Triazines (209)
5.1.5 Inhibition of P. carinii DHFR by 3-X-Triazines (210)
5.1.6 Inhibition of L. major DHFR by 3-X-Triazines (211)
5.1.7 Inhibition of T. gondii DHFR by 3-X-Triazines
5.1.8 Inhibition of Rat Liver DHFR by 2,4-Diamino, 5-Y, 6-Z-quinazolines (213)
5.1.9 Inhibition of Human Liver DHFR by 2,4-Diamino, 5-Y, 6-Z-quinazolines (214)
5.1.10 Inhibition of Murine L1210 DHFR by 2,4-Diamino, 5-Y, 6-Z-quinazolines (214)
5.1.11 Inhibition of Bovine Liver DHFR by 2,4-Diamino, 5-Y, 6-Z-quinazolines (215)
5.1.12 Binding of X-Phenyl, N-Benzoyl-L-alaninates to α-Chymotrypsin in Phosphate Buffer, pH 7.4 (203)
5.1.13 Binding of X-Phenyl, N-Benzoyl-L-alaninates to α-Chymotrypsin in Pentanol (203)
5.1.14 Binding of X-Phenyl, N-Benzoyl-L-alaninates in Aqueous Phosphate Buffer (218)
5.1.15 Binding of X-Phenyl, N-Benzoyl-L-alaninates in Pentanol (218)
5.1.16 Inhibition of 5-α-Reductase by 4-X, N-Y-6-azaandrost-17-CO-Z-4-ene-3-ones, I
5.1.17 Inhibition of 5-α-Reductase by 17β-(N-(X-phenyl)carbamoyl)-6-azaandrost-4-ene-3-ones, II
5.1.18 Inhibition of 5-α-Reductase by 17β-(N-(1-X-phenyl-cycloalkyl)carbamoyl)-6-azaandrost-4-ene-3-ones, III
5.2 Interactions at the Cellular Level
5.2.1 Inhibition of Growth of L1210/S by 3-X-Triazines (209)
5.2.2 Inhibition of Growth of L1210/R by 3-X-Triazines (209)
5.2.3 Inhibition of Growth of Tetrahymena pyriformis (40 h)
5.2.4 Inhibition of Growth of T. pyriformis by Phenols (using σ) (227)
5.2.5 Inhibition of Growth of T. pyriformis by Electron-Releasing Phenols (227)
5.2.6 Inhibition of Growth of T. pyriformis by Electron-Attracting Phenols (227)
5.2.7 Inhibition of Growth of T. pyriformis by Aromatic Compounds (229)
5.3 Interactions In Vivo
5.3.1 Renal Clearance of β-Adrenoreceptor Antagonists
5.3.2 Nonrenal Clearance of β-Adrenoreceptor Antagonists
6 Comparative QSAR
6.1 Database Development
6.2 Database: Mining for Models
6.2.1 Incidence of Tail Defects of Embryos (235)
6.2.2 Inhibition of DNA Synthesis in CHO Cells by X-Phenols (236)
6.2.3 Inhibition of Growth of L1210 by X-Phenols
6.2.4 Inhibition of Growth of L1210 by Electron-Withdrawing Substituents (σ+ > 0)
6.2.5 Inhibition of Growth of L1210 by Electron-Donating Substituents (σ+ < 0)
6.3 Progress in QSAR
7 Summary

Burger's Medicinal Chemistry and Drug Discovery
Sixth Edition, Volume 1: Drug Discovery
Edited by Donald J. Abraham
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.

1 INTRODUCTION

It has been nearly 40 years since the quantitative structure-activity relationship (QSAR) paradigm first found its way into the practice of agrochemistry, pharmaceutical chemistry, toxicology, and eventually most facets of chemistry (1). Its staying power may be attributed to the strength of its initial postulate that activity was a function of structure as described by electronic attributes, hydrophobicity, and steric properties as well as the rapid and extensive development in methodologies and computational techniques that have ensued to delineate and refine the many variables and approaches that define the paradigm. The overall goals of QSAR retain their original essence and remain focused on the predictive ability of the approach and its receptiveness to mechanistic interpretation. Rigorous analysis and fine-tuning of independent variables has led to an expansion in development of molecular and atom-based descriptors, as well as descriptors derived from quantum chemical calculations and spectroscopy (2). The improvement in high-throughput screening procedures allows for rapid screening of large numbers of compounds under similar test conditions and thus minimizes the risk of combining variable test data from many sources.

The formulation of thousands of equations using QSAR methodology attests to a validation of its concepts and its utility in the elucidation of the mechanism of action of drugs at the molecular level and a more complete understanding of physicochemical phenomena such as hydrophobicity. It is now possible not only to develop a model for a system but also to compare models from a biological database and to draw analogies with models from a physical organic database (3). This process is dubbed model mining and it provides a sophisticated approach to the study of chemical-biological interactions. QSAR has clearly matured, although it still has a way to go. The previous review by Kubinyi has relevant sections covering portions of this chapter as well as an extensive bibliography recommended for a more complete overview (4).

1.1 Historical Development of QSAR

More than a century ago, Crum-Brown and Fraser expressed the idea that the physiological action of a substance was a function of its chemical composition and constitution (5). A few decades later, in 1893, Richet showed that the cytotoxicities of a diverse set of simple organic molecules were inversely related to their corresponding water solubilities (6). At the turn of the 20th century, Meyer and Overton independently suggested that the narcotic (depressant) action of a group of organic compounds paralleled their olive oil/water partition coefficients (7, 8). In 1939 Ferguson introduced a thermodynamic generalization to the correlation of depressant action with the relative saturation of volatile compounds in the vehicle in which they were administered (9). The extensive work of Albert, and Bell and Roblin established the importance of ionization of bases and weak acids in bacteriostatic activity (10-12). Meanwhile on the physical organic front, great strides were being made in the delineation of substituent effects on organic reactions, led by the seminal work of Hammett, which gave rise to the "sigma-rho" culture (13, 14). Taft devised a way for separating polar, steric, and resonance effects and introducing the first steric parameter, $E_s$ (15). The contributions of Hammett and Taft together laid the mechanistic basis for the development of the QSAR paradigm by Hansch and Fujita. In 1962 Hansch and Muir published their brilliant study on the structure-activity relationships of plant growth regulators and their dependency on Hammett constants and hydrophobicity (16). Using the octanol/water system, a whole series of partition coefficients were measured, and thus a new hydrophobic scale was introduced (17). The parameter $\pi$, which is the relative hydrophobicity of a substituent, was defined in a manner analogous to the definition of sigma (18).

$$\pi_X = \log P_X - \log P_H \qquad (1.1)$$

$P_X$ and $P_H$ represent the partition coefficients of a derivative and the parent molecule, respectively. Fujita and Hansch then combined these hydrophobic constants with Hammett's electronic constants to yield the linear Hansch equation and its many extended forms (19).

$$\log \frac{1}{C} = a\sigma + b\pi + k \qquad (1.2)$$

Hundreds of equations later, the failure of linear equations in cases with extended hydrophobicity ranges led to the development of the Hansch parabolic equation (20):

$$\log \frac{1}{C} = a \log P - b(\log P)^2 + c\sigma + k \qquad (1.3)$$

The delineation of these models led to explosive development in QSAR analysis and related approaches. The Kubinyi bilinear model is a refinement of the parabolic model and, in many cases, it has proved to be superior (21).
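
The substituent constant and the parabolic Hansch relationship discussed above can be illustrated with a minimal Python sketch. The function names, the partition coefficients, and the coefficients a, b, c, and k used here are hypothetical and chosen only for demonstration; they are not taken from any data set in this chapter. The optimum log P follows from setting the derivative of the parabola with respect to log P to zero.

```python
import math


def substituent_pi(p_derivative: float, p_parent: float) -> float:
    """Relative hydrophobicity of a substituent: pi_X = log P_X - log P_H."""
    return math.log10(p_derivative) - math.log10(p_parent)


def hansch_parabolic(log_p: float, sigma: float,
                     a: float, b: float, c: float, k: float) -> float:
    """Parabolic Hansch model: log 1/C = a*log P - b*(log P)^2 + c*sigma + k."""
    return a * log_p - b * log_p ** 2 + c * sigma + k


def optimum_log_p(a: float, b: float) -> float:
    """log P at the maximum of the parabola (derivative set to zero)."""
    return a / (2.0 * b)


# Hypothetical inputs, for illustration only.
pi_x = substituent_pi(p_derivative=400.0, p_parent=100.0)
log_p_opt = optimum_log_p(a=1.0, b=0.25)
activity_at_opt = hansch_parabolic(log_p_opt, sigma=0.0,
                                   a=1.0, b=0.25, c=0.5, k=1.0)

print(f"pi_X = {pi_x:.3f}")
print(f"optimum log P = {log_p_opt:.2f}")
print(f"log 1/C at optimum = {activity_at_opt:.2f}")
```

In practice the coefficients would come from regression on measured activities; the sketch only shows how the terms combine and why activity falls off on either side of the optimum log P.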
The bilinear model is expressed as:

$$\log \frac{1}{C} = a \log P - b \log(\beta P + 1) + k \qquad (1.4)$$

Besides the Hansch approach, other methodologies were also developed to tackle structure-activity questions. The Free-Wilson approach addresses structure-activity studies in a congeneric series as described in Equation 1.5 (22).

$$BA = \sum a_i x_i + u \qquad (1.5)$$

BA is the biological activity, $u$ is the average contribution of the parent molecule, and $a_i$ is the contribution of each structural feature; $x_i$ denotes the presence ($x_i = 1$) or absence ($x_i = 0$) of a particular structural fragment. Limitations in this approach led to the more sophisticated Fujita-Ban equation that used the logarithm of activity, which brought the activity parameter in line with other free energy-related terms (23).

$$\log BA = \sum G_i X_i + u \qquad (1.6)$$

In Equation 1.6, $u$ is defined as the calculated biological activity value of the unsubstituted parent compound of a particular series. $G_i$ represents the biological activity contribution of the substituents, whereas $X_i$ is ascribed with a value of one when the substituent is present or zero when it is absent. Variations on this activity-based approach have been extended by Klopman et al. (24) and Enslein et al. (25). Topological methods have also been used to address the relationships between molecular structure and physical/biological activity. The minimum topological difference (MTD) method of Simon and the extensive studies on molecular connectivity by Kier and Hall have contributed to the development of quantitative structure property/activity relationships (26, 27). Connectivity indices based on hydrogen-suppressed molecular structures are rich in information on branching, 3-atom fragments, the degree of substitution, proximity of substituents and length, and heteroatom of substituted rings. A method in its embryonic state of development uses both graph bond distances and Euclidean distances among atoms to calculate E-state values for each atom in a molecule that is sensitive to conformational structure. Recently, these electrotopological indices that encode significant structured information on the topological state of atoms and fragments as well as their valence electron content have been applied to biological and toxicity data (28). Other recent developments in QSAR include approaches such as HQSAR, Inverse QSAR, and Binary QSAR (29-32). Improved statistical tools such as partial least squares (PLS) can handle situations where the number of variables overwhelms the number of molecules in a data set, which may have collinear X-variables (33).

1.2 Development of Receptor Theory

The central theme of molecular pharmacology, and the underlying basis of SAR studies, has focused on the elucidation of the structure and function of drug receptors. It is an endeavor that proceeds with unparalleled vigor, fueled by the developments in genomics. It is generally accepted that endogenous and exogenous chemicals interact with a binding site on a specific macromolecular receptor. This interaction, which is determined by intermolecular forces, may or may not elicit a pharmacological response depending on its eventual site of action.

The idea that drugs interacted with specific receptors began with Langley, who studied the mutually antagonistic action of the alkaloids pilocarpine and atropine. He realized that both these chemicals interacted with some receptive substance in the nerve endings of the gland cells (34). Paul Ehrlich defined the receptor as the "binding group of the protoplasmic molecule to which a foreign newly introduced group binds" (35). In 1905 Langley's studies on the effects of curare on muscular contraction led to the first delineation of critical characteristics of a receptor: recognition capacity for certain ligands and an amplification component that results in a pharmacological response (36).

Receptors are mostly integral proteins embedded in the phospholipid bilayer of cell membranes. Rigorous treatment with detergents is needed to dissociate the proteins from the membrane, which often results in loss of
integrity and activity. Pure proteins such as enzymes also act as drug receptors. Their relative ease of isolation and amplification have made enzymes desirable targets in structure-based ligand design and QSAR studies. Nucleic acids comprise an important category of drug receptors. Nucleic acid receptors (aptamers), which interact with a diverse number of small organic molecules, have been isolated by in vitro selection techniques and studied (37). Recent binary complexes provide insight into the molecular recognition process in these biopolymers and also establish the importance of the architecture of tertiary motifs in nucleic acid folding (38). Groove-binding ligands such as lexitropsins hold promise as potential drugs and are thus suitable subjects for focused QSAR studies (39).

Over the last 20 years, extensive QSAR studies on ligand-receptor interactions have been carried out with most of them focusing on enzymes. Two recent developments have augmented QSAR studies and established an attractive approach to the elucidation of the mechanistic underpinnings of ligand-receptor interactions: the advent of molecular graphics and the ready availability of X-ray crystallography coordinates of various binary and ternary complexes of enzymes with diverse ligands and cofactors. Early studies with serine and thiol proteases (chymotrypsin, trypsin, and papain), alcohol dehydrogenase, and numerous dihydrofolate reductases (DHFR) not only established molecular modeling as a powerful tool, but also helped clarify the extent of the role of hydrophobicity in enzyme-ligand interactions (40-44). Empirical evidence indicated that the coefficients with the hydrophobic term could be related to the degree of desolvation of the ligand by critical amino acid residues in the binding site of an enzyme. Total desolvation, as characterized by binding in a deep crevice/pocket, resulted in coefficients of approximately 1.0 (0.9-1.1) (44). An extension of this agreement between the mathematical expression and structure as determined by X-ray crystallography led to the expectation that the binding of a set of substituents on the surface of an enzyme would yield a coefficient of about 0.5 (0.4-0.6) in the regression equation, indicative of partial desolvation.

Probing of various enzymes by different ligands also aided in dispelling the notion of Fischer's rigid lock-and-key concept, in which the ligand (key) fits precisely into a receptor (lock). Thus, a "negative" impression of the substrate was considered to exist on the enzyme surface (geometric complementarity). Unfortunately, this rigid model fails to account for the effects of allosteric ligands, and this encouraged the evolution of the induced-fit model. Thus, "deformable" lock-and-key models have gained acceptance on the basis of structural studies, especially NMR (45).

It is now possible to isolate membrane-bound receptors, although it is still a challenge to delineate their chemistry, given that separation from the membrane usually ensures loss of reactivity. Nevertheless, great advances have been made in this arena, and the three-dimensional structures of some membrane-bound proteins have recently been elucidated. To gain an appreciation for mechanisms of ligand-receptor interactions, it is necessary to consider the intermolecular forces at play. Considering the low concentration of drugs and receptors in the human body, the law of mass action cannot account for the ability of a minute amount of a drug to elicit a pronounced pharmacological effect. The driving force for such an interaction may be attributed to the low energy state of the drug-receptor complex: $K_D$ = [Drug][Receptor]/[Drug-Receptor Complex]. Thus, the biological activity of a drug is determined by its affinity for the receptor, which is measured by its $K_D$, the dissociation constant at equilibrium. A smaller $K_D$ implies a large concentration of the drug-receptor complex and thus a greater affinity of the drug for the receptor. The latter property is promoted and stabilized by mostly noncovalent interactions sometimes augmented by a few covalent bonds. The spontaneous formation of a bond between atoms results in a decrease in free energy; that is, $\Delta G$ is negative. The change in free energy $\Delta G$ is related to the equilibrium constant $K_{eq}$.

$$\Delta G^\circ = -RT \ln K_{eq} \qquad (1.7)$$

Thus, small changes in $\Delta G^\circ$ can have a profound effect on equilibrium constants.
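
To make the relationship between the dissociation constant, receptor occupancy, and free energy concrete, the following Python sketch may help. It is a simplified illustration under stated assumptions: a single binding site, free drug in large excess over receptor, a temperature of 298.15 K, and a 1 M standard state. The function names and the example dissociation constants are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_bound(free_drug: float, kd: float) -> float:
    """Fraction of receptor occupied, from K_D = [D][R]/[DR].

    Assumes one binding site and that the free drug concentration
    is not depleted by binding.
    """
    return free_drug / (free_drug + kd)


def binding_free_energy(kd: float, temperature: float = 298.15) -> float:
    """Standard free energy of association in kcal/mol.

    The association constant is 1/K_D, so
    Delta G = -RT ln(1/K_D) = RT ln(K_D), with K_D in mol/L.
    """
    return R_KCAL * temperature * math.log(kd)


# Hypothetical dissociation constants, for illustration only.
for kd in (1e-6, 1e-7, 1e-8, 1e-9):
    dg = binding_free_energy(kd)
    occ = fraction_bound(free_drug=1e-8, kd=kd)
    print(f"K_D = {kd:.0e} M  Delta G = {dg:6.2f} kcal/mol  "
          f"occupancy at 10 nM drug = {occ:.3f}")
```

Because the free energy depends on the logarithm of the equilibrium constant, each tenfold change in $K_D$ corresponds to a shift of only about 1.4 kcal/mol near room temperature, which is the sense in which small changes in free energy have a large effect on equilibrium constants.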
Table 1.1 Types of Intermolecular Forces

Bond Type                  Bond Strength (kcal/mol)   Example
1. Covalent                40-140                     CH3CH2O-H
2. Ionic (Electrostatic)   5                          R4N+ ... -O-C(=O)-
3. Hydrogen
4. Dipole-dipole
5. van der Waals
6. Hydrophobic

In the broadest sense, these "bonds" would include covalent, ionic, hydrogen, dipole-dipole, van der Waals, and hydrophobic interactions. Most drug-receptor interactions constitute a combination of the bond types listed in Table 1.1, most of which are reversible under physiological conditions.

Covalent bonds are not as important in drug-receptor binding as noncovalent interactions. Alkylating agents in chemotherapy tend to react and form an immonium ion, which then alkylates proteins, preventing their normal participation in cell divisions. Baker's concept of active site directed irreversible inhibitors was well established by covalent formation of Baker's antifolate and dihydrofolate reductase (46).

Ionic (electrostatic) interactions are formed between ions of opposite charge with energies that are nominal and that tend to fall off with distance. They are ubiquitous and because they act across long distances, they play a prominent role in the actions of ionizable drugs. The strength of an electrostatic force is directly dependent on the charge of each ion and inversely dependent on the dielectric constant of the solvent and the distance between the charges.

Hydrogen bonds are ubiquitous in nature: their multiple presence contributes to the stability of the α-helix and base-pairing in DNA. Hydrogen bonding is based on an electrostatic interaction between the nonbonding electrons of a heteroatom (e.g., N, O, S) and the electron-deficient hydrogen atom of an -OH, -SH, or -NH group. Hydrogen bonds are strongly directional, highly dependent on the net degree of solvation, and rather weak, having energies ranging from 1 to 10 kcal/mol (47, 48). Bonds with this type of strength are of critical importance because they are stable enough to provide significant binding energy but weak enough to allow for quick dissociation. The greater electronegativity of atoms such as oxygen, nitrogen, sulfur, and halogen, compared to that of carbon, causes bonds between these atoms to have an asymmetric distribution of electrons, which results in the generation of electronic dipoles. Given that so many functional groups have dipole moments, ion-dipole and dipole-dipole interactions are frequent. The energy of dipole-dipole interactions can be described by Equation 1.8, where $\mu$ is the dipole moment, $\theta$ is the angle between the two poles of the dipole, $D$ is the dielectric constant of the medium and $r$ is the distance between the charges involved in the dipole.
Although electrostatic interactions are generally restricted to polar molecules, there are also strong interactions between nonpolar molecules over small intermolecular distances. Dispersion or London/van der Waals forces are the universal attractive forces between atoms that hold nonpolar molecules together in the liquid phase. They are based on polarizability: fluctuating dipoles, or shifts in the electron clouds of the atoms, tend to induce opposite dipoles in adjacent molecules, resulting in a net overall attraction. The energy of this interaction decreases very rapidly in proportion to 1/r^6, where r is the distance separating the two molecules. These van der Waals forces operate at a distance of about 0.4-0.6 nm and exert an attraction force of less than 0.5 kcal/mol. Yet, although individual van der Waals forces make a low energy contribution to an event, they become significant and additive when summed up over a large area with close surface contact of the atoms.

Hydrophobicity refers to the tendency of nonpolar compounds to transfer from an aqueous phase to an organic phase (49, 50). When a nonpolar molecule is placed in water, it gets solvated by a "sweater" of water molecules ordered in a somewhat icelike manner. This increased order in the water molecules surrounding the solute results in a loss of entropy. Association of hydrocarbon molecules leads to a "squeezing out" of the structured water molecules. The displaced water becomes bulk water, less ordered, resulting in a gain in entropy, which provides the driving force for what has been referred to as a hydrophobic bond. Although this is a generally accepted view of hydrophobicity, the hydration of apolar molecules and the noncovalent interactions between these molecules in water are still poorly understood and thus the source of continued examination (51-53).

Because noncovalent interactions are generally weak, cooperativity by several types of interactions is essential for overall activity. Enthalpy terms will be additive, but once the first interaction occurs, translational entropy is lost. This results in a reduced entropy loss in the second interaction. The net result is that eventually several weak interactions combine to produce a strong interaction. One can safely state that it is the involvement of myriad interactions that contributes to the overall selectivity of drug-receptor interactions.

2 TOOLS AND TECHNIQUES OF QSAR

2.1 Biological Parameters

In QSAR analysis, it is imperative that the biological data be both accurate and precise to develop a meaningful model. It must be realized that any resulting QSAR model is only as valid statistically as the data that led to its development. The equilibrium constants and rate constants that are used extensively in physical organic chemistry and medicinal chemistry are related to free energy values ΔG. Thus, standard biological equilibrium constants such as K_i or K_m should be used in QSAR studies. Likewise, only standard rate constants should be deemed appropriate for a QSAR analysis. Percentage activities (e.g., % inhibition of growth at certain concentrations) are not appropriate biological endpoints because of the nonlinear characteristic of dose-response relationships. These types of endpoints may be transformed to equieffective molar doses. Only equilibrium and rate constants pass muster in terms of the free-energy relationships or influence on QSAR studies. Biological data are usually expressed on a logarithmic scale because of the linear relationship between response and log dose in the midregion of the log dose-response curve. Inverse logarithms for activity (log 1/C) are used so that higher values are obtained for more effective analogs. Various types of biological data have been used in QSAR analysis. A few common endpoints are outlined in Table 1.2.

Biological data should pertain to an aspect of biological/biochemical function that can be measured. The events could be occurring in enzymes, isolated or bound receptors, in cellular systems, or whole animals. Because there is considerable variation in biological responses, test samples should be run in duplicate or preferably triplicate, except in whole animal studies where assay conditions (e.g., plasma concentrations of a drug) preclude such measurements.
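The log 1/C convention described in this section can be sketched in a few lines. The compound names and IC50 values below are hypothetical and serve only to show that a lower equieffective molar concentration maps to a higher activity value.

```python
import math


def log_inverse_concentration(molar_concentration):
    """Return log10(1/C) for an equieffective molar concentration C."""
    if molar_concentration <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(molar_concentration)


# Hypothetical IC50 values in mol/L (illustrative only, not from the text).
ic50_molar = {
    "analog A": 5.0e-5,
    "analog B": 2.0e-6,
    "analog C": 8.0e-8,
}

# Higher log 1/C means a more potent analog.
activities = {name: log_inverse_concentration(c) for name, c in ic50_molar.items()}

if __name__ == "__main__":
    for name, value in sorted(activities.items(), key=lambda item: item[1]):
        print(f"{name}: log 1/C = {value:.2f}")
```

Concentrations must be expressed in molar units before the transformation; values in weight per volume would confound potency with molecular weight.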
Table 1.2 Types of Biological Data Utilized in QSAR Analysis

Source of Activity               Biological Parameters
1. Isolated receptors
   Rate constants                Log k_cat, Log k_cat/K_m, Log k
   Michaelis-Menten constants    Log 1/K_m
   Inhibition constants          Log 1/K_i
   Affinity data                 pA2; pA1
2. Cellular systems
   Inhibition constants          Log 1/IC50
   Cross resistance              Log CR
   In vitro biological data      Log 1/C
   Mutagenicity states           Log TA98
3. "In vivo" systems
   Bioconcentration factor       Log BCF
   In vivo reaction rates        Log I (induction)
   Pharmacodynamic rates         Log T (total clearance)

It is also important to design a set of molecules that will yield a range of values in terms of biological activities. It is understandable that most medicinal chemists are reluctant to synthesize molecules with poor activity, even though these data points are important in developing a meaningful QSAR. Generally, the larger the range (>2 log units) in activity, the easier it is to generate a predictive QSAR. This kind of equation is more forgiving in terms of errors of measurement. A narrow range in biological activity is less forgiving in terms of accuracy of data. Another factor that merits consideration is the time structure. Should a particular reading be taken after 48 or 72 h? Knowledge of cell cycles in cellular systems or biorhythms in animals would be advantageous.

Each single step of drug transport, binding, and metabolism involves some form of partitioning between an aqueous compartment and a nonaqueous phase, which could be a membrane, serum protein, receptor, or enzyme. In the case of isolated receptors, the endpoint is clear-cut and the critical step is evident. But in more complex systems, such as cellular systems or whole animals, many localized steps could be involved in the random-walk process and the eventual interaction with a target. Usually the observed biological activity is reflective of the slow step or the rate-determining step.

To determine a defined biological response (e.g., IC50), a dose-response curve is first established. Usually six to eight concentrations are tested to yield percentages of activity or inhibition between 20 and 80%, the linear portion of the curve. Using the curves, the dose responsible for an established effect can easily be determined. This procedure is meaningful if, at the time the response is measured, the system is at equilibrium, or at least under steady-state conditions.

Other approaches have been used to apply the additivity concept and ascertain the binding energy contributions of various substituent (R) groups. Fersht et al. have measured the binding energies of various alkyl groups to aminoacyl-tRNA synthetases (54). Thus the ΔG values for methyl, ethyl, isopropyl, and thio substituents were determined to be 3.2, 6.5, 9.6, and 5.4 kcal/mol, respectively.

An alternative, generalized approach to determining the energies of various drug-receptor interactions was developed by Andrews et al. (55), who statistically examined the drug-receptor interactions of a diverse set of molecules in aqueous solution. Using Equation 1.9, a relationship was established between ΔG and E_X (intrinsic binding energy), E_DOF (energy of average entropy loss), and ΔS_r,t (energy of rotational and translational entropy loss).

E_X denotes the sum of the intrinsic binding energy of each functional group of which n_X are present in each drug in the set. Using Equation 1.9, the average binding energies for various functional groups were calculated. These energies followed a particular trend with charged groups showing stronger interactions and nonpolar entities, such as sp2 and sp3 carbons, contributing very little. The applicability of this approach to specific drug-receptor interactions remains to be seen.

2.2 Statistical Methods: Linear Regression Analysis

The most widely used mathematical technique in QSAR analysis is multiple regression
analysis (MRA). We will consider some of the basic tenets of this approach to gain a firm understanding of the statistical procedures that define a QSAR. Regression analysis is a powerful means for establishing a correlation between independent variables and a dependent variable such as biological activity (56).

Certain assumptions are made with regard to this procedure (57):

1. The independent variables, which in this case usually include the physicochemical parameters, are measured without error. Unfortunately, this is not always the case, although the error in these variables is small compared to that in the dependent variable.
2. For any given value of X, the Y values are independent and follow a normal distribution. The error term E_i possesses a normal distribution with a mean of zero.
3. The expected mean value for the variable Y, for all values of X, lies on a straight line.
4. The variance around the regression line is constant.

The "best" straight line for model Y_i = b + aX_i + E is drawn through the data points, such that the sum of the squares of the vertical distances from the points to the line is minimized. Y_obs represents the value of the observed data point and Y_calc is the predicted value on the line. The sum of squares SS = Σ (Y_obs - Y_calc)^2.

$$\sum_{i=1}^{n} E_i^2 = \sum \Delta^2 = SS = \sum_{i=1}^{n} (Y_{obs} - Y_{calc})^2 \tag{1.14}$$

Thus,

$$SS = \sum_{i=1}^{n} (Y_{obs} - aX_i - b)^2 \tag{1.15}$$

Expanding Equation 1.15, we obtain

$$SS = \sum_{i=1}^{n} \left( Y_{obs}^2 - Y_{obs}aX_i - Y_{obs}b - Y_{obs}aX_i + a^2X_i^2 + aX_ib - Y_{obs}b + aX_ib + b^2 \right) \tag{1.16}$$

Taking the partial derivative of SS with respect to b and then with respect to a gives Equations 1.17 and 1.18.

$$\frac{\partial SS}{\partial b} = \sum_{i=1}^{n} -2(Y_{obs} - b - aX_i) \tag{1.17}$$

$$\frac{\partial SS}{\partial a} = \sum_{i=1}^{n} -2X_i(Y_{obs} - b - aX_i) \tag{1.18}$$

SS is minimized with respect to b and a by setting both derivatives to zero; dividing by -2 yields the normal Equations 1.19 and 1.20.

$$\sum_{i=1}^{n} (Y_{obs} - b - aX_i) = 0 \tag{1.19}$$

$$\sum_{i=1}^{n} X_i(Y_{obs} - b - aX_i) = 0 \tag{1.20}$$

These "normal equations" can be rewritten as follows:

$$nb + a\sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_{obs}$$

$$b\sum_{i=1}^{n} X_i + a\sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} X_iY_{obs}$$

The solution of these simultaneous equations yields a and b. More thorough analyses of these procedures have been examined in detail (19, 58-60). The following simple example, based on Table 1.3, illustrates the nuances of a linear regression analysis.
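The normal equations can be solved directly for the six sulfanilamides of Table 1.3, taking σ as X and the observed activity as Y. The sketch below is a minimal illustration; the function names are hypothetical, and the squared correlation coefficient is computed from the usual definition r^2 = 1 - SS_residual/SS_total, which is assumed here rather than taken from the text.

```python
# Data from Table 1.3: sigma (X) and observed biological activity (Y).
sigma = [-0.17, 0.00, 0.23, 0.23, 0.78, 0.78]
activity = [4.66, 4.80, 4.89, 5.55, 6.00, 6.00]


def fit_line(x, y):
    """Solve the two normal equations for slope a and intercept b."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(v * v for v in x)
    sum_xy = sum(u * v for u, v in zip(x, y))
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b = (sum_y - a * sum_x) / n
    return a, b


def r_squared(x, y, a, b):
    """Fraction of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_residual = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_residual / ss_total


slope, intercept = fit_line(sigma, activity)
r2 = r_squared(sigma, activity, slope, intercept)

if __name__ == "__main__":
    print(f"Y = {slope:.3f} X + {intercept:.3f}, r^2 = {r2:.3f}")
```

With these data the slope comes out near 1.45 and the intercept near 4.87; the sums ΣX, ΣY, ΣX^2, and ΣXY computed inside `fit_line` match those listed under Table 1.3 to rounding.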
Table 1.3 Antibacterial Activity of N'-(R-phenyl)sulfanilamides

Compound      σ (X)      Observed BA (Y)
1. 4-CH3      -0.17      4.66
2. 4-H         0         4.80
3. 4-Cl        0.23      4.89
4. 2-Cl        0.23      5.55
5. 2-NO2       0.78      6.00
6. 4-NO2       0.78      6.00

k = no. of variables = 1
n = no. of data points = 6
ΣX = 1.85
ΣY = 31.90
ΣX^2 = 1.352
ΣY^2 = 171.45
ΣXY = 10.968

For linear regression analysis, Y = aX + b

The correlation coefficient r, the total variance SS_T, the unexplained variance SSQ, and the standard deviation are defined as follows:

$$\sum \Delta^2 = SSQ = \sum (Y_{obs} - Y_{calc})^2 \tag{1.25}$$

The correlation coefficient r is a measure of quality of fit of the model. Its square, r^2, is the fraction of the variance in the data that the model explains. In an ideal situation one would want the correlation coefficient to be equal to or approach 1, but in reality, because of the complexity of biological data, any value above 0.90 is adequate. The standard deviation is an absolute measure of the quality of fit. Ideally s should approach zero, but in experimental situations, this is not so. It should be small but it cannot have a value lower than the standard deviation of the experimental data. The magnitude of s may be attributed to some experimental error in the data as well as imperfections in the biological model. A larger data set and a smaller number of variables generally lead to lower values of s. The F value is often used as a measure of the level of statistical significance of the regression model. It is defined as denoted in Equation 1.27.

A larger value of F implies a more significant correlation has been reached. The confidence intervals of the coefficients in the equation reveal the significance of each regression term in the equation.

To obtain a statistically sound QSAR, it is important that certain caveats be kept in mind. One needs to be cognizant of collinearity between variables and chance correlations. Use of a correlation matrix helps verify that variables of significance and/or interest are orthogonal to each other. With the rapid proliferation of parameters, caution must be exercised in amassing too many variables for a QSAR analysis. Topliss has elegantly demonstrated that there is a high risk of ending up with a chance correlation when too many variables are tested (62).

Outliers in QSAR model generation present their own problems. If they are badly fit by the model (off by more than 2 standard deviations), they should be dropped from the data set, although their elimination should be noted and addressed. Their aberrant behavior
may be attributed to inaccuracies in the testing procedure (usually dilution errors) or unusual behavior. They often provide valuable information in terms of the mechanistic interpretation of a QSAR model. They could be participating in some intermolecular interaction that is not available to other members of the data set or have a drastic change in mechanism.

2.3 Compound Selection

In setting up to run a QSAR analysis, compound selection is an important angle that needs to be addressed. One of the earliest manual methods was an approach devised by Craig, which involves two-dimensional plots of important physicochemical properties. Care is taken to select substituents from all four quadrants of the plot (63). The Topliss operational scheme allows one to start with two compounds and construct a potency tree that grows branches as the substituent set is expanded in a stepwise fashion (64). Topliss later proposed a batchwise scheme including certain substituents such as the 3,4-Cl2, 4-Cl, 4-CH3, 4-OCH3, and 4-H analogs (65). Other methods of manual substituent selection include the Fibonacci search method, sequential simplex strategy, and parameter focusing by Magee (66-68).

One of the earliest computer-based and statistical selection methods, cluster analysis, was devised by Hansch to accelerate the process and increase the diversity of the substituents (1). Newer methodologies include D-optimal designs, which focus on the use of det (X'X), the variance-covariance matrix. The determinant of this matrix yields a single number, which is maximized for compounds expressing maximum variance and minimum covariance (69-71). A combination of fractional factorial design in tandem with a principal property approach has proven useful in QSAR (72). Extensions of this approach using multivariate design have shown promise in environmental QSAR with nonspecific responses, where the clusters overlap and a cluster-based design approach has to be used (73). With strongly clustered data containing several classes of compounds, a new strategy involving local multivariate designs within each cluster is described. The chosen compounds from the local designs are grouped together in the overall training set that is representative of all clusters (74).

3 PARAMETERS USED IN QSAR

3.1 Electronic Parameters

Parameters are of critical importance in determining the types of intermolecular forces that underlie drug-receptor interactions. The three major types of parameters that were initially suggested and still hold sway are electronic, hydrophobic, and steric in nature (20, 75). Extensive studies using electronic parameters reveal that electronic attributes of molecules are intimately related to their chemical reactivities and biological activities. A search of a computerized QSAR database reveals the following: the common Hammett constants (σ, σ+, σ-) account for 7000/8500 equations in the Physical organic chemistry (PHYS) database and nearly 1600/8000 in the Biology (BIO) database, whereas quantum chemical indices such as HOMO, LUMO, BDE, and polarizability appear in 100 equations in the BIO database (76).

The extent to which a given reaction responds to electronic perturbation constitutes a measure of the electronic demands of that reaction, which is determined by its mechanism. The introduction of substituent groups into the framework and the subsequent alteration of reaction rates helps delineate the overall mechanism of reaction. Early work examining the electronic role of substituents on rate constants was first tackled by Burckhardt and firmly established by Hammett (13, 14, 77, 78). Hammett employed, as a model reaction, the ionization in water of substituted benzoic acids and determined their equilibrium constants K_a. See Equation 1.28. This led to an operational definition of σ, the substituent constant. It is a measure of the size of the electronic effect for a given substituent and represents a measure of electronic charge distribution in the benzene nucleus.

Electron-withdrawing substituents are thus
characterized by positive values, whereas electron-donating ones have negative values.

[Scheme: ionization of a substituted benzoic acid (COOH) to its carboxylate (COO-)]

In an extension of this approach, the ionization of substituted phenylacetic acids was measured.

The effect of the 4-Cl substituent on the ionization of 4-Cl phenylacetic acid (PA) was found to be proportional to its effect on the ionization of 4-Cl benzoic acid (BA).

$$\log \frac{K'_X}{K'_H} \propto \log \frac{K_X}{K_H} \tag{1.31}$$

then

$$\log \frac{K'_X}{K'_H} = \rho \cdot \sigma$$

ρ (rho) is defined as a proportionality or reaction constant, which is a measure of the susceptibility of a reaction to substituent effects. A positive rho value suggests that a reaction is aided by electron withdrawal from the reaction site, whereas a negative rho value implies that the reaction is assisted by electron donation at the reaction site. Hammett also drew attention to the fact that a plot of log K_A for benzoic acids versus log k for ester hydrolysis of a series of molecules is linear, which suggests that substituents exert a similar effect in dissimilar reactions.

$$\log \frac{k_X}{k_H} \propto \log \frac{K_X}{K_H}, \qquad \log \frac{k_X}{k_H} = \rho \cdot \sigma \tag{1.32}$$

Although this expression is empirical in nature, it has been validated by the sheer volume of positive results. It is remarkable because four different energy states must be related. A correlation of this type is clearly meaningful; it suggests that changes in structure produce proportional changes in the activation energy ΔG‡ for such reactions. Hence, the derivation of the name for which the Hammett equation is universally known: linear free energy relationship (LFER). Equation 1.32 has become known as the Hammett equation and has been applied to thousands of reactions that take place at or near the benzene ring bearing substituents at the meta and para positions. Because of proximity and steric effects, ortho-substituted molecules do not always follow this maxim and are subject to different parameterizations. Thus, an expanded approach was established by Charton (79) and Fujita and Nishioka (80). Charton partitioned the ortho electronic effect into its inductive, resonance, and steric contributions; the factors α, β, and ψ are susceptibility or reaction constants and h is the intercept.

$$\log k = \alpha\sigma_I + \beta\sigma_R + \psi r_v + h \tag{1.33}$$

Fujita and Nishioka used an integrated approach to deal with ortho substituents in data sets including meta and para substituents.

$$\log k = \rho\sigma + \delta E_s^{ortho} + fF_{ortho} + c \tag{1.34}$$

For ortho substituents, para sigma values
were used in addition to Taft's E_s values and Swain-Lupton field constants F_ortho.

The reason for employing alternative treatments to ortho-substituted aromatic molecules is that changes in rate or ionization constants mediated by meta or para substituents are mostly changes in ΔH‡ or ΔH° because substitution does not affect ΔS‡ or ΔS°. Ortho substituents affect both enthalpy and entropy; the effect on entropy is noteworthy because entropy is highly sensitive to changes in the size of reagents and substituents as well as degree of solvation. Bolton et al. examined the ionization of substituted benzoic acids and measured accurate values for ΔG, ΔH, and ΔS (81). A hierarchy of different scenarios, under which an LFER operates, was established:

1. ΔH° is constant and ΔS° varies for a series.
2. ΔS° is constant and ΔH° varies.
3. ΔH° and ΔS° vary and are shown to be linearly related.
4. Precise measurements indicated that category 3 was the prevalent behavior in benzoic acids.

Despite the extensive and successful use in QSAR studies, there are some limitations to the Hammett equation.

1. Primary σ values are obtained from the thermodynamic ionizations of the appropriate benzoic acids at 25°C; these are reliable and easily available. Secondary values are obtained by comparison with another series of compounds and are thus subject to error because they are dependent on the accuracy of a measured series and the development of a regression line using statistical methods.
2. In some multisubstituted compounds, the lack of additivity needs to be noted. Proximal effects are operative and tend to distort electronic contributions. For example,

   Σσ_calc(3,4,5-trichlorobenzoic acid) = 0.97; that is, 2σ_M + σ_P or 2(0.37) + 0.23
   Σσ_obs(3,4,5-trichlorobenzoic acid) = 0.95

   Sigma values for smaller substituents are more likely to be additive. However, in the case of 3-methyl, 4-dimethylaminobenzoic acid, the discrepancy is high. For example,

   Σσ_calc(3-CH3, 4-N(CH3)2 benzoic acid)
   Σσ_obs(3-CH3, 4-N(CH3)2 benzoic acid)

   The large discrepancy may be attributed to the twisting of the dimethylamino substituent out of the plane of the benzene ring, resulting in a decrease in resonance. Exner and his colleagues have critically examined the use of additivity in the determination of σ constants (82).
3. Changes in mechanism or transition state cause discontinuities in Hammett plots. Nonlinear plots are often found in reactions that proceed by two concurrent pathways (83, 84).
4. Changes in solvent may lead to dissimilarities in reaction mechanisms. Thus extrapolation of σ values from a polar solvent (e.g., CH3CN) to a nonpolar solvent such as benzene has to be approached cautiously. Solvation properties will differ considerably, particularly if the transition state is polar and/or the substituents are able to interact with the solvent.
5. A strong positional dependency of sigma makes it imperative to use appropriate values for positional, isomeric substituents. Substituents ortho to the reaction center are difficult to describe and thus one must resort to a Fujita-Nishioka analysis (80).
6. Through-resonance or direct conjugation effects cause a breakdown in the Hammett equation. When coupling occurs between the substituent and the reaction center through the pi-electron system, reactivity is enhanced, diminished, or mitigated by separation. In a study of X-cumyl chlorides, Brown and Okamoto noticed the strong conjugative interaction between lone-pair,
para substituents and the vacant p-orbital in the transition state, which led to deviations in the Hammett plot (85). They defined a modified LFER applicable to this situation.

$$\log \frac{k_X}{k_H} = (\rho^{+})(\sigma^{+})$$

σ+ was a new substituent constant that expressed enhanced resonance attributes. A similar situation was noticed when a strong donor center was present as a reactant or formed as a product (e.g., phenols and anilines). In this case, strong resonance interactions were possible with electron-withdrawing groups (e.g., NO2 or CN). A scale for such substituents was constructed such that

One shortcoming of the benzoic acid system is the extent of coupling between the carboxyl group and certain lone-pair donors. Insertion of a methylene group between the core (benzene ring) and the functional group (COOH moiety) leads to phenylacetic acids and the establishment of a σ° scale from the ionization of X-phenylacetic acids. A flexible method of dealing with the variability of the resonance contribution to the overall electronic demand of a reaction is embodied in the Yukawa-Tsuno equation (86). It includes normal and enhanced resonance contributions:

$$\log \frac{k_X}{k_H} = \rho\left[\sigma + r(\sigma^{+} - \sigma)\right] \tag{1.37}$$

where r is a measure of the degree of enhanced resonance interaction in relation to benzoic acid dissociations (r = 0) and cumyl chloride hydrolysis (r = 1).

Most of the Hammett-type constants pertain to aromatic systems. In evaluating an electronic parameter for use in aliphatic systems, Taft used the relative acid and base hydrolysis rates for esters. He developed Equation 1.38 as a measure of the inductive effect (σ*) of a substituent R' in the ester R'COOR, where B and A refer to basic and acidic hydrolysis, respectively.

The factor of 2.48 was used to make σ* equiscalar with Hammett σ values. Later, a σ_I scale derived from the ionization of 4-X-bicyclo[2.2.2]octane-1-carboxylic acids was shown to be related to σ* (87, 88). It is now more widely used than σ*.

Ionization is a function of the electronic structure of an organic drug molecule. Albert was the first to clearly delineate the relationship between ionization and biological activity (89). Now, pKa values are widely used as the independent variable in physical organic reactions and in biological systems, particularly when dealing with transport phenomena. However, caution must be exercised in interpreting the dependency of biological activity on pKa values because pKa values are inherently composites of electronic factors that are used directly in QSAR analysis.

In recent years, there has been a rapid growth in the application of quantum chemical methodology to QSAR, by direct derivation of electronic descriptors from the molecular wave functions (90). The two most popular methods used for the calculation of quantum chemical descriptors are ab initio (Hartree-Fock) and semiempirical methods. As in other electronic parameters, QSAR models incorporating quantum chemical descriptors will include information on the nature of the intermolecular forces involved in the biological response. Unlike other electronic descriptors, there is no statistical error in quantum chemical computations. The errors are usually made in the assumptions that are established to facilitate calculation (91). Quantum chemical descriptors such as net atomic charges, highest occupied molecular orbital/lowest unoccupied molecular orbital (HOMO-LUMO) energies, frontier orbital electron densities, and superdelocalizabilities have been shown
to correlate well with various biological activities (92). A mixed approach using frontier orbital theory and topological parameters has been used to calculate Hammett-like substituent constants (93).

In Equation 1.40, ΔN represents the extent of electron transfer between interacting acid-base systems; ΔE is the energy decrease in bimolecular systems underlying electron transfer; D_X D_H (EA_H/EA_X) corresponds to electron affinity and distance terms; and OS, factors the electrotopological state index, whereas Σσ is the number of all σ-electrons in the functional group. Observed principal component analysis (PCA) clustering of 66 descriptors derived from AM1 calculations was similar to that previously reported for monosubstituted benzenes (94, 95). The advantages of quantum chemical descriptors are that they have definite meaning and are useful in the elucidation of intra- and intermolecular interactions and can easily be derived from the theoretical structure of the molecule.

3.2 Hydrophobicity Parameters

More than a hundred years ago, Meyer and Overton made their seminal discovery on the correlation between oil/water partition coefficients and the narcotic potencies of small organic molecules (7, 8). Ferguson extended this analysis by placing the relationship between depressant action and hydrophobicity in a thermodynamic context; the relative saturation of the depressant in the biophase was a critical determinant of its narcotic potency (9). At this time, the success of the Hammett equation began to permeate structure-activity studies and hydrophobicity as a determinant was relegated to the background. In a landmark study, Hansch and his colleagues devised and used a multiparameter approach that included both electronic and hydrophobic terms, to establish a QSAR for a series of plant growth regulators (16). This study laid the basis for the development of the QSAR paradigm and also firmly established the importance of lipophilicity in biosystems. Over the last 40 years, no other parameter used in QSAR has generated more interest, excitement, and controversy than hydrophobicity (96). Hydrophobic interactions are of critical importance in many areas of chemistry. These include enzyme-ligand interactions, the assembly of lipids in biomembranes, aggregation of surfactants, coagulation, and detergency (97-100). The integrity of biomembranes and the tertiary structure of proteins in solution are determined by apolar-type interactions.

Molecular recognition depends strongly on hydrophobic interactions between ligands and receptors. Excellent treatises on this subject have been written by Taylor (101) and Blokzijl and Engberts (51). Despite extensive usage of the term hydrophobic bond, it is well known that there is no strong attractive force between apolar molecules (102). Frank and Evans were the first to apply a thermodynamic treatment to the solvation of apolar molecules in water at room temperature (103). Their "iceberg" model suggested that a large entropic loss ensued after the dissolution of apolar compounds and the increased structure of water molecules surrounding the apolar solute. The quantitation of this model led to the development of the "flickering" cluster model of Némethy and Scheraga, which emphasized the formation of hydrogen bonds in liquid water (104). The classical model for hydrophobic interactions was delineated by Kauzmann to describe the van der Waals attractions between the nonpolar parts of two molecules immersed in water. Given that van der Waals forces operate over short distances, the water molecules are squeezed out in the vicinity of the mutually bound apolar surfaces (49). The driving force for this behavior is not that alkanes "hate" water, but rather water that "hates" alkanes (105, 106). Thus, the gain in entropy appears as the critical driving force for hydrophobic interactions that are primarily governed by the repulsion of hydrophobic
solutes from the solvent water and the limited but important capacity of water to maintain its network of hydrogen bonds.

Hydrophobicities of solutes can readily be determined by measuring partition coefficients designated as P. Partition coefficients deal with neutral species, whereas distribution ratios incorporate concentrations of charged and/or polymeric species as well. By convention, P is defined as the ratio of concentration of the solute in octanol to its concentration in water.

It was fortuitous that octanol was chosen as the solvent most likely to mimic the biomembrane. Extensive studies over the last 35 years (40,000 experimental P-values in 400 different solvent systems) have failed to dislodge octanol from its secure perch (107, 108).

Octanol is a suitable solvent for the measurement of partition coefficients for many reasons (109, 110). It is cheap, relatively nontoxic, and chemically unreactive. The hydroxyl group has both hydrogen bond acceptor and hydrogen bond donor features capable of interacting with a large variety of polar groups. Despite its hydrophobic attributes, it is able to dissolve many more organic compounds than can alkanes, cycloalkanes, or aromatic hydrocarbons. It is UV transparent over a large range and has a vapor pressure low enough to allow for reproducible measurements. It is also elevated enough to allow for its removal under mild conditions. In addition, water saturated with octanol contains only M octanol at equilibrium, whereas octanol saturated with water contains 2.3 M of water. Thus, polar groups need not be totally dehydrated in transfer from the aqueous phase to the organic phase. Likewise, hydrophobic solutes are not appreciably solvated by the M octanol in the water phase unless their intrinsic log P is above 6.0. Octanol begins to absorb light below 220 nm and thus solute concentration determinations can be monitored by UV spectroscopy. More important, octanol acts as an excellent mimic for biomembranes because it shares the traits of amphiphilicity and hydrogen-bonding capability with phospholipids and proteins found in biological membranes.

The choice of the octanol/water partitioning system as a standard reference for assessing the compartmental distribution of molecules of biological interest was recently investigated by molecular dynamics simulations (111). It was determined that pure 1-octanol contains a mix of hydrogen-bonded "polymeric" species, mostly four-, five-, and six-membered ring clusters at 40°C. These small ring clusters form a central hydroxyl core from which their corresponding alkyl chains radiate outward. On the other hand, water-saturated octanol tends to form well-defined, inverted, micellar aggregates. Long hydrogen-bonded chains are absent and water molecules congregate around the octanol hydroxyls. "Hydrophilic channels" are formed by cylindrical formation of water and octanol hydroxyls with the alkyl chains extending outward. Thus, water-saturated octanol has centralized polar cores where polar solutes can localize. Hydrophobic solutes would migrate to the alkyl-rich regions. This is an elegant study that provides insight into the partitioning of benzene and phenol by analyzing the structure of the octanol/water solvation shell and delineating octanol's capability to serve as a surrogate for biomembranes.

The shake-flask method, so-called, is most commonly used to measure partition coefficients with great accuracy and precision and with a log P range that extends from -3 to +6 (112, 113). The procedure calls for the use of pure, distilled, deionized water, high-purity octanol, and pure solutes. At least three concentration levels of solute should be analyzed and the volumes of octanol and water should be varied according to a rough estimate of the log P value. Care should be exercised to ensure that the eventual amounts of the solute in each phase are about the same after equilibrium. Standard concentration curves using three to four known concentrations in water saturated with octanol are usually established. Generally, most methods employ a UV-based procedure, although GC and HPLC may also be used to quantitate the concentration of the solute.
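As a rough illustration of the definition of P and of the recommendation to analyze at least three concentration levels, the following minimal Python sketch computes log P for each level and averages the results. The function names (`log_p`, `shake_flask_log_p`) and the concentration values are hypothetical placeholders chosen for illustration; they are not data or procedures taken from the cited studies.

```python
import math
from statistics import mean, stdev


def log_p(conc_octanol: float, conc_water: float) -> float:
    """log10 of P = [solute]octanol / [solute]water (neutral species only)."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


def shake_flask_log_p(measurements):
    """Average log P over several concentration levels.

    measurements: iterable of (conc_octanol, conc_water) pairs,
    both in the same units, measured after equilibration.
    Returns (mean log P, sample standard deviation).
    """
    values = [log_p(c_oct, c_w) for c_oct, c_w in measurements]
    if len(values) < 3:
        # Mirrors the "at least three concentration levels" recommendation.
        raise ValueError("at least three concentration levels are recommended")
    return mean(values), stdev(values)


# Hypothetical equilibrium concentrations (mol/L) at three solute levels.
runs = [(1.0e-2, 1.0e-4), (2.0e-2, 2.1e-4), (4.0e-2, 3.9e-4)]
avg, sd = shake_flask_log_p(runs)
print(f"log P = {avg:.2f} +/- {sd:.2f}")
```

A large spread between levels in such a calculation would hint at concentration-dependent behavior (for example ionization or self-association), in which case the measured quantity is closer to a distribution ratio than to a true partition coefficient.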
Generally, 110-mL stopped centrifuge tubes or 200-mL centrifuge bottles are used. They are inverted gently for 2-3 min and then centrifuged at 1000-2000 g for 20 min before the phases are analyzed. Analysis of both phases is highly recommended, to minimize errors incurred by adsorption to glass walls at low solute concentration. For highly hydrophobic compounds, the slow stirring procedure of de Bruijn and Hermens is recommended (114). The filter probe extractor system of Tomlinson et al. is a modified, automated, shake-flask method, which is efficient, fast, reliable, and flexible (115).

Partition coefficients from different solvent systems can also be compared and converted to the octanol/water scale, as was suggested by Collander (116). He stressed the importance of the following linear relationship: log P2 = a log P1 + b. This type of relationship works well when the two solvents are both alkanols. However, when two solvent systems have varying hydrogen bond donor and acceptor capabilities, the relationship tends to fray. A classical example involves the relationship between log P values in chloroform and octanol (117, 118).

Log Po,,, = 1.012 log P,, - 0.513 (1.42)

Only 66% of the variance in the data is explained by this equation. However, a separation of the various solutes into OH bond donors, acceptors, and neutrals helped account for 94% of the variance in the data. These restrictions led Seiler to extend the Collander equation by incorporating a corrective term for H-bonding in the cyclohexane system (119). Fujita generalized this approach and formulated Equation 1.43 as shown below (120).

$$\log P_2 = a \log P_1 + \sum b_i \cdot HB_i + C \qquad (1.43)$$

P1 is the reference solvent and HBi is an H-bonding parameter. Leahy et al. suggested that a more sophisticated approach incorporating four model systems would be needed to adequately address issues of solute partitioning in membranes (121). Thus, four distinct solvent types were chosen (apolar, amphiprotic, proton donor, and proton acceptor), and they were represented by alkanes, octanol, chloroform, and propyleneglycol dipelargonate (PGDP), respectively. The demands of measuring four partition coefficients for each solute have slowed progress in this particular area.

3.2.1 Determination of Hydrophobicity by Chromatography. Chromatography provides an alternate tool for the estimation of hydrophobicity parameters. Rm values derived from thin-layer chromatography provide a simple, rapid, and easy way to ascertain approximate values of hydrophobicity (122, 123).

Other recent developments in chromatography techniques have led to the development of powerful tools to rapidly and accurately measure octanol/water partition coefficients. Countercurrent chromatography is one of these methods. The stationary and mobile phases include two nonmiscible solvents (water and octanol) and the total volume of the liquid stationary phase is used for solute partitioning (124, 125). Log P values of several diuretics including ionizable drugs have been measured at different pH values using countercurrent chromatography; the log P values ranged from -1.3 to 2.7 and were consistent with literature values (126).

Recently, a rapid method for the determination of partition coefficients using gradient reversed phase-high pressure liquid chromatography (RP-HPLC) was developed. This method is touted as a high-throughput hydrophobicity screen for combinatorial libraries (127, 128). A chromatography hydrophobicity index (CHI) was established for a diverse set of compounds. Acetonitrile was used as the modifier and 50 mM ammonium acetate as the mobile phase (127). A linear relationship was established between Clog P and CHIN for neutral molecules.

Clog P = 0.057 CHIN - 1.107 (1.45)

A more recent study using RP-HPLC for the determination of log P (octanol) values for
neutral and weakly acidic and basic drugs, revealed an excellent correlation between log Poct and log Kw values (129). Log Poct values determined in this system are referred to as Elog Poct. They were expressed in terms of solvation parameters.

In this equation, R_2 is the excess molar refraction; π_2^H is the dipolarity/polarizability; Σα_2^H and Σβ_2^O are the summation of hydrogen bond acidity and basicity values, respectively; and V_x is McGowan's volume.

3.2.2 Calculation Methods. Partition coefficients are additive-constitutive, free energy-related properties. Log P represents the overall hydrophobicity of a molecule, which includes the sum of the hydrophobic contributions of the "parent" molecule and its substituent. Thus, the π value for a substituent may be defined as

$$\pi_X = \log P_{R-X} - \log P_{R-H}$$

π_H is set to zero. The π-value for a nitro substituent is calculated from the log P of nitrobenzene and benzene.

An extensive list of π-values for aromatic substituents appears in Table 1.4. Pi values for side chains of amino acids in peptides have been well characterized and are easily available (130-132). Aliphatic fragment values were developed a few years later. For a more extensive list of substituent value constants, refer to the extensive compilation by Hansch et al. (133). Initially, the π-system was applied only to substitution on aromatic rings and when the hydrogen being replaced was of innocuous character. It was apparent from the beginning that not all hydrogens on aromatic systems could be substituted without correction factors because of strong electronic interactions. It became necessary to determine π values in various electron-rich and -deficient systems (e.g., X-phenols and X-nitrobenzenes). Correction factors were introduced for special features such as unsaturation, branching, and ring fusion. The proliferation of π-scales made it difficult to ascertain which system was more appropriate for usage, particularly with complex structures.

The shortcomings of this approach provided the impetus for Nys and Rekker to design the fragmental method, a "reductionist" approach, which was based on the statistical analysis of a large number of measured partition coefficients and the subsequent assignment of appropriate values for particular molecular fragments (118, 134). Hansch and Leo took a "constructionist" approach and developed a fragmental system that included correction factors for bonds and proximity effects (1, 135). Labor-intensive efforts and inconsistency in manual calculations were eliminated with the debut of the automated system CLOGP and its powerful SMILES notation (136-138). Recent analysis of the accuracy of CLOGP yielded Equation 1.48 (139).

MLOGP = 0.959 CLOGP + 0.08 (1.48)

The Clog P values of 228 structures (1.8% of the data set) were not well predicted. It must be noted that Starlist (most accurate values in the database) contains almost 300 charged nitrogen solutes (ammonium, pyridinium, imidazolium, etc.) and over 2200 in all, which amounts to 5% of Masterfile (database of measured values). CLOGP adequately handles these molecules within the 0.30 standard deviation limit. Most other programs make no attempt to calculate them. For more details on calculating log Poct from structures, see excellent reviews by Leo (140, 141).

The proliferation of methodologies and programs to calculate partition coefficients continues unabated. These programs are based on substructure approaches or whole-molecule approaches (142, 143). Substructure
Table 1.4 Substituent Constants for QSAR Analysis

No. Substituent Pi MR L B1 B5 S-P S-M
+N(CH3)3
EtN(CH,),+
CH,N(CH,)3+
C0,-
+ NH,
PR-N(CH3),+
CH2NH3+
10,
C(CN)3
NHNO,
C(N0,)3
SOZ(NH2)
C(CN)=C(CN),
CH,O(NH,)
N(COCH3)Z
SO,CH,
P(O)(OH),
-(CHJ
N(S02CH3)2
-(NH,)
CH(CN)2
CH,NHCOCH3
NHC=S(NH,)
NH(OH)
CH=NNHCONHNH,
NHO(NH,)
-(NHCH3)
2-Aziridinyl
NH2
NHSO,CH,
P(O)(OCHa)z
C(CH3)(CN),
N(CH3)S0,CH3
S0,Et
CH,NH,
1-Tetrazolyl
CH,OH
N(CH3)COCH3
NHCHO
NHC(==O)CH3
C(CH3)(NOz)z
NHNH,
0S0,CH3
S02N(CH3),
NHC=%NHC,H,)
SOZ(CHF2)
OH
CHO
CH,CHOHCH3
CS(NH2)
OC--O(CH3)
SOCHF,
4-Pyrimidinyl
2-Pyrimidinyl

P(CF3),
CH,CN
CN
COCH,
CH,P=O(OEt),
P=O(OEt),
NHCOOMe
NHC==O(NHC,H,)
NH-(CH2C1)
NHCH,
N(CH,)COCF,
C--S(NHCH,)
NHC=S(CH,)
C(Et)(NO,),
COzH
C(OH)(CH,),
EtC0,H
NO2
CH=NNHCSNH,
NHCN
CH&(OH)(CH,),
CH===CHCHO
NHCH,CO,Et
CH20CH3
NH-CH(CH,),
CH,O-(CH,)
CHzN(CH3)z
CH,SCN
1-Aziridinyl
NO
ONO,
H
-(CFJ
CH=C(CN),
SO,(F)
COEt
C(CFJ3
NH-Et
NHM(CF,)
S-(CH,)
CF3
OCH,F
CHSHNOJTR)
CH,F
F
C(OMe),
SECF,
NHM(0Et)
CH,C1
N(CH,)z

CHF,
CCCF,
SO,C,H5
COCH(CH,),
OCHF,
CH,SO,CF,
C(NOz)(CHJ,
P(O)(OPR),
CH2M(CF3)
0CH2CH3
SH
N=NCF,
CCH
N=CCl,
SCCH
SCN
P(CH&
NHSO,C,H,
S0,NHC,H5
CH,CF,
NNN
NNN
4-Pyridyl
N=NN(CH,),
(JFO(NHC&)
2-Pyridyl
0CH2CH=CH2
C=O(OEt)
S=O(CF3)
CHOHC,H,
0CH2CI
SOZ(CF3)
'333
SCH3
SC=O(CFJ
COC(CH3),
CH=NC,H,
P=O(C&&
C1
N=CHC,H,
SeCH,
SCH2F
OCH=CH,
CH,Br
CCCH,
CHdH2
Br
NHSO,CF,
0S02C6H5
1-Pyrryl
N(CH,)SO,CF,
SCHF,
CH,CH3
OCF,


Cyclopentyl
CHI,
SC6H5
1-Cyclohexenyl
OCCl,
C(Et)(CHJz
CH,C(CH3),
SC6H4N0,-p
SCF,CHF,
C6H4C1-p
C6F5
C5Hn
CCC6H5
CBr,
EtC6H5
C&(CH3)-p
C6H41~
C6H41-m
1-Adamantyl
C(Et),
CH(C&),
N(C,H&
Heptyl
C(SCF3)3
C6C15
methods are based on molecular fragments, atomic contributions, or computer-identified fragments (1, 106, 107, 144-147). Whole-molecule approaches use molecular properties or spatial properties to predict log P values (148-150). They run on different platforms (e.g., Mac, PC, Unix, VAX, etc.) and use different calculation procedures. An extensive, recent review by Mannhold and van de Waterbeemd addresses the advantages and limitations of the various approaches (143). Statistical parameters yield some insight as to the effectiveness of such programs.

Recent attempts to compute log P have resulted in the development of solvatochromic parameters (151, 152). This approach was proposed by Kamlet et al. and focused on molecular properties. In its simplest form it can be expressed as follows:

V is a solute volume term; π* represents the solute polarizability; β and α are measures of hydrogen bond acceptor strength and hydrogen bond donor strength, respectively; and e is the intercept. An extension of this model has been formulated by Abraham and used by researchers to refine molecular descriptors and characterize hydrophobicity scales (153-156).

3.3 Steric Parameters

The quantitation of steric effects is complex at best and challenging in all other situations, particularly at the molecular level. An added level of confusion comes into play when attempts are made to delineate size and shape. Nevertheless, sterics are of overwhelming importance in ligand-receptor interactions as well as in transport phenomena in cellular systems. The first steric parameter to be quantified and used in QSAR studies was Taft's Es constant (157). Es is defined as

$$E_s = \log (k_X / k_H)_A$$

where k_X and k_H represent the rates of acid hydrolysis of esters, XCH2COOR and CH3COOR,
respectively. To correct for hyperconjugation in the α-hydrogens of the acetate moiety, Hancock devised a correction on Es such that

$$E_s^c = E_s + 0.306(n - 3) \qquad (1.51)$$

In Equation 1.51, n represents the number of α-hydrogens and 0.306 is a constant derived from molecular orbital calculations (158). Unfortunately, the limited availability of Es and Es^c values for a great number of substituents precludes their usage in QSAR studies. Charton demonstrated a strong correlation between Es and van der Waals radii, which led to his development of the upsilon parameter υ_X (159).

$$\upsilon_X = r_X - r_H$$

where r_X and r_H are the minimum van der Waals radii of the substituent and hydrogen, respectively. Extension of this approach from symmetrical substituents to nonsymmetrical substituents must be handled with caution.

One of the most widely used steric parameters is molar refraction (MR), which has been aptly described as a "chameleon" parameter by Tute (160). Although it is generally considered to be a crude measure of overall bulk, it does incorporate a polarizability component that may describe cohesion and is related to London dispersion forces as follows: MR = 4πNα/3, where N is Avogadro's number and α is the polarizability of the molecule. It contains no information on shape. MR is also defined by the Lorentz-Lorenz equation:

$$MR = \frac{n^2 - 1}{n^2 + 2} \cdot \frac{MW}{\text{density}}$$

MR is generally scaled by 0.1 and used in biological QSAR, where intermolecular effects are of primary importance. The refractive index of the molecule is represented by n. With alkyl substituents, there is a high degree of collinearity with hydrophobicity; hence, care must be taken in the QSAR analysis of such derivatives. The MR descriptor does not distinguish shape; thus the MR value for amyl (-CH2CH2CH2CH2CH3) is the same as that for [-C(Et)(CH3)2]: 2.42. The coefficients with MR terms challenge interpretation, although extensive experience with this parameter suggests that a negative coefficient implies steric hindrance at that site and a positive coefficient attests to either dipolar interactions in that vicinity or anchoring of a ligand in an opportune position for interaction (161).

The failure of the MR descriptor to adequately address three-dimensional shape issues led to Verloop's development of STERIMOL parameters (162), which define the steric constraints of a given substituent along several fixed axes. Five parameters were deemed necessary to define shape: L, B1, B2, B3, and B4. L represents the length of a substituent along the axis of a bond between the parent molecule and the substituent; B1 to B4 represent four different width parameters. However, the high degree of collinearity between B1, B2, and B3 and the large number of training set members needed to establish the statistical validity of this group of parameters led to their demise in QSAR studies. Verloop subsequently established the adequacy of just three parameters for QSAR analysis: a slightly modified length L, a minimum width B1, and a maximum width B5 that is orthogonal to L (163). The use of these insightful parameters has done much to enhance correlations with biological activities. Recent analysis in our laboratory has established that in many cases, B1 alone is superior to Taft's Es and a combination of B1 and B5 can adequately replace Es (164).

Molecular weight (MW) terms have also been used as descriptors, particularly in cellular systems, or in distribution/transport studies where diffusion is the mode of operation. According to the Einstein-Sutherland equation, molecular weight affects the diffusion rate. The Log MW term has been used extensively in some studies (159-161) and an example of such usage is given below. In correlating permeability (Perm) of nonelectrolytes through chara cells, Lien et al. obtained the following QSAR (168):
Log Perm = 0.889 log P* - 1.544 log MW (1.54)

In QSAR 1.54, Log P* represents the olive oil/water partition coefficient, MW is the molecular weight of the solute and defines its size, and Hb is a crude approximation of the total number of hydrogen bonds for each molecule. The molecular weight descriptor has also been an omnipresent variable in QSAR studies pertaining to cross-resistance of various drugs in multidrug-resistant cell lines (169). was used because it most closely approximates the size (radii) of the drugs involved in the study and their interactions with GP-170. See QSAR 1.55.

Log CR = 0.70 w - 1.01 log(Ps10 + 1) - 0.10 log P + 0.381 (1.55)

log β = -6.851, optimum = 7.21

3.4 Other Variables and Variable Selection

Indicator variables (I) are often used to highlight a structural feature present in some of the molecules in a data set that confers unusual activity or lack of it to these particular members. Their use could be beneficial in cases where the data set is heterogeneous and includes large numbers of members with unusual features that may or may not impact a biological response. QSAR for the inhibition of trypsin by X-benzamidines used indicator variables to denote the presence of unusual features such as positional isomers and vinyl/carbonyl-containing substituents (170). A recent study on the inhibition of lipoxygenase catalyzed production of leukotriene B4 and 5-hydroxyeicosatetraenoic acid from arachidonic acid in guinea pig leukocytes by X-vinyl catechols led to the development of the following QSAR (171):

Log 1/C

Log Po = 4.61(±0.49), Log β = -4.33

The indicator variables are D2 and D3; for simple X-catechols, D2 = 1 and for X-naphthalene diols, D3 = 1. The negative coefficients with both terms (D2 and D3) underscore the detrimental effects of these structural features in these inhibitors. Thus, discontinuities in the structural features of the molecules of this data set are accounted for by the use of indicator variables. An indicator variable may be visualized graphically as a constant that adjusts two parallel lines so that they are superimposable. The use of indicator variables in QSAR analysis is also described in the following example. An analysis of a comprehensive set of nitroaromatic and heteroaromatic compounds that induced mutagenesis in TA98 cells was conducted by Debnath et al., and QSAR 1.57 was formulated (172).

Log TA98

Log Po = 4.93(±0.35), Log β = -5.48

TA98 represents the number of revertants per nanomole of nitro compound. E_LUMO is the energy of the lowest unoccupied molecular or-
bital and I_a is an indicator variable that signifies the presence of an acenthrylene ring in the mutagens. I_1 is also an indicator variable that pertains to the number of fused rings in the data set. It acquires a value of 1 for all congeners containing three or more fused rings and a value of zero for those containing one or two fused rings (e.g., naphthalene, benzene). Thus, the greater the number of fused rings, the greater the mutagenicity of the nitro congeners. The E_LUMO term indicates that the lower the energy of the LUMO, the more potent the mutagen. In this QSAR the combination of indicator variables affords a mixed blessing. One variable helps to enhance activity, whereas the other leads to a decrease in mutagenicity of the acenthrylene congeners. In both these QSAR, Kubinyi's bilinear model is used (21). See Section 4.2 for a description of this approach.

3.5 Molecular Structure Descriptors

These are truly structural descriptors because they are based only on the two-dimensional representation of a chemical structure. The most widely known descriptors are those that were originally proposed by Randic (173) and extensively developed by Kier and Hall (27). The strength of this approach is that the required information is embedded in the hydrogen-suppressed framework and thus no experimental measurements are needed to define molecular connectivity indices. For each bond the C_k term is calculated. The summation of these terms then leads to the derivation of χ, the molecular connectivity index for the molecule.

S is the count of formally bonded carbons and h is the number of bonds to hydrogen atoms.

¹χ is the first bond order because it considers only individual bonds. Higher molecular connectivity indices encode more complex attributes of molecular structure by considering longer paths. Thus, ²χ and ³χ account for all two-bond paths and three-bond paths, respectively, in a molecule. To correct for differences in valence, Kier and Hall proposed a valence delta (δ^v) term to calculate valence connectivity indices (175).

Molecular connectivity indices have been shown to be closely related to many physicochemical parameters such as boiling points, molar refraction, polarizability, and partition coefficients (174, 176). Ten years ago, the E-State index was developed to define an atom- or group-centered numerical code to represent molecular structure (28). The E-State was established as a composite index encoding both electronic and steric properties of atoms in molecules. It reflects an atom's electronegativity, the electronegativity of proximal and distal atoms, and topological state. Extensions of this method include the HE-State, atom-type E-State, and the polarity index Q. Log P showed a strong correlation with the Q index of a small set (n = 21) of miscellaneous compounds (28). Various models using electrotopological indices have been developed to delineate a variety of biological responses (177-179). Some criticism has been leveled at this approach (180, 181). Chance correlations are always a problem when dealing with such a wide array of descriptors. The physicochemical interpretation of the meaning of these descriptors is not transparent, although attempts have been made to address this issue (27).

4 QUANTITATIVE MODELS

4.1 Linear Models

The correlation of biological activity with physicochemical properties is often termed an extrathermodynamic relationship. Because it follows in the line of Hammett and Taft equations that correlate thermodynamic and related parameters, it is appropriately labeled. The Hammett equation represents relationships between the logarithms of rate or equilibrium constants and substituent constants. The linearity of many of these relationships led to their designation as linear free energy relationships. The Hansch approach represents an extension of the Hammett equation from physical organic systems to a biological milieu. It should be noted that the simplicity
of the approach belies the tremendous complexity of the intermolecular interactions at play in the overall biological response.

Biological systems are a complex mix of heterogeneous phases. Drug molecules usually traverse many of these phases to get from the site of administration to the eventual site of action. Along this random-walk process, they perturb many other cellular components such as organelles, lipids, proteins, and so forth. These interactions are complex and vastly different from organic reactions in test tubes, even though the eventual interaction with a receptor may be chemical or physicochemical in nature. Thus, depending on the biological system involved (isolated receptor, cell, or whole animal), one expects the response to be multifactorial and complex. The overall process, particularly in vitro or in vivo, studies a mix of equilibrium and rate processes, a situation that defies easy separation and delineation.

Meyer and Overton were the first to attempt to get a grasp on biological responses by noting the relationship between oil/water partition coefficients and their narcotic activity. Ferguson recognized that equitoxic concentrations of small organic molecules were markedly influenced by their phase distribution between the biophase and exobiophase. This concept was generalized in the form of Equation 1.60 and extended by Fujita to Equation 1.61 (182, 183).

$$\log 1/C = m \log(1/A) + \text{constant} \qquad (1.61)$$

C represents the equipotent concentration, k and m are constants for a particular system, and A is a physicochemical constant representative of phase distribution equilibria such as aqueous solubility, oil/water partition coefficient, and vapor pressure. In examining a large and diverse number of biological systems, Hansch and coworkers defined a relationship (Equation 1.62) that expressed biological activity as a function of physicochemical parameters (e.g., partition coefficients of organic molecules) (19).

Model systems have been devised to elucidate the mode of interactions of chemicals with biological entities. Examples of linear models pertaining to nonspecific toxicity are described. The effects of a series of alcohols (ROH) have been routinely studied in many model and biological systems. See QSAR 1.63-1.67.

4.1.1 Penetration of ROH into Phosphatidylcholine Monolayers (184)

$$\log 1/C = 0.87(\pm 0.01)\log P + 0.66(\pm 0.01) \qquad (1.63)$$

4.1.2 Changes in EPR Signal of Labeled Ghost Membranes by ROH (185)

$$\log 1/C = 0.93(\pm 0.09)\log P$$

4.1.3 Induction of Narcosis in Rabbits by ROH (184)

$$\log 1/C = 0.72(\pm 0.16)\log P$$

4.1.4 Inhibition of Bacterial Luminescence by ROH (185)

$$\log 1/C = 1.10(\pm 0.07)\log P + 0.16(\pm 0.12) \qquad (1.66)$$

4.1.5 Inhibition of Growth of Tetrahymena pyriformis by ROH (76, 186)

$$\log 1/C = 0.82(\pm 0.04)\,\mathrm{clog}\,P$$

In all cases, there is a strong dependency on
log P, because all these processes involve transport of alcohols through membranes. The low intercepts speak to the nonspecific nature of the alcohol-mediated toxic interaction. An equilibrium-pseudoequilibrium modeled by log P can be defined as shown in Fig. 1.1.

Figure 1.1. Log P(octanol) mirrors Log P(bio). (Schematic: octanol phase over water phase; biophase over aqueous phase.)

The Hammett-type relationship for this conceptual idea of distribution is

$$\log P_{\mathrm{bio}} = a \log P_{\mathrm{octanol}} + b \qquad (1.68)$$

This postulate assumes that steric, hydrophobic, electronic, and hydrogen bonding factors that affect partitioning in the biophase are handled by the octanol/water system. Given that the biological response (log 1/C) is proportional to log Pbio, then it follows that

$$\log 1/C = a \log P_{\mathrm{octanol}} + \text{constant} \qquad (1.69)$$

Hansch and coworkers have amply demonstrated that Equation 1.69 applies not only to systems at or near phase distribution equilibrium but also to systems removed from equilibrium (184, 185).

4.2 Nonlinear Models

Extensive studies on development of linear models led Hansch and coworkers to note that a breakdown in the linear relationship occurred when a greater range in hydrophobicity was assessed with particular emphasis placed on test molecules at extreme ends of the hydrophobicity range. Thus, Hansch et al. suggested that the compounds could be involved in a "random-walk" process: low hydrophobic molecules had a tendency to remain in the first aqueous compartment, whereas highly hydrophobic analogs sequestered in the first lipoidal phase that they encountered. This led to the formulation of a parabolic equation, relating biological activity and hydrophobicity (187).

$$\log 1/C = -a(\log P)^2 + b \log P + \text{constant} \qquad (1.70)$$

In the random-walk process, the compounds partition in and out of various compartments and interact with myriad biological components in the process. To deal with this conundrum, Hansch proposed a general, comprehensive equation for QSAR 1.71 (188).

$$\log 1/C = -a(\log P)^2 + b \log P + \rho\sigma + \delta E_s + \text{constant} \qquad (1.71)$$

The optimum value of log P for a given system is log Po and it is highly influenced by the number of hydrophobic barriers a drug encounters in its walk to its site of action. Hansch and Clayton formulated the following parabolic model to elucidate the narcotic action of alcohols on tadpoles (189).

4.2.1 Narcotic Action of ROH on Tadpoles

This is an example of nonspecific toxicity where the last step probably involves partitioning into a hydrophobic membrane. Log Po represents the optimal hydrophobicity (as defined by log P) that elicits a maximal biological response.

Despite the success of the parabolic equation, there are a number of worrisome limitations. This approach forces the data into a symmetrical parabola, with the result that there are usually deviations between the experimental and parabola-calculated data. Second, the ascending slope is curved and inconsistent with the observed linear data. Thus, the slope of a linear model cannot be compared to the curved slope of the parabola. In 1973 Franke devised a sophisticated, empirical
model consisting of a linear ascending part and a parabolic part (190). See Equations 1.73 and 1.74.

Log 1/C = a·log P + c (if log P < log P_x) (1.73)

Log 1/C = −a(log P)² + b·log P + c (if log P > log P_x) (1.74)

The binding of drugs to proteins is linearly dependent on hydrophobicity up to a limited value, log P_x, after which steric hindrance causes the linear dependency to alter to a nonlinear one. The major limitation of this approach involves the inclusion of highly hydrophobic congeners that tend to cause systematic deviations between experimental and predicted values.

Another cutoff model, which deals with nonlinearity in biological systems, is one defined by McFarland (191). It attempts to elucidate the dependency of drug transport on hydrophobicity in multicompartment models. McFarland addressed the probability of drug molecules traversing several aqueous lipid barriers from the first aqueous compartment to a distant, final aqueous compartment. The probability P_0,n of a drug molecule to access the final compartment n of a biological system was used to define the drug concentration in this compartment.

Log CR = a·log P − 2a·log(P + 1) + constant (1.75)

The ascending and descending slopes are equal (= 1) and linear. However, a major drawback of this model is that it forces the activity curves to maximize at log P = 0. These studies were extended by Kubinyi, who developed the elegant and powerful bilinear model, which is superior to the parabolic model and is extensively used in QSAR studies (192).

Log 1/C = a·log P − b·log(β·P + 1) + constant (1.76)

where β is the ratio of the volumes of the organic phase and the aqueous phase. An important feature of this model lies in the symmetry of the curves. For aqueous phases of this model system, symmetrical curves with linear ascending and descending sides (like a teepee) and a limited parabolic section around the hydrophobicity optimum are generated. Unsymmetrical curves arise for the lipid phases. It is highly compatible with the linear model and allows for quick comparisons of the ascending slopes. It can also be used with other parameters such as MR and σ, where it appears to pinpoint a change in mechanism similar to the breaks in linearity of the Hammett equation. The following example of the bilinear model reveals the symmetrical nature of the curve.

4.2.2 Induction of Ataxia in Rats by ROH

Log 1/C = 0.77(±0.10) log P

s = 0.165, log P_o = 2.0

The bilinear model has been used to model biological interactions in isolated receptor systems and in absorption, metabolism, elimination, and toxicity studies, although it has a few limitations. These include the need for at least 15 data points (because of the presence of the additional disposable parameter β) and for data points beyond the optimum log P. If the range in values for the dependent variable is limited, unreasonable slopes are obtained.

4.3 Free-Wilson Approach

The Free-Wilson approach is truly a structure-activity-based methodology because it incorporates the contributions made by various structural fragments to the overall biological activity (22, 193, 194). It is represented by Equation 1.78.

Indicator variables are used to denote the presence or absence of a particular structure feature.
Like classical QSAR, this de novo approach assumes that substituent effects are additive and constant. BA is the biological activity; X_j is the jth substituent, which carries a value 1 if present, 0 if absent. The term a_j represents the contribution of the jth substituent to biological activity and μ is the overall average activity. The summation of all activity contributions at each position must equal zero. The series of linear equations that are formulated are solved by linear regression analysis. It is necessary for each substituent to appear more than once at a position in different combinations with substituents at other positions.

There are certain advantages to the Free-Wilson method that have been addressed (193-195). Any type of quantitative biological data can be subject to such analysis. There is no need for any physicochemical constants. The molecules of a series may be structurally dissected in any way and multiple sites of substitution are necessary and easily accommodated (196). Limitations include the large number of molecules with varying substituent combinations that are needed for this analysis and the inability of the system to handle nonlinearity of the dependency of activity on substituent properties. Intramolecular interactions between substituents are not handled very well, although special treatments can be used to accommodate proximal effects. Extrapolation outside of the substituents used in the study is not feasible. Another problem inherent with this approach is that usually a large number of variables is required to describe a smaller number of compounds, which creates a statistical faux pas. Fujita and Ban modified this approach in two important ways (23). They expressed the biological activity on a logarithmic scale, to bring it into line with the extrathermodynamic approach, as seen in the following equation:

Log BA = Σ a_i X_i + μ (1.79)

This allowed the derived substituent constants to be compared with other free energy-related parameters. The overall average μ took on a new look, as it were, akin to an intercept in other QSAR analyses.

Recent analyses of a Free-Wilson type have included the in vitro inhibitory activity of a series of heterocyclic compounds against K. pneumoniae (197). Other applications of the Free-Wilson approach have included studies on the antimycobacterial activity of 4-alkylthiobenzanilides, the antibacterial activity of fluoronaphthyridines, and the benzodiazepine receptor-binding ability of some non-benzodiazepine compounds such as 3-X-imidazo[1,2-b]pyridazines, 2-phenylimidazo[1,2-a]pyridines, 2-(alkoxycarbonyl)imidazo[2,1-b]benzothiazoles, and 2-arylquinolones (198-200).

4.4 Other QSAR Approaches

The similarity in approaches of Hansch analysis and Free-Wilson analysis allows them to be used within the same framework. This is based on their theoretical consistency and the numerical equivalencies of activity contributions. This development has been called the mixed approach and can be represented by the following equation:

Log 1/C = Σ a_i + Σ c_j Φ_j + constant (1.80)

The term a_i denotes the contribution for each ith substituent, whereas Φ_j is any physicochemical property of a substituent X_j. For a thorough review of the relationship between Hansch and Free-Wilson analyses, see the excellent reviews by Kubinyi (58, 195). A recent study of the P-glycoprotein inhibitory activity of 48 propafenone-type modulators of multidrug resistance, using a combined Hansch/Free-Wilson approach, was deemed to have higher predictive ability than that of a stand-alone Free-Wilson analysis (201). Molar refractivity, which has a high collinearity with molecular weight, was a significant determinant of modulating ability. It is of interest to note that molecular weight has been shown to be an omnipresent parameter in cross-resistance profiles in multidrug-resistance phenomena (167).

5 APPLICATIONS OF QSAR

Over the last 40 years, the glut in scientific information has resulted in the development of thousands of equations pertaining to structure-activity relationships in biological systems. In its original definition, the Hansch equation was defined to model drug-receptor interactions involving electronic, steric, and hydrophobic contributions. Nonlinear relationships helped refine this approach in cellular systems and organisms where pharmacokinetic constraints had to be considered and tackled. They have also found increased utility in addressing the complex QSAR of some receptor-ligand interactions. In many cases the Kubinyi bilinear model has provided a sophisticated approach to delineation of steric effects in such interactions. Examples of ligand-receptor interactions will be drawn from receptors such as the much-studied dihydrofolate reductases (DHFR), α-chymotrypsin, and 5α-reductase (202-204).

5.1 Isolated Receptor Interactions

The critical role of DHFR in protein, purine, and pyrimidine synthesis; the availability of crystal structures of binary and ternary complexes of the enzyme; and the advent of molecular graphics combined to make DHFR an attractive target for well-designed heterocyclic ligands generally incorporating a 2,4-diamino-1,3-diazapharmacophore (205). The earliest study focused on the inhibition of DHFR by 4,6-diamino-1,2-dihydro-2,2-dimethyl-1R-s-triazines, the structure of which is shown in Fig. 1.2 (202).

Figure 1.2. 4,6-Diamino-1,2-dihydro-2,2-dimethyl-1R-s-triazines.

5.1.1 Inhibition of Crude Pigeon Liver DHFR by Triazines (202)

Log 1/IC50 = 2.21(±1.00)π − 0.28(±0.17)π² + 0.84(±0.76)D + 2.58(±1.30) (1.81)

In all equations, n is the number of data points, r² is the square of the correlation coefficient, s represents the standard deviation, and the figures in parentheses are for construction of the 95% confidence intervals. π represents the hydrophobicity of the substituent R and π_o is the optimum hydrophobic contribution of the R substituent. D is an indicator variable that acquires a value of 1.0 when a phenyl ring is present on the nitrogen and a value of zero for all other R. This is an example of a Hansch-Fujita-Ban analysis, where the indicator variable D establishes the contribution and thus the importance of a phenyl ring in DHFR inhibition. This equation has some limitations. Improper choice of N-substituents led to a high degree of collinearity between size and hydrophobicity and, in terms of electronic contributions, spanned space was limited and thus inadequate. A subsequent study on the binding of these compounds to DHFR isolated from chicken liver was more revealing.

5.1.2 Inhibition of Chicken Liver DHFR by 3-X-Triazines (207)

Log 1/Ki

π′_o = 1.89(±0.36), log β = −1.08

In this example, the R group on the 2-nitrogen was restricted to a (3-X-phenyl) aromatic ring (205). Accurate Ki values were obtained from highly purified DHFR isolated from chicken liver. In most cases, π′ represented the hydrophobicity of the substituent except in certain instances where X = -OR or -CH2ZC6H4-Y. It was ascertained that alkoxy substituents were not making direct hydrophobic contact with the enzyme, given that their inhibitory activities were essentially constant from the methoxy to the nonyloxy substituent. In the bridged substituents where Z = O, NH, S, Se, the Y substituent again did not contact the enzyme surface. Variation in Y led to the same, constant biological activity. The coefficient with π′ suggests that the substituent is engulfed in a hydrophobic pocket that has an optimal π′_o of 2. This value is consistent with that seen in the crude pigeon liver DHFR corrected for the presence of the phenyl group (4.0 − 2.0 = 2). The 0.86 ρ value (coefficient with σ) suggests that there could be a dipolar interaction between the electron-deficient phenyl ring and a region of positively charged electrostatic potential in the enzyme, perhaps an arginine, lysine, or histidine residue. Hathaway et al. developed a QSAR for the inhibition of human DHFR by 3-X-triazines and obtained Equation 1.83 (208).

5.1.3 Inhibition of Human DHFR by 3-X-Triazines (208)

Log 1/Ki

π′_o = 2.0(±0.87), log β = −0.577

The enhanced activity of the "bridged" substituents was corrected by the indicator variable I. Note that triazines bearing the bridge moieties -CH2NHC6H4Y, -CH2OC6H4Y, and -CH2SC6H4Y had unusually high enzyme binding activity. Note that the -CH2NHC6H4- bridge is present in the endogenous substrate, folic acid. The bilinear dependency on hydrophobicity of the substituents parallels that seen in the case of chicken liver DHFR. A similar QSAR was obtained for DHFR isolated from L1210 murine leukemia cells (209).

5.1.4 Inhibition of L1210 DHFR by 3-X-Triazines (209)

Log 1/Ki

π′_o = 1.76(±0.28), log β = −0.979

The consistency in these models versus prokaryotic DHFR is established by the coefficient with the hydrophobic term, the optimum π′ value, and the rho value. These numerical coefficients can be contrasted sharply with those obtained from fungal and protozoal DHFR. Inhibition constants were determined for 3-X-triazines versus Pneumocystis carinii DHFR (210).

5.1.5 Inhibition of P. carinii DHFR by 3-X-Triazines (210)

Log 1/Ki

In Equation 1.85, I_OR is an indicator variable that assumes a value of 1 when an alkoxy substituent is present and 0 for all other substituents. It is of interest to note that the Y substituent on the second phenyl ring now contributes to activity. The MR_Y term suggests that it most probably accesses a polar region of the active site of the enzyme. The positive coefficient with MR_Y suggests that an increase in bulk and/or polarizability enhances binding. The descending slope of the
bilinear equation is much steeper (1.36 − 0.73 = 0.63) than that seen with the mammalian and avian enzymes.

A similar model is obtained vs. the bifunctional protozoal DHFR from Leishmania major, which is coupled to thymidylate synthase (211).

5.1.6 Inhibition of L. major DHFR by 3-X-Triazines (211)

Log 1/Ki

QSAR analysis on a limited set of 3-X-triazines assayed by Chio and Queener versus Toxoplasma gondii led to the formulation of Equation 1.87 (202, 212).

5.1.7 Inhibition of T. gondii DHFR by 3-X-Triazines

Log 1/IC50 = 0.39(±0.20)π′ − 0.43(±0.19)MR_Y + 6.65(±0.30) (1.87)

A quick comparison of QSAR 1.82-1.84 reveals the strong similarity between the avian and mammalian models. In fact, because of its increased stability, chicken liver DHFR has often been used as a surrogate for human DHFR in enzyme-inhibition studies. The intercepts, coefficients with π′, and optimum π′_o for avian (6.33, 1.01, 1.9), human (6.07, 1.07, 2.0), and mouse leukemia (6.12, 0.98, 1.76) can be compared to the corresponding values for P. carinii (6.48, 0.73, 3.99) and Leishmania major (5.05, 0.65, 4.54). QSAR 1.81 and 1.87 are not included in the comparison because crude pigeon enzyme was used in the former and the testing for QSAR 1.87 was conducted under different assay conditions; Ki values were not determined. A noteworthy difference between these models is the wide disparity in π′_o values. The binding site of the protozoal and fungal species comprises an extensive hydrophobic surface unlike the abbreviated pockets in the mammalian and avian enzymes. The positive coefficients with the MR_Y terms suggest that added bulk on the bridged phenyl ring enhances inhibitory potency. The study versus T. gondii DHFR (QSAR 1.87) included a number of mostly small, polar substituents (NH2, NO2, CONMe2) on the bridged phenyl and their activities were considerably lower than that of the unsubstituted analog. Comparative QSAR can be useful, particularly if the biological data are consistent (tested under the same assay conditions, excellent purity of enzymes, substrates, inhibitors, buffers), and the choice of substituents is appropriate.

One of the major problems that arises with some QSAR studies is extrapolation beyond spanned space. Predictive ability is sound when one has probed an adequate range in electronic, hydrophobic, and steric space. At the onset of the study, the training set should address these concerns. Lack of adequate attention to such issues can result in QSAR models that are misleading. When examined on its own, such a model may appear to withstand statistical rigor and apparent transparency but, on being subjected to lateral validation, loopholes emerge. A brief study to illustrate this phenomenon is outlined below.

Four different QSAR were derived for the inhibition of DHFR from rat liver, human leukemia, mouse L1210, and bovine liver by 2,4-diamino, 5-Y, 6-Z-quinazolines (Fig. 1.3) (202, 213-215). A comparison of their QSAR presents an interesting study on the importance of spanned space in delineating enzyme-receptor interactions.

Figure 1.3. 2,4-Diamino, 5-Y, 6-Z-quinazolines.
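
The spanned-space concern raised above can be checked mechanically before a model is used for prediction. The sketch below is a minimal, hypothetical illustration: the function names and all descriptor values are invented for the example and are not taken from the cited studies. It records the range of each descriptor covered by a training set and flags any query compound whose descriptors fall outside that range.

```python
from typing import Dict, List, Tuple


def spanned_space(training: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Return the (min, max) range covered by each descriptor in the training set."""
    names = training[0].keys()
    return {k: (min(c[k] for c in training), max(c[k] for c in training)) for k in names}


def outside_spanned_space(query: Dict[str, float],
                          ranges: Dict[str, Tuple[float, float]]) -> List[str]:
    """List the descriptors for which the query lies outside the training range."""
    return [k for k, (lo, hi) in ranges.items() if not lo <= query[k] <= hi]


# Illustrative training set: hydrophobic (pi), electronic (sigma), steric (MR) values.
# These numbers are assumptions made for the example only.
training_set = [
    {"pi": 0.00, "sigma": 0.00, "MR": 0.10},
    {"pi": 0.56, "sigma": -0.17, "MR": 0.57},
    {"pi": 0.71, "sigma": 0.23, "MR": 0.60},
    {"pi": 1.98, "sigma": -0.20, "MR": 1.96},
]
ranges = spanned_space(training_set)

# A bulky, hydrophobic query compound (also hypothetical).
query = {"pi": 3.5, "sigma": 0.10, "MR": 4.2}
flagged = outside_spanned_space(query, ranges)
print(flagged)  # descriptors for which a prediction would be an extrapolation
```

A per-descriptor range check of this kind is deliberately crude: it does not detect gaps inside the range or collinearity between descriptors, both of which the text also identifies as problems.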
5.1.8 Inhibition of Rat Liver DHFR by 2,4-Diamino, 5-Y, 6-Z-quinazolines (213)

Log 1/IC50 = 0.78(±0.12)π5 + 0.81(±0.12)MR6 − 0.06(±0.02)MR6² − 0.73(±0.49)I1 − 2.15(±0.38)I2 − 0.54(±0.21)I3 − 1.40(±0.41)I4 + 0.78(±0.37)I6 − 0.20(±0.12)MR6·I1 + 4.92(±0.23) (1.88)

n = 101, r² = 0.924, s = 0.441, MR6,o = 6.4(±0.8)

5.1.9 Inhibition of Human Liver DHFR by 2,4-Diamino, 5-Y, 6-Z-quinazolines (214)

Log 1/Ki = −2.87(±0.16)I1

5.1.10 Inhibition of Murine L1210 DHFR by 2,4-Diamino, 5-Y, 6-Z-quinazolines (214)

Log 1/IC50

5.1.11 Inhibition of Bovine Liver DHFR by 2,4-Diamino, 5-Y, 6-Z-quinazolines (215)

These QSAR vary in size and the number of variables used to define inhibitory activity. Selassie and Klein have described a more thorough comparative analysis of these QSAR (202). A brief focus on the MR6 term reveals that its coefficients vary remarkably in all four sets. QSAR 1.88 is a parabola with an optimum of 6.4. Because it is parabolic in nature, the coefficient of the ascending slope cannot be compared with the linear slopes in QSAR 1.89-1.91. Figure 1.4 illustrates the problems with QSAR 1.89-1.91, which failed to test analogs across the available space.

Figure 1.4 reveals that QSAR 1.89 and 1.90 were sampled in the suboptimal MR6 range; thus, the negative dependency on MR6. On the other hand, QSAR 1.91 was focused on the ascending portion of the curve and thus only molecules in the 0.1-3.4 range were tested. Thus, with a limited set of compounds, one gets a misleading picture of the biological interactions.

Figure 1.4. Gaps in spanned space of MR6 for 2,4-diamino-quinazolines (horizontal axis: MR6, 0 to 10).

Enzymatic reactions in nonaqueous solvents have generated a great deal of interest, fueled in part by the commercial application of enzymes as catalysts in specialty synthesis. The increasing demand for enantiopure pharmaceuticals has accelerated the study of enzymatic reactions in organic solvents containing
little or no water (216). To investigate the substrate specificity of α-chymotrypsin in pentanol, a series of X-phenyl esters of N-benzoyl-L-alanine (Fig. 1.5) were synthesized and their binding constants were evaluated in buffer and in pentanol (203). The following QSAR 1.92 and 1.93 were derived in phosphate buffer and pentanol.

5.1.12 Binding of X-Phenyl, N-Benzoyl-L-alaninates to α-Chymotrypsin in Phosphate Buffer, pH 7.4 (203)

5.1.13 Binding of X-Phenyl, N-Benzoyl-L-alaninates to α-Chymotrypsin in Pentanol (203)

Outliers in QSAR 1.92 included the 4-t-butyl and 4-OH analogs, whereas the 4-CONH2 analog was an outlier in QSAR 1.93. These results were recently reanalyzed by Kim (217, 218) with respect to the role of enthalpic and entropic contributions to ligand binding with α-chymotrypsin. Use of the Fujiwara hydrophobic enthalpy parameter π_H and the hydrophobic entropy parameter π_S led to the development of QSAR 1.94 and 1.95 (219).

Figure 1.5. X-Phenyl, N-benzoyl-L-alaninates.

5.1.14 Binding of X-Phenyl, N-Benzoyl-L-alaninates in Aqueous Phosphate Buffer (218)

5.1.15 Binding of X-Phenyl, N-Benzoyl-L-alaninates in Pentanol (218)

The disappearance of the MR term in QSAR 1.93 and 1.95 is significant. The MR term usually relates to nonspecific, dispersive interactions in polar space. Thus, its presence in QSAR 1.92 and 1.94 suggests that substrates bearing polarizable substituents may displace the ordered category II water molecules. In pentanol, the substrate may be faced with the task of displacing pentanol, not water, from the enzyme and thus the MR term is no longer of consequence. QSAR 1.94 also indicates that the enthalpy term π_H plays a more critical role in binding than the entropy term π_S. Note that these roles are reversed in QSAR 1.95, suggesting that binding in pentanol is largely an entropy-driven process. Similar results were obtained by Compadre et al. in a study on the hydrolysis of X-phenyl-N-benzoyl-glycinates by cathepsin B in aqueous buffer and acetonitrile (220). Kim's analysis provides an excellent example of a study that focuses on mechanistic interpretation and clearly demonstrates that a thermodynamic approach in QSAR can provide pertinent information about the energetics of the ligand binding process.
5α-Reductase, a critical enzyme in male sexual development, mediates the reduction of testosterone to dihydrotestosterone (DHT). Elevated levels of DHT in certain disease states such as benign prostatic hypertrophy and prostatic cancer drive the need for effective inhibitors of 5α-reductase. A recent QSAR study on inhibition of human 5α-reductase, type 1 by various steroid classes was carried out by Kurup et al. (204, 221, 222). A few of the models will be examined to demonstrate the importance and power of lateral validation. The three classes of steroidal inhibitors are depicted in Fig. 1.6.

Figure 1.6. Steroidal inhibitors of 5α-reductase.

5.1.16 Inhibition of 5-α-Reductase by 4-X, N-Y-6-azaandrost-17-CO-Z-4-ene-3-ones, I

Log 1/Ki

outliers: X = Y = H, Z = NHCMe3; X = Me, Y = H, Z = CH2CHMe2

5.1.17 Inhibition of 5-α-Reductase by 17β-(N-(X-phenyl)carbamoyl)-6-azaandrost-4-ene-3-ones, II

Log 1/Ki = 0.35(±0.09)Clog P

outlier: 2,5-(CF3)2

5.1.18 Inhibition of 5-α-Reductase by 17β-(N-(1-X-phenyl-cycloalkyl)carbamoyl)-6-azaandrost-4-ene-3-ones, III

Log 1/Ki = 0.32(±0.17)Clog P + 6.34(±1.15) (1.98)

outlier: n = 5, X = 4-t-Bu

In all these equations, the coefficients with hydrophobicity, as represented by Clog P, suggest that binding of these azaandrostene-ones occurs on the surface of the binding site where partial desolvation can occur. I is an indicator variable that pinpoints the negative effect of a double bond at C-1. A bulky substituent on N-6 is detrimental to activity, whereas a large substituent in the ortho position on the aromatic ring enhances activity (QSAR 1.97). The bulky ortho substituents (mostly t-Bu) may destroy coplanarity with the amide bridge by perhaps twisting of the phenyl ring and enhancing its hydrophobic contact with the
binding site on the enzyme. Note that the larger intercept in QSAR 1.98 versus QSAR 1.97 suggests that hydrophobicity is more important in this area.

5.2 Interactions at the Cellular Level

QSAR analysis of studies at the cellular level allows us to get a handle on the physicochemical parameters critical to pharmacokinetic processes, mostly transport. Cell culture systems offer an ideal way to determine the optimum hydrophobicity of a system that is more complex than an isolated receptor. Extensive QSAR have been developed on the toxicity of 3-X-triazines to many mammalian and bacterial cell lines (202, 209). A comparison of the cytotoxicities of these analogs vs. sensitive murine leukemia cells (L1210/S) and methotrexate-resistant murine leukemia cells (L1210/R) reveals some startling differences.

5.2.1 Inhibition of Growth of L1210/S by 3-X-Triazines (209)

Log 1/IC50

π′_o = 1.45(±0.93), log β = −0.274

5.2.2 Inhibition of Growth of L1210/R by 3-X-Triazines (209)

Log 1/IC50

There is a radical difference between these two QSAR. QSAR 1.99 is very similar to the one (QSAR 1.84) obtained versus the L1210 DHFR and it can be posited that the cytotoxicity in the sensitive cell line results from the inhibition of the enzyme. The intercepts suggest that slight interference with folate metabolism significantly affects growth. A comparison of the sensitive and resistant QSAR reveals a substantial difference in the coefficients with π′. The lack of many variables in QSAR 1.100 and its overall simplicity suggest that inhibition of the enzyme is not the critical step, but rather transport to the site of action in these resistant cells may be of utmost importance. This particular cell line was resistant to methotrexate by virtue of elevated levels of DHFR and also overexpression of glycoprotein, GP-170 (209). Thus, modified transport through the dysfunctional membrane would severely curtail the partitioning process, resulting in a coefficient with π′ that is only one-half (0.42) of what is normally seen. The negative coefficient with the MR term indicates that size plays a role, albeit a negative one, in passage through the GP-170-fortified membrane and to the site of action.

The QSAR paradigm has been shown to be particularly useful in environmental toxicology, especially in acute toxicity determinations of xenobiotics (223). There has recently been an emphasis on "transparent, mechanistically comprehensive QSAR for toxicity," a move that is welcomed by many researchers in the field (224, 225). Cronin and Schultz developed QSAR 1.101 to describe the polar, narcotic toxicity of a large set of substituted phenols. A number of phenols with ionizable or reactive groups (e.g., -COOH, -NO2, -NO, -NH2, or -NHCOCH3) were omitted from the final analysis (226).

5.2.3 Inhibition of Growth of Tetrahymena pyriformis (40 h)

Log 1/C

Using Hammett σ constants, Garg et al. rederived QSAR 1.102 for the same set and QSAR 1.103 and 1.104 for the diverse set of multi-, di-, and monophenols, which were sequestered into two subsets containing electron-releasing and electron-attracting substituents, respectively (227).

5.2.4 Inhibition of Growth of T. pyriformis by Phenols (using σ) (227)

Log 1/C

5.2.5 Inhibition of Growth of T. pyriformis by Electron-Releasing Phenols (227)

Log 1/C = 0.66(±0.05)Clog P + 1.63(±0.15) (1.103)

5.2.6 Inhibition of Growth of T. pyriformis by Electron-Attracting Phenols (227)

Log 1/C = 0.63(±0.07)Clog P

There is excellent agreement between QSAR 1.101 and QSAR 1.104, in terms of the importance of hydrophobicity and electron demand of the substituents: the coefficients with Clog P are similar and there is a good correspondence between E_LUMO and σ. Nevertheless, separation of the phenols into subsets, based on their electronic attributes, indicates that different mechanisms of toxicity might be operative in this organism, a phenomenon that has been duplicated in mammalian cells (228). In a recent extension of toxicity studies on aromatics, Cronin and Schultz used a two-parameter or response-surface approach to define toxicity (229). In addition, indicator variables and group counts were included to broaden the applicability of the approach. An excellent comparison of the different modeling approaches (MLR, PLS, and Bayesian-regularized neural networks) in QSAR is also made (229).

5.2.7 Inhibition of Growth of T. pyriformis by Aromatic Compounds (229)

The indicator variables suggest that 2- and 4-amino-substituted phenols enhance toxicity, whereas strong acids decrease toxicity, respectively. The H-bond donor parameter may be correcting for the added potency of amino phenols. The low r² may be attributed to inherent variability in biological data and to the commingling of data from four different studies. The wide variety of compounds with different toxicity mechanisms, present in this combined study, would also be a contributing factor to the low r². Overall, this regression-based approach shows adequate predictability and is transparent, thus aiding in mechanistic interpretation.

5.3 Interactions In Vivo

The paucity of QSAR studies in whole animals is understandable in terms of the costs, the heterogeneity of the biological data, and the complexity of the results. Nevertheless, in the few studies that have been done, excellent QSAR have been obtained, despite the small number of subjects in the data set (164). One particular example is insightful. The renal and nonrenal clearance rates of a series of 11 β-blockers, including bufuralol, tolamolol, propranolol, alprenolol, oxprenolol, acebutolol, timolol, metoprolol, prindolol, atenolol, and nadolol were measured (230). The following QSAR were formulated using those data (164).

5.3.1 Renal Clearance of β-Adrenoreceptor Antagonists

Log k = −0.42(±0.12)Clog P + 2.35(±0.24) (1.106)
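
As a worked illustration of Equation 1.106, the following sketch evaluates the point estimate of log k over a few Clog P values. The Clog P inputs are arbitrary illustrative numbers, not the measured values for the β-blockers in the study, the function name is hypothetical, and the confidence intervals given in parentheses are ignored.

```python
def log_k_renal(clogp: float) -> float:
    """Point estimate of log k from Equation 1.106 (confidence intervals ignored)."""
    return -0.42 * clogp + 2.35


# Arbitrary illustrative Clog P values, from hydrophilic to hydrophobic.
for clogp in (0.0, 1.0, 2.0, 3.0):
    print(f"Clog P = {clogp:.1f} -> log k = {log_k_renal(clogp):.2f}")

# Factor by which the predicted k drops for each unit increase in Clog P.
fold_decrease_per_unit = 10 ** 0.42
```

Because the slope is negative, each unit increase in Clog P lowers the predicted renal clearance constant by a factor of 10^0.42, roughly 2.6-fold.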
6 Comparative QSAR
5.3.2 Nonrenal Clearance of @Adrenore- steric effects and there was no dependency
ceptor Antagonists on electronic terms. Careful analysis of the
initial data revealed that it had a limited
Log k = 1.94(?0.6l)Clog P range in hydrophobicity and steric at-
tributes. The lack of other QSAR to validate
the findings in QSAR 1.108 made it statisti-
cally significant, a t that time, but mechanis-
tically weak. Most weaknesses in QSAR for-
mulations usually violate the compound-to-
parameter ratio rule (232, 233).
ClogPo = 2.6 + 1.5 log P = -0.813
outlier: oxprenolol 6 COMPARATIVE QSAR
It is apparent from QSAR 1.106 and 1.107, 6.1 Database Development

that the hydrophobic requirements of the sub-
There are literally dozens of databases con-
strates vary considerably. As expected, renal taining information about chemical struc-
clearance is enhanced in the case of hydro- tures, synthetic methods, and reaction mech-
philic drugs, whereas nonrenal clearance anisms. The C-QSAR database is a database
shows a strong dependency on hydrophobic- for QSARmodels (164,234). It was designed to
ity. Note that QSAR 1.107 is stretching the organize QSAR data on physical (PHYS) or-
limits of the bilinear model with only 10 data ganic reactions as well as chemical-biological
points! The 95% confidence intervals are (BIO) interactions, in numerical terms, to
also large but, nevertheless, the equations bring cohesion and understanding to mecha-
.
serve to em~hasize the difference in clearance nisms of chemical-biodynamics. The two data-
mechanisms that are clearly linked to bases are organized on a similar format, with
hydrophobicity.

In formulating QSAR, it is useful to use a well-designed series to optimize a particular biological activity. It is also important to ensure that the ratio of compounds to parameters is at least 5, so that collinearity is minimized while spanned space is maximized. A normal distribution of biological data is necessary. A violation of these guidelines usually leads to statistically insignificant QSAR or models that defy predictability. One of our earliest works on the inhibition of E. coli DHFR by 2,4-diamino-5-X-benzylpyrimidines led to the derivation of the following equation (231):

$$\log 1/K_i = -1.13\,\sigma_R^{+} + 5.54 \qquad (1.108)$$

Most of the variance in these data was explained by the Hammett through-resonance constant ($\sigma_R^{+}$). It implied that electron-releasing substituents enhanced inhibitory potency. Later, expanded and extensive studies on this system revealed that inhibition of the bacterial enzyme was related to mostly

the emphasis on reaction types in the PHYS database. The entries in the BIO database are sequestered into six main groups: macromolecules, enzymes, organelles, single-cell organisms, organs/tissues, and multicellular organisms (e.g., insects). The combined databases or the separate PHYS or BIO databases can be searched independently by a string search or by a search using the SMILES notation. A SMILES search can be approached in three ways: one can identify every QSAR that contains a specific molecule, one can use a MERLIN search that locates all derivatives of a given structure, or one can search on single or multiple parameters. For a more thorough description of the C-QSAR database and ways to search it, see Hansch et al. (234) and Hansch et al. (164). The net result of searching the QSAR database is to "mine" for models; one could thus call it model-mining.

6.2 Database: Mining for Models

To enhance our understanding of ligand-receptor interactions and bring coherence to these relationships, there needs to be a concerted effort not only to develop high-quality regressions but also to create models that resonate with those drawn from mechanistic organic chemistry. A comprehensive, integrated database, C-QSAR, allows us to do so; it contains over 16,000 examples drawn from all facets of chemistry and biology. An example on the toxicity of X-phenols will illustrate the usefulness of this database (164, 228, 235-238). Recently, increasing numbers of QSAR for phenols have been based on Brown's $\sigma^{+}$ term, an electronic term that was first designed to rationalize electronic effects of substituents on electrophilic aromatic substitution. Studies conducted at EPA gave early indications that embryologic defects of rat embryos in vitro could be correlated by $\sigma^{+}$, as seen in QSAR 1.109 (239).

6.2.1 Incidence of Tail Defects of Embryos (235)

Soon, this parameter was shown to correlate radical reactions in chemistry as well as chemical-biological interactions in an extensive compilation (240). Another older study by Richard et al. on the inhibition of replicative DNA synthesis in Chinese hamster ovary cells was examined and led to the development of Equation 1.110 (241). Again, there was a dependency on $\sigma^{+}$.

6.2.2 Inhibition of DNA Synthesis in CHO Cells by X-Phenols (236)

$$\log 1/C = -0.74(\pm 0.34)\,\sigma^{+} - 1.02(\pm 0.41)\,\mathrm{CMR} \,\cdots \qquad (1.110)$$

These Brown $\rho^{+}$ values were in line with those obtained from chemical and biological systems (228); see Table 1.5.

Table 1.5 Rho Values for Chemical and Biochemical Reactions

Solvent   Radical Reagent   n   ρ+ (σ+)
Hydrogen Abstraction from Unhindered Phenols
1 CCl4
2 Benzene
3 CCl4
X-Phenols-Enzyme Systems
1 Horseradish peroxidase
2 Lactoperoxidase

Cytotoxicity studies of X-phenols versus L1210 cells in culture led to an unusual result, which was baffling but reminiscent of Hammett plots related to changes in mechanism (228).

6.2.3 Inhibition of Growth of L1210 by X-Phenols

$$\log 1/IC_{50} = -0.83(\pm 0.18)\,\sigma^{+} \,\cdots \qquad (1.111)$$

$\log P_0 = -0.18$; $\log \beta = -2.28$

outliers: 4-C2H5, 3-NH2

Sequestering of the data into two subsets with
varying electronic attributes ($\sigma^{+} > 0$ and $\sigma^{+} < 0$) led to the derivation of the following equations.

6.2.4 Inhibition of Growth of L1210 by Electron-Withdrawing Substituents ($\sigma^{+} > 0$)

$$\log 1/IC_{50} = 0.62(\pm 0.16)\,\log P \,\cdots \qquad (1.112)$$

outlier: 3-OH

6.2.5 Inhibition of Growth of L1210 by Electron-Donating Substituents ($\sigma^{+} < 0$)

outliers: 3-NH2, 4-NHAc

In QSAR 1.113, 62% of the variance is accounted for by $\sigma^{+}$ and 28% is explained by log P. It appears that free-radical-mediated toxicity is responsible for the growth-inhibitory effects of the phenols. Homolytic bond dissociation energies (BDE) related to the homolytic cleavage of the OH bond in the following reaction: (X-C6H4OH + C6H5O• → X-C6H4O• + C6H5OH) have been used in lieu of $\sigma^{+}$ values. The net result is similar, as seen in QSAR 1.114 (242).

$$\log 1/IC_{50} = -0.21(\pm 0.03)\,\mathrm{BDE} \,\cdots \qquad (1.114)$$

outliers: 4-NHAc, 3-NH2, 3-NMe2

This data set contains a wide diversity of phenolic inhibitors, including a large number of ortho-substituted compounds, estrogenic phenols (β-estradiol, DES, nonyl phenol), and other antioxidants whose activities are well predicted by this model. The model suggests that cytotoxicity is an outcome of phenoxy radical formation and subsequent interaction with a relatively nonpolar receptor. The small hydrophobic coefficient suggests that DNA could be a likely target.

The appearance of the $\sigma^{+}$ parameter in a large number of reactions and interactions involving X-phenols indicates that the phenoxy radical can be a potent, reactive intermediate in myriad reactions. The availability of a fast, easily retrievable computerized database to corroborate this phenomenon was useful. This approach of lateral validation was crucial in establishing a QSAR model that was not only statistically significant but also mechanistically interpretable.

6.3 Progress in QSAR

The last four decades have seen major changes in the QSAR paradigm. In tandem with developments in molecular modeling and X-ray crystallography, it has impacted drug design and development in many ways. It has also spawned 3D QSAR approaches that are routinely used in computer-assisted molecular design. In terms of ligand design, it shares center stage with other approaches such as structure-based ligand design and other rational drug design approaches including docking methods and genetic algorithms (243). Success stories in QSAR have been recently reviewed (244, 245). Bioactive compounds have emerged in agrochemistry, pesticide chemistry, and medicinal chemistry.

Bifenthrin, a pesticide, was the product of a design strategy that used cluster analysis (244) (Fig. 1.7). Guided by QSAR analysis, the chemists at Kyorin Pharmaceutical Company designed and developed Norfloxacin, a 6-fluoro quinolone, which heralded the arrival of a new class of antibacterial agents (246) (Fig. 1.7). Two azole-containing fungicides, metconazole (Fig. 1.8) and ipconazole, were launched in 1994 in France and Japan, respectively (247). Lomerizine, a 4-F-benzhydryl-4-(2,3,4-trimethoxybenzyl)piperazine, was introduced into the market in 1999 after extensive design strategies using QSAR (248) (Fig. 1.8). Flobufen, an anti-inflammatory agent, was designed by Kuchar et al. as a long-acting agent without the usual gastric toxicity (249) (Fig. 1.8). It is currently in clinical trial.

Figure 1.7. Bifenthrin and Norfloxacin.
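As an aside on the phenol analysis of Section 6.2, the step of sequestering a data set by the sign of σ+ and regressing each subset separately can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the descriptor values, the coefficients used to generate the activities, and the helper name `fit_qsar` are all hypothetical and are not taken from the studies cited above. The helper also enforces the rule of thumb of at least five compounds per fitted parameter.

```python
import numpy as np


def fit_qsar(descriptors, activity, min_ratio=5):
    """Ordinary least-squares fit: activity = descriptors @ coef + intercept.

    Raises ValueError when there are fewer than min_ratio compounds per
    descriptor. Returns (coef, intercept, r_squared).
    """
    X = np.asarray(descriptors, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # a single descriptor becomes one column
    y = np.asarray(activity, dtype=float)
    n, k = X.shape
    if n < min_ratio * k:
        raise ValueError(f"only {n} compounds for {k} parameters")
    A = np.column_stack([X, np.ones(n)])  # last column carries the intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta[:-1], float(beta[-1]), 1.0 - ss_res / ss_tot


# Synthetic, made-up data: 12 electron-donating and 10 electron-withdrawing
# substituents, with illustrative generating coefficients.
rng = np.random.default_rng(0)
sigma_plus = np.concatenate([np.linspace(-1.5, -0.1, 12),
                             np.linspace(0.1, 0.8, 10)])
log_p = rng.uniform(0.0, 4.0, size=sigma_plus.size)
noise = rng.normal(0.0, 0.01, size=sigma_plus.size)
donating = sigma_plus < 0
activity = np.where(donating,
                    -1.3 * sigma_plus + 0.2 * log_p + 3.9,  # sigma+ dominated
                    0.6 * log_p + 2.4) + noise              # log P only

# Fit each subset with the descriptors assumed to matter for it.
coef_d, icpt_d, r2_d = fit_qsar(
    np.column_stack([sigma_plus[donating], log_p[donating]]),
    activity[donating])
coef_w, icpt_w, r2_w = fit_qsar(log_p[~donating], activity[~donating])

print("donating:    coef =", coef_d, "intercept =", icpt_d, "r2 =", r2_d)
print("withdrawing: coef =", coef_w, "intercept =", icpt_w, "r2 =", r2_w)
```

Fitting the pooled data with a single linear equation would blur the two regimes; splitting first lets each subset show its own dependence, which is the point of the subset equations above.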

Other examples of the commercial utility of QSAR include the development of metamitron and bromobutide (250). In most of these examples, QSAR was used in combination with other rational drug-design strategies, which is a useful and generally fruitful approach.

Figure 1.8. Lomerizine, Metconazole, and Flobufen.

In addition to these commercial successes, the QSAR paradigm has steadily evolved into a science. It is empirical in nature and it seeks to bring coherence and rigor to the QSAR models that are developed. By comparing models one is able to more fully comprehend scientific phenomena with a "global" perspective; trends in patterns of reactivity or biological activity become self-evident.

7 SUMMARY

QSAR has done much to enhance our understanding of fundamental processes and phenomena in medicinal chemistry and drug design (251). The concept of hydrophobicity and its calculation has generated much knowledge and discussion as well as spawned a mini-industry. QSAR has refined our thinking on selectivity at the molecular and cellular level. Hydrophobic requirements vary considerably between tumor-sensitive cells and resistant ones. It has allowed us to design more selectivity into antibacterial agents that bind to dihydrofolate reductase. QSAR studies in the pharmacokinetic arena have established different hydrophobic requirements for renal/nonrenal clearance, whereas the optimum hydrophobicity for CNS penetration has been determined by Hansch et al. (252). QSAR has helped delineate allosteric effects in enzymes such as cyclooxygenase and trypsin, and in the well-defined and complex hemoglobin system (253, 254).

QSAR has matured over the last few decades in terms of the descriptors, models, methods of analysis, and choice of substituents and compounds. Embarking on a QSAR project may be a daunting and confusing task to a novice. However, there are many excellent reviews and tomes (1, 4, 19, 58-60) on this subject that can aid in the elucidation of the paradigm. Dealing with biological systems is not a simple problem and in attempting to develop a QSAR, one must always be cognizant of the biochemistry of the system analyzed and the limitations of the approach used.

REFERENCES

1. C. Hansch and A. Leo, Substituent Constants for Correlation Analysis in Chemistry and Biology, John Wiley & Sons, New York, 1979.
2. D. J. Livingstone, J. Chem. Inf. Comput. Sci., 40, 195 (2000).
3. C. Hansch, A. Kurup, R. Garg, and H. Gao, Chem. Rev., 101, 619 (2001).
4. H. Kubinyi in M. Wolff, Ed., Burger's Medicinal Chemistry and Drug Discovery, Volume 1: Principles and Practice, John Wiley & Sons, New York, 1995, p. 497.
5. A. Crum-Brown and T. R. Fraser, Trans. R. Soc. Edinburgh, 25, 151 (1868).
6. C. Richet, C. R. Seances Soc. Biol. Ses Fil., 9, 775 (1893).
7. H. Meyer, Arch. Exp. Pathol. Pharmakol., 42, 109 (1899).
8. E. Overton, Studien Uber die Narkose, Fischer, Jena, Germany, 1901.
9. J. Ferguson, Proc. R. Soc. London Ser. B, 127, 387 (1939).
10. A. Albert, S. Rubbo, R. Goldacre, M. Darcy, and J. Stove, Br. J. Exp. Pathol., 26, 160 (1945).
11. A. Albert, Selective Toxicity: The Physicochemical Bases of Therapy, 7th ed., Chapman and Hall, London, 1985, p. 33.
12. P. H. Bell and R. O. Roblin, Jr., J. Am. Chem. Soc., 64, 2905 (1942).
13. L. P. Hammett, Chem. Rev., 17, 125 (1935).
14. L. P. Hammett, Physical Organic Chemistry, 2nd ed., McGraw-Hill, New York, 1970.
15. R. W. Taft, J. Am. Chem. Soc., 74, 3120 (1952).
16. C. Hansch, P. P. Maloney, T. Fujita, and R. M. Muir, Nature, 194, 178 (1962).
17. R. Nelson Smith, C. Hansch, and M. M. Ames, J. Pharm. Sci., 64, 599 (1975).
18. T. Fujita, J. Iwasa, and C. Hansch, J. Am. Chem. Soc., 86, 5175 (1964).
19. C. Hansch and A. Leo in S. R. Heller, Ed., Exploring QSAR. Fundamentals and Applications in Chemistry and Biology, American Chemical Society, Washington, DC, 1995.
20. C. Hansch, Acc. Chem. Res., 2, 232 (1969).
21. H. Kubinyi, Arzneim.-Forsch., 26, 1991 (1976).
22. S. M. Free and J. W. Wilson, J. Med. Chem., 7, 395 (1964).
23. T. Fujita and T. Ban, J. Med. Chem., 14, 148 (1971).
24. G. Klopman, J. Am. Chem. Soc., 106, 7315 (1984).
25. B. W. Blake, K. Enslein, V. K. Gombar, and H. H. Borgstedt, Mutat. Res., 241, 261 (1990).
26. Z. Simon, Angew. Chem. Int. Ed. Engl., 13, 719 (1974).
27. L. H. Hall and L. B. Kier, J. Pharm. Sci., 66, 642 (1977).
28. L. B. Kier and L. H. Hall, Molecular Structure Description. The Electrotopological State, Academic Press, San Diego, CA, 1999.
29. W. Tong, D. R. Lowis, R. Perkins, Y. Chen, W. J. Welsh, D. W. Goddette, T. W. Heritage, and D. M. Sheehan, J. Chem. Inf. Comput. Sci., 38, 669 (1998).
30. S. J. Cho, W. Zheng, and A. Tropsha, Pac. Symp. Biocomput., 305 (1998).
31. H. Gao and J. Bajorath, J. Mol. Diversity, 4, 115 (1999).
32. H. Gao, C. Williams, P. Labute, and J. Bajorath, J. Chem. Inf. Comput. Sci., 39, 164 (1999).
33. W. J. Dunn III, S. Wold, U. Edlund, S. Hellberg, and J. Gasteiger, Quant. Struct.-Act. Relat., 3, 131 (1984).
34. J. Langley, J. Physiol., 1, 367 (1878).
35. P. Ehrlich, Klin. Jahr., 6, 299 (1897).
36. J. N. Langley, J. Physiol., 33, 374 (1905).
37. M. Famulok, Curr. Opin. Struct. Biol., 9, 324 (1999).
38. K. Y. Wang, S. Swaminathan, and P. H. Bolton, Biochemistry, 33, 7617 (1994).
39. J. W. Lown in S. Neidle and M.-J. Waring, Eds., Molecular Aspects of Anticancer Drug-DNA Interactions, Macmillan, Basingstoke, UK, 1993, p. 322.
40. L. Morgenstern, M. Recanatini, T. E. Klein, W. Steinmetz, C. Z. Yang, R. Langridge, and C. Hansch, J. Biol. Chem., 262, 10767 (1987).
41. R. N. Smith, C. Hansch, K. H. Kim, B. Omiya, G. Fukumura, C. D. Selassie, P. Y. C. Jow, J. M. Blaney, and R. Langridge, Arch. Biochem. Biophys., 215, 319 (1982).
42. C. Hansch, T. Klein, J. McClarin, R. Langridge, and N. W. Cornell, J. Med. Chem., 29, 615 (1986).
43. C. D. Selassie, Z. X. Fang, R. Li, C. Hansch, T. Klein, R. Langridge, and B. T. Kaufman, J. Med. Chem., 29, 621 (1986).
44. J. M. Blaney and C. Hansch in C. A. Ramsden, Ed., Comprehensive Medicinal Chemistry. The Rational Design, Mechanistic Study and Therapeutic Application of Chemical Compounds, Vol. 4, Quantitative Drug Design, Pergamon, Elmsford, NY, 1990, p. 459.
45. G. C. K. Roberts, Pharmacochem. Libr., 6, 91 (1983).
46. A. A. Kumar, J. H. Mangum, D. T. Blankenship, and J. H. Freisheim, J. Biol. Chem., 266, 8970 (1981).
47. G. D. Rose and R. Wolfenden, Annu. Rev. Biophys. Biomol. Struct., 22, 381 (1993).
48. A. T. Hagler, P. Dauber, and S. Lifson, J. Am. Chem. Soc., 101, 5131 (1979).
49. W. Kauzmann, Adv. Protein Chem., 14, 1 (1959).
50. A. Ben-Naim, Pure Appl. Chem., 69, 2239 (1997).
51. W. Blokzijl and J. B. F. N. Engberts, Angew. Chem. Int. Ed. Engl., 32, 1545 (1993).
52. N. Muller, Acc. Chem. Res., 23, 23 (1990).
53. F. Eisenhaber, Perspect. Drug Discov. Des., 17, 27 (1999).
54. A. R. Fersht, J. S. Shindler, and W. C. Tsui, Biochemistry, 19, 5520 (1980).
55. P. R. Andrews, D. J. Craik, and J. L. Martin, J. Med. Chem., 27, 1648 (1984).
56. N. R. Draper and H. Smith, Applied Regression Analysis, 2nd ed., John Wiley & Sons, New York, 1981.
57. Y. Martin in G. Grunewald, Ed., Quantitative Drug Design, Marcel Dekker, New York, 1978, p. 167.
58. H. Kubinyi in R. Mannhold, P. Krogsgaard-Larsen, and H. Timmerman, Eds., QSAR: Hansch Analysis and Related Approaches, VCH, New York, 1993, p. 91.
59. R. Franke in W. Th. Nauta and R. F. Rekker, Eds., Theoretical Drug Design Methods, Elsevier Science, Amsterdam/New York, 1983, p. 395.
60. C. Hansch in C. J. Cavallito, Ed., Structure Activity Relationships, Vol. 1, Pergamon, Oxford, UK, 1973, p. 75.
61. J. K. Seydel, Int. J. Quantum Chem., 20, 131 (1981).
62. J. G. Topliss and R. P. Edwards, J. Med. Chem., 22, 1238 (1979).
63. P. N. Craig, J. Med. Chem., 14, 680 (1971).
64. J. G. Topliss, J. Med. Chem., 15, 1006 (1972).
65. J. G. Topliss, J. Med. Chem., 20, 463 (1977).
66. T. M. Bustard, J. Med. Chem., 17, 777 (1974).
67. F. Darvas, J. Med. Chem., 17, 799 (1974).
68. P. S. Magee in J. Miyamoto and P. C. Kearney, Eds., Pesticide Chemistry: Human Welfare and Environment, Proceedings of the International Congress on Pesticide Chemistry, Vol. 1, Pergamon, Oxford, UK, 1983, p. 251.
69. T. J. Mitchell, Technometrics, 16, 203 (1974).
70. T. Moon, M. H. Chi, D. H. Kim, C. N. Yoon, and Y. S. Choi, Quant. Struct.-Act. Relat., 19, 257 (2000).
71. M. Baroni, S. Clementi, G. Cruciani, N. Kettaneh-Wold, and S. Wold, Quant. Struct.-Act. Relat., 12, 225 (1993).
72. M. Sjostrom and L. Eriksson in H. van de Waterbeemd, Ed., Chemometric Methods in Molecular Design, VCH, Weinheim, Germany, 1995, p. 63.
73. L. Eriksson, E. Johansson, M. Muller, and S. Wold, Quant. Struct.-Act. Relat., 16, 383 (1997).
74. L. Eriksson, E. Johansson, M. Muller, and S. Wold, J. Chemom., 14, 599 (2000).
75. C. Hansch and T. Fujita, J. Am. Chem. Soc., 86, 1616 (1964).
76. C-QSAR Database, BioByte Corp., Claremont, CA.
77. G. N. Burckhardt, W. G. K. Ford, and E. Singleton, J. Chem. Soc., 17 (1936).
78. L. P. Hammett, J. Chem. Ed., 43, 464 (1966).
79. M. Charton, Prog. Phys. Org. Chem., 8, 235 (1971).
80. T. Fujita and T. Nishioka, Prog. Phys. Org. Chem., 12, 49 (1976).
81. P. D. Bolton, K. A. Fleming, and F. M. Hall, J. Am. Chem. Soc., 94, 1033 (1972).
82. K. Kalfus, J. Kroupa, M. Vecera, and O. Exner, Collect. Czech. Chem. Commun., 40, 3009 (1975).
83. M. Bergon and J. P. Calmon, Tetrahedron Lett., 22, 937 (1981).
84. J. Schreck, J. Chem. Ed., 48, 103 (1971).
85. H. C. Brown and Y. Okamoto, J. Am. Chem. Soc., 80, 4979 (1958).
86. Y. Tsuno, T. Ibata, and Y. Yukawa, Bull. Chem. Soc. Jpn., 32, 960, 965, 971 (1959).
87. J. D. Roberts and W. T. Moreland, J. Am. Chem. Soc., 75, 2167 (1953).
88. K. Bowden in C. A. Ramsden, Ed., Comprehensive Medicinal Chemistry. The Rational Design, Mechanistic Study and Therapeutic Application of Chemical Compounds, Vol. 4: Quantitative Drug Design, Pergamon, Elmsford, NY, 1990, p. 212.
89. A. Albert, Selective Toxicity: The Physicochemical Bases of Therapy, 7th ed., Chapman and Hall, London, 1985, p. 379.
90. M. Karelson, V. S. Lobanov, and A. R. Katritzky, Chem. Rev., 96, 1027 (1996).
91. P. S. Magee in ACS Symposium Series 37, American Chemical Society, Washington, DC, 1980.
92. S. P. Gupta, Chem. Rev., 91, 1109 (1991).
93. J. J. Sullivan, A. D. Jones, and K. K. Tangi, J. Chem. Inf. Comput. Sci., 40, 1113 (2000).
94. M. Cocchi, M. C. Menziani, F. Fanelli, P. G. Debenedetti, J. Mol. Struct., 331, 79 (1995).
95. M. Cocchi, M. Menziani, P. G. Debenedetti, A. Cruciani, Chemom. Intell. Lab. Sys., 14, 209 (1992).
96. J. H. Hildebrand, Proc. Natl. Acad. Sci. USA, 76, 194 (1979).
97. G. D. Rose, A. R. Geselowitz, G. J. Lesser, R. H. Lee, and M. H. Zehfus, Science, 229, 834 (1985).
98. H. J. Schneider, Angew. Chem. Int. Ed. Engl., 30, 1417 (1991).
99. J. N. Israelachvili and H. Wennerstrom, J. Phys. Chem., 96, 520 (1992).
100. J. J. H. Nusselder and J. B. F. N. Engberts, Langmuir, 7, 2089 (1991).
101. P. J. Taylor in C. A. Ramsden, Ed., Comprehensive Medicinal Chemistry. The Rational Design, Mechanistic Study and Therapeutic Application of Chemical Compounds, Vol. 4, Quantitative Drug Design, Pergamon, Elmsford, NY, 1990, p. 241.
102. J. H. Hildebrand, J. Phys. Chem., 72, 1841 (1969).
103. H. S. Frank and M. W. Evans, J. Chem. Phys., 13, 507 (1945).
104. G. Nemethy and H. A. Scheraga, J. Chem. Phys., 36, 3382 (1962).
105. A. D. J. Haymet, K. A. T. Silverstein, and K. A. Dill, Faraday Discuss., 103, 117 (1996).
106. K. A. T. Silverstein, K. A. Dill, and A. D. J. Haymet, J. Chem. Phys., 114, 6303 (2001).
107. A. J. Leo and C. Hansch, Perspect. Drug Discov. Des., 17, 1 (1999).
108. R. N. Smith, C. Hansch, and M. A. Ames, J. Pharm. Sci., 64, 599 (1975).
109. A. Leo and C. Hansch, J. Org. Chem., 36, 1539 (1971).
110. B. C. Lippold and M. S. Adel, Arch. Pharm., 305, 417 (1972).
111. S. E. Debolt and P. A. Kollman, J. Am. Chem. Soc., 117, 5316 (1995).
112. A. Leo, J. Pharm. Sci., 76, 166 (1987).
113. A. Leo, Methods Enzymol., 202, 544 (1991).
114. J. de Bruijn and J. Hermens, Quant. Struct.-Act. Relat., 9, 11 (1990).
115. E. Tomlinson, S. S. David, G. D. Parr, M. James, N. Farraj, J. F. M. Kinkel, D. Gaisser, and H. J. Wynn in W. J. Dunn III, J. H. Block, and R. S. Pearlman, Eds., Partition Coefficient, Determination and Estimation, Pergamon, Oxford, UK, 1986, p. 83.
116. R. Collander, Acta Chem. Scand., 5, 774 (1951).
117. A. Leo, C. Hansch, and D. Elkins, Chem. Rev., 71, 525 (1971).
118. R. F. Rekker, The Hydrophobic Fragmental Constant. Its Derivation and Application: A Means of Characterizing Membrane Systems, Elsevier, Amsterdam, 1977, p. 131.
119. P. Seiler, Eur. J. Med. Chem., 9, 473 (1974).
120. T. Fujita, T. Nishioka, and M. Nakajima, J. Med. Chem., 20, 1071 (1977).
121. D. E. Leahy, P. J. Taylor, and A. R. Wait, Quant. Struct.-Act. Relat., 8, 17 (1989).
122. J. C. Dearden, A. M. Patel, and J. M. Thubby, J. Pharm. Pharmacol., 26 (Suppl.), 75P (1974).
123. W. Draber, K. H. Buchel, and K. Dickore, Proc. Int. Congr. Pest. Chem., 2nd ed., 1971, 5, 153 (1972).
124. P. Vallat, N. El Tayar, B. Testa, I. Slacanin, A. Martson, and K. Hostettmann, J. Chromatogr., 504, 411 (1990).
125. A. Berthod, Y. I. Han, and D. W. Armstrong, J. Liq. Chromatogr., 11, 1441 (1988).
126. A. Berthod, S. Carda-Broch, and M. C. Garcia-Alvarez-Coque, Anal. Chem., 71, 879 (1999).
127. K. Valko, C. Bevan, and D. Reynolds, Anal. Chem., 69, 2022 (1997).
128. K. Valko, C. M. Du, C. Bevan, D. P. Reynolds, and M. H. Abraham, Curr. Med. Chem., 8, 1137 (2001).
129. F. Lombardo, M. Y. Shalaeva, K. A. Tupper, F. Gao, and M. H. Abraham, J. Med. Chem., 43, 2922 (2000).
130. J. L. Fauchère and V. Pliska, Eur. J. Med. Chem., 18, 369 (1983).
131. J. L. Fauchère in B. Testa, Ed., Advances in Drug Research, Vol. 15, Academic Press, London/New York, 1986, p. 29.
132. M. Akamatsu, Y. Yoshida, H. Nakamura, M. Asao, H. Iwamura, and T. Fujita, Quant. Struct.-Act. Relat., 8, 195 (1989).
133. C. Hansch, A. Leo, and D. Hoekman in S. R. Heller, Ed., Exploring QSAR: Hydrophobic, Electronic and Steric Constants, Vol. 2, American Chemical Society Professional Reference Book, Washington, DC, 1995.
134. G. G. Nys and R. F. Rekker, Chim. Ther., 8, 521 (1973).
135. A. Leo, P. Y. C. Jow, C. Silipo, and C. Hansch, J. Med. Chem., 14, 865 (1979).
136. D. Weininger, J. Chem. Inf. Comput. Sci., 28, 31 (1988).
137. D. Weininger, A. Weininger, and J. L. Weininger, J. Chem. Inf. Comput. Sci., 29, 97 (1989).
138. A. Leo in C. A. Ramsden, Ed., Comprehensive Medicinal Chemistry. The Rational Design, Mechanistic Study and Therapeutic Application of Chemical Compounds, Vol. 4, Quantitative Drug Design, Pergamon, Elmsford, NY, 1990, p. 315.
139. A. Leo, personal communication.
140. A. Leo, Chem. Rev., 93, 1281 (1993).
141. A. J. Leo and D. Hoekman, Perspect. Drug Discov. Des., 18, 19 (2000).
142. H. van de Waterbeemd and R. Mannhold, Quant. Struct.-Act. Relat., 15, 410 (1996).
143. R. Mannhold and H. van de Waterbeemd, J. Comput.-Aided Mol. Des., 15, 337 (2001).
144. R. F. Rekker and H. M. DeKort, Eur. J. Med. Chem., 14, 479 (1979).
145. G. Klopman, J. W. Li, S. Wang, and M. Dimayuga, J. Chem. Inf. Comput. Sci., 34, 752 (1994).
146. A. K. Ghose and G. M. Crippen, J. Med. Chem., 28, 333 (1985).
147. T. Suzuki and Y. Kudo, J. Comput.-Aided Mol. Des., 4, 155 (1990).
148. I. Moriguchi, S. Hirono, Q. Liu, I. Nakagome, and Y. Matsushita, Chem. Pharm. Bull., 40, 127 (1992).
149. G. E. Kellogg, G. J. Joshi, and D. J. Abraham, J. Med. Chem. Res., 1, 444 (1992).
150. J. Devillers, D. Domine, C. Guillon, and W. J. Karcher, J. Pharm. Sci., 87, 1086 (1998).
151. M. J. Kamlet, P. W. Carr, R. W. Taft, and M. H. Abraham, J. Am. Chem. Soc., 103, 6062 (1981).
152. M. J. Kamlet, J. L. Abboud, M. Abraham, and R. Taft, J. Org. Chem., 48, 2877 (1983).
153. J. A. Platts, D. Butina, M. H. Abraham, and A. Hersey, J. Chem. Inf. Comput. Sci., 39, 835 (1999).
154. Y. Ishihama and N. Asakawa, J. Pharm. Sci., 88, 1305 (1999).
155. J. A. Platts, M. H. Abraham, D. Butina, and A. Hersey, J. Chem. Inf. Comput. Sci., 40, 71 (2000).
156. A. J. Leo, J. Pharm. Sci., 89, 1567 (2000).
157. R. W. Taft in M. S. Newman, Ed., Steric Effects in Organic Chemistry, John Wiley & Sons, New York, 1956, p. 556.
158. K. Hancock, E. A. Meyers, and B. J. Yager, J. Am. Chem. Soc., 83, 4211 (1961).
159. M. Charton in M. Charton and I. Motoc, Eds., Steric Effects in Drug Design, Springer, Berlin, 1983, p. 57.
160. M. S. Tute in C. A. Ramsden, Ed., Comprehensive Medicinal Chemistry. The Rational Design, Mechanistic Study and Therapeutic Application of Chemical Compounds, Vol. 4, Quantitative Drug Design, Pergamon, Elmsford, NY, 1990, p. 18.
161. C. Hansch and T. Klein, Acc. Chem. Res., 19, 392 (1986).
162. A. Verloop, W. Hoogenstraaten, and J. Tipker in E. J. Ariens, Ed., Drug Design, Vol. VII, Academic Press, New York/London, 1976, p. 165.
163. A. Verloop, The STERIMOL Approach to Drug Design, Marcel Dekker, New York, 1987.
164. C. Hansch, D. Hoekman, A. Leo, D. Weininger, and C. D. Selassie, unpublished results.
165. V. A. Levin, J. Med. Chem., 23, 682 (1980).
166. E. J. Lien and P. H. Wang, J. Pharm. Sci., 69, 648 (1980).
167. C. D. Selassie, C. Hansch, and T. Khwaja, J. Med. Chem., 33, 1914 (1990).
168. E. J. Lien, L. L. Lien, and H. Gao in F. Sanz, J. Giraldo, and F. Manaut, Eds., QSAR and Molecular Modelling: Concepts, Computational Tools and Biological Applications, Prous Science, Barcelona/Philadelphia, 1995, p. 94.
169. C. Selassie, unpublished results.
170. M. Recanatini, T. Klein, C. Z. Yang, J. McClarin, R. Langridge, and C. Hansch, Mol. Pharmacol., 29, 436 (1986).
171. Y. Naito, M. Sugiura, Y. Yamamura, C. Fukaya, K. Yokoyama, Y. Nakagawa, T. Ikeda, M. Senda, and T. Fujita, Chem. Pharm. Bull., 39, 1736 (1991).
172. A. K. Debnath, R. L. L. de Compadre, G. Debnath, A. J. Shusterman, and C. Hansch, J. Med. Chem., 34, 786 (1991).
173. M. Randic, J. Am. Chem. Soc., 97, 6609 (1975).
174. L. B. Kier and L. H. Hall, Molecular Connectivity in Chemistry and Drug Research, Academic Press, New York/London, 1976.
175. L. B. Kier and M. H. Hall, J. Pharm. Sci., 72, 1170 (1983).
176. L. H. Hall and L. B. Kier, J. Pharm. Sci., 64, 1978 (1975).
177. J. Gough and L. H. Hall, J. Chem. Inf. Comput. Sci., 39, 356 (1999).
178. J. K. Boulamwini, K. Raghavan, M. Fresen, Y. Pommier, K. Kohn, and J. Weinstein, Pharm. Res., 13, 1892 (1995).
179. V. E. F. Heinzen, V. Cechinel, and R. A. Yunes, Farmaco, 54, 125 (1999).
180. R. L. Lopez de Compadre, C. M. Compadre, R. Castillo, and W. J. Dunn III, Eur. J. Med. Chem., 18, 569 (1983).
181. H. Kubinyi, Quant. Struct.-Act. Relat., 14, 149 (1995).
182. P. A. J. Janssen and N. B. Eddy, J. Med. Pharm. Chem., 2, 31 (1960).
183. T. Fujita in C. A. Ramsden, Ed., Comprehensive Medicinal Chemistry. The Rational Design, Mechanistic Study and Therapeutic Application of Chemical Compounds, Vol. 4, Quantitative Drug Design, Pergamon, Elmsford, NY, 1990, p. 503.
184. C. Hansch, D. Kim, A. J. Leo, E. Novellino, C. Silipo, and A. Vittoria, CRC Crit. Rev. Toxicol., 19, 185 (1989).
185. C. Hansch and W. J. Dunn III, J. Pharm. Sci., 61, 1 (1972).
186. T. W. Schultz and M. Tichy, Bull. Environ. Contam. Toxicol., 51, 681 (1993).
187. J. T. Penniston, L. Beckett, D. L. Bentley, and C. Hansch, Mol. Pharmacol., 5, 333 (1969).
188. C. Hansch, Adv. Chem. Ser., 114, 20 (1972).
189. C. Hansch and J. M. Clayton, J. Pharm. Sci., 62, 1 (1973).
190. R. Franke and W. Schmidt, Acta Biol. Med. Germ., 31, 273 (1973).
191. J. McFarland, J. Med. Chem., 13, 1192 (1970).
192. H. Kubinyi and O. H. Kehrhahn, Arzneim.-Forsch., 28, 598 (1978).
193. H. Kubinyi, Arzneim.-Forsch., 29, 1067 (1979).
194. R. Franke in W. Th. Nauta and R. F. Rekker, Eds., Theoretical Drug Design Methods, Elsevier, New York, 1984, p. 256.
195. H. Kubinyi in C. A. Ramsden, Ed., Comprehensive Medicinal Chemistry. The Rational Design, Mechanistic Study and Therapeutic Application of Chemical Compounds, Vol. 4, Quantitative Drug Design, Pergamon, Elmsford, NY, 1990, p. 539.
196. C. John Blankley in J. G. Topliss, Ed., Quantitative Structure Activity Relationships of Drugs, Academic Press, New York, 1983, p. 5.
197. E. Yalcin, S. E. Sener, I. Oren, and O. Temiz in F. Sanz, J. Giraldo, and F. Manaut, Eds., QSAR and Molecular Modelling: Concepts, Computational Tools and Biological Applications, Prous Science, Barcelona/Philadelphia, 1995, p. 147.
198. J. Kunes, J. Jachym, P. Tirasko, Z. Odlerova, and K. Waisser, Collect. Czech. Chem. Commun., 62, 1503 (1997).
199. Y. Terada and K. Naya, Pharmazie, 55, 133 (2000).
200. S. P. Gupta and A. Paleti, Bioorg. Med. Chem., 6, 2213 (1998).
201. C. Tmej, P. Chiba, M. Huber, E. Richter, M. Hitzler, K. J. Schaper, and G. Ecker, Arch. Pharm., 331, 233 (1998).
202. C. Selassie and T. E. Klein in J. Devillers, Ed., Comparative QSAR, Taylor & Francis, Washington, DC, 1998, p. 235.
203. C. D. Selassie, W. X. Gan, M. Fung, and R. Shortle in F. Sanz, J. Giraldo, and F. Manaut, Eds., QSAR and Molecular Modelling: Concepts, Computational Tools and Biological Applications, Prous Science, Barcelona/Philadelphia, 1995, p. 128.
204. A. Kurup, R. Garg, and C. Hansch, Chem. Rev., 100, 909 (2000).
205. J. M. Blaney, C. Hansch, C. Silipo, and A. Vittorio, Chem. Rev., 84, 333 (2000).
206. C. Hansch, Ann. N. Y. Acad. Sci., 186, 235 (1971).
207. C. Hansch, B. A. Hathaway, Z. R. Guo, C. D. Selassie, S. W. Dietrich, J. M. Blaney, R. Langridge, K. W. Volz, and B. T. Kaufman, J. Med. Chem., 27, 129 (1984).
208. B. A. Hathaway, Z. R. Guo, C. Hansch, T. J. Delcamp, S. S. Susten, and J. H. Freisheim, J. Med. Chem., 27, 144 (1984).
209. C. D. Selassie, C. D. Strong, C. Hansch, T. Delcamp, J. H. Freisheim, and T. A. Khwaja, Cancer Res., 46, 744 (1986).
210. C. K. Marlowe, C. D. Selassie, and D. V. Santi, J. Med. Chem., 38, 967 (1995).
211. R. G. Booth, C. D. Selassie, C. Hansch, and D. V. Santi, J. Med. Chem., 30, 1218 (1987).
212. L. C. Chio and S. F. Queener, Antimicrob. Agents Chemother., 37, 1916 (1993).
213. J. Y. Fukunaga, C. Hansch, and E. E. Stellar, J. Med. Chem., 19, 605 (1976).
214. B. K. Chen, C. Horvath, and J. R. Bertino, J. Med. Chem., 22, 483 (1979).
215. N. V. Harris, C. Smith, and K. Bowden, Eur. J. Med. Chem., 27, 7 (1992).
216. A. M. Klibanov, Nature, 409, 241 (2001).
217. K. H. Kim, J. Comput.-Aided Mol. Des., 15, 367 (2001).
218. K. H. Kim, Bioorg. Med. Chem., 9, 1951 (2001).
219. K. Nakamura, K. Hayashi, I. Ueda, and H. Fujiwara, Chem. Pharm. Bull., 43, 369 (1995).
220. C. M. Compadre, R. J. Sanchez, C. Bhuraneswarm, R. L. Compadre, D. Plunkett, and S. G. Novick in C. G. Wermuth, Ed., Trends in QSAR and Molecular Modelling, Escom, Strasbourg, France, 1993, p. 112.
221. S. V. Frye, C. D. Haffner, P. R. Maloney, R. A. Mook, Jr., G. F. Dorsey, R. N. Hiner, C. M. Cribbs, T. N. Wheeler, J. A. Ray, R. C. Andrews, K. W. Batchelor, H. N. Branson, J. D. Stuart, S. L. Schwiker, J. Van Arnold, S. Croom, D. M. Bickett, M. L. Moss, G. Tian, R. J. Unwalla, F. W. Lee, T. K. Tippin, M. K. James, M. K. Grizzle, J. E. Long, and S. V. Schuster, J. Med. Chem., 37, 2352 (1994).
222. S. V. Frye, C. D. Haffner, P. R. Maloney, R. N. Hiner, G. F. Dorsey, R. A. Roe, R. J. Unwalla, K. W. Batchelor, H. N. Branson, J. D. Stuart, S. L. Schwiker, J. Van Arnold, D. M. Bickett, M. L. Moss, G. Tian, F. W. Lee, T. K. Tippin, M. K. James, M. K. Grizzle, J. E. Long, and D. K. Croom, J. Med. Chem., 38, 2621 (1995).
223. M. T. D. Cronin and J. C. Dearden, Quant. Struct.-Act. Relat., 14, 518 (1995).
224. M. T. D. Cronin, B. W. Gregory, and T. W. Schultz, Chem. Res. Toxicol., 11, 902 (1998).
225. T. W. Schultz, Chem. Res. Toxicol., 12, 1262 (1999).
226. M. T. D. Cronin and T. W. Schultz, Chemosphere, 32, 1453 (1996).
227. R. Garg, A. Kurup, and C. Hansch, Crit. Rev. Toxicol., 31, 223 (2001).
228. C. D. Selassie, T. V. DeSoyza, M. Rosario, H. Gao, and C. Hansch, Chem.-Biol. Interact., 113, 175 (1998).
229. M. T. D. Cronin and T. W. Schultz, Chem. Res. Toxicol., 14, 1284 (2001).
230. P. H. Hinderling, O. Schmidlin, and J. K. Seydel, J. Pharmacokinet. Biopharm., 12, 263 (1984).
231. C. Selassie and T. E. Klein in H. Kubinyi, Ed., 3D QSAR in Drug Design. Theory, Methods and Applications, Escom Science, Leiden, The Netherlands, 1993, p. 257.
232. O. Geban, H. Ertepinar, M. Yurtsever, S. Ozden, and F. Gumus, Eur. J. Med. Chem., 34, 753 (1999).
233. S. Daunes, C. D'Silva, H. Kendrick, V. Yardley, and S. L. Croft, J. Med. Chem., 44, 2976 (2001).
234. C. Hansch, H. Gao, and D. Hoekman in J. Devillers, Ed., Comparative QSAR, Taylor & Francis, Washington, DC, 1998, p. 285.
235. C. Hansch, B. R. Telzer, and L. Zhang, Crit. Rev. Toxicol., 25, 67 (1995).
236. R. Garg, S. Kapur, and C. Hansch, Med. Res. Rev., 21, 73 (2000).
237. L. Zhang, H. Gao, C. Hansch, and C. Selassie, J. Chem. Soc. Perkin Trans. 2, 2553 (1998).
238. C. Hansch, S. McKarns, C. J. Smith, and D. J. Doolittle, Chem.-Biol. Interact., 127, 61 (2000).
239. L. A. Oglesby, M. T. Ebron-McCoy, T. R. Logsdon, F. Copeland, P. E. Beyer, and R. J. Kavlock, Teratology, 45, 11 (1992).
240. C. Hansch and H. Gao, Chem. Rev., 97, 2995 (1997).
241. A. M. Richard, J. K. Hongslo, P. F. Boone, and J. A. Holme, Chem. Res. Toxicol., 4, 151 (1991).
242. C. D. Selassie, A. J. Shusterman, S. Kapur, R. P. Verma, L. Zhang, and C. Hansch, J. Chem. Soc. Perkin Trans. 2, 2729 (1999).
243. D. Boyd in A. L. Parrill and M. Rami-Reddy, Eds., Rational Drug Design, ACS Symposium Series 719, American Chemical Society, Washington, DC, 1999, p. 346.
244. E. Plummer in C. Hansch and T. Fujita, Eds., Classical and Three-Dimensional QSAR in Agrochemistry, ACS Symposium Series 606, American Chemical Society, Washington, DC, 1995, p. 241.
245. T. Fujita, Quant. Struct.-Act. Relat., 16, 107 (1997).
246. H. Koga, A. Itoh, S. Murayama, S. Suzue, and T. Irikura, J. Med. Chem., 23, 1358 (1980).
247. H. Chuman, A. Ito, T. Shaishoji, and S. Kumazawa in C. Hansch and T. Fujita, Eds., Classical and Three-Dimensional QSAR in Agrochemistry, ACS Symposium Series 606, American Chemical Society, Washington, DC, 1995, p. 171.
248. J. Ohtaka and G. Tsukamoto, Chem. Pharm. Bull., 35, 4117 (1987).
249. M. Kuchar, E. Maturova, B. Brunova, J. Grimova, H. Tomkova, and K. J. Holubek, Collect. Czech. Chem. Commun., 53, 1862 (1988).
250. T. Fujita in G. Jolles and K. R. H. Wooldridge, Eds., Drug Design: Fact or Fantasy, Academic Press, London, 1984, p. 19.
251. J. G. Topliss, Perspect. Drug Discov. Des., 1, 253 (1993).
252. C. Hansch, J. P. Bjorkroth, and A. Leo, J. Pharm. Sci., 76, 663 (1987).
253. C. Hansch, R. Garg, and A. Kurup, Bioorg. Med. Chem., 9, 283 (2001).
254. R. Garg, A. Kurup, S. B. Mekapati, and C. Hansch, Bioorg. Med. Chem., in press (2002).
CHAPTER TWO
Recent Trends in Quantitative Structure-Activity Relationships
A. TROPSHA
Laboratory for Molecular Modeling
School of Pharmacy
University of North Carolina
Chapel Hill, North Carolina
Contents
1 Introduction
1.1 A Unified Concept of QSAR
1.2 The Taxonomy of QSAR Approaches
2 Multiple Descriptors of Molecular Structure
2.1 Topological Descriptors
2.2 3D Descriptors
3 QSAR Modeling Approaches
3.1 3D-QSAR
3.2 The Descriptor Pharmacophore Concept and Variable Selection QSAR
3.2.1 Linear Models
3.2.2 Nonlinear Models
4 Validation of QSAR Models
4.1 Beware of $q^2$
4.2 Rational Selection of Training and Test Sets
4.3 Guiding Principles of Safe QSAR
5 QSAR Models as Virtual Screening Tools
5.1 Data Mining and SAR Analysis
5.2 Virtual Screening
5.3 Rational Library Design by Use of QSAR
6 Conclusions

Sixth Edition, Volume 1: Drug Discovery
Edited by Donald J. Abraham
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
1 INTRODUCTION
Quantitative structure-activity relationship (QSAR) methodology was introduced by Hansch et al. in the early 1960s (1, 2). The approach stemmed from linear free-energy relationships in general and the Hammett equation in particular (3). It is based on the assumption that differences in structural properties account for differences in the biological activities of compounds. According to this approach, the structural changes that affect the biological activities of a set of congeners are of three major types: electronic, steric, and hydrophobic (4). These structural properties are often described by Hammett electronic constants (5), Verloop STERIMOL parameters (6), and hydrophobic constants (5), to name but a few. The relationship between a biological activity (or chemical property) and the structural parameters is obtained through the use of linear or multiple linear regression (MLR) analysis. The fundamentals and applications of this method in chemistry and biology have been summarized by Hansch and Leo (4), and an account of the most recent developments in this area of traditional QSAR appears in the chapter by Selassie in this series (7). As discussed in that chapter, the history of modern QSAR spans more than 40 years of active research in method development and its applications. It is practically impossible to review all, even relatively recent, developments in the field in a single chapter. Several reviews and monographs on QSAR and its applications have been published in recent years (4, 8-12), and the reader is referred to this collection of general references and the publications cited therein for additional in-depth information.

One of the most characteristic features of modern QSAR as an integral part of drug design and discovery is the unprecedented growth of biomolecular databases, which contain data on chemical structure and, in some cases, biological activity (or other relevant drug properties such as toxicity or mutagenicity) of chemicals. Figure 2.1 illustrates the fast growth of one such database, the Chemical Abstracts Service (CAS) registry file (13). The growth has been phenomenal: CAS currently contains more than 39 million compounds, including biological sequences [and it does not include chemical libraries, which literally include billions of compounds (14)]. Naturally, the growth of molecular databases has been concurrent with the acceleration of the drug discovery process. According to an excellent recent historical account of drug discovery (15), as the result of high throughput screening (HTS) technologies, the number of raw data points obtained by a large pharmaceutical company per year has increased from approximately 200,000 at the beginning of the last decade to around 50 million today. The total number of drugs used worldwide is approximately 80,000, which reportedly act at fewer than 500 confirmed molecular targets (15). Recent estimates suggest that the number of potential targets lies between 5000 and 10,000, approximately 10-fold greater than the number of targets currently pursued (15).

Figure 2.1. Growth in the number of chemical compounds, excluding biopolymers, registered by the Chemical Abstracts Service (CAS). (Horizontal axis: year.)

Although traditional QSAR modeling has typically been limited to a maximum of several dozen compounds at a time, the rapid generation of large quantities of data requires new methodologies for data analysis. New approaches need to be developed to establish QSAR models for hundreds, if not thousands, of molecules. These new methods should be robust, yet computationally efficient, to compete with the experimental methods of drug discovery, such as combinatorial chemistry and HTS.
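For orientation, the traditional Hansch-type analysis described at the start of this section can be sketched as an ordinary least-squares fit of activity against a few physicochemical descriptors. The following is a minimal illustration only, assuming three descriptors per compound (labeled here as electronic, steric, and hydrophobic); the function names `solve` and `fit_mlr` and all numerical values are hypothetical and are not taken from any study cited in this chapter.

```python
# Minimal MLR sketch for a Hansch-type model:
#   activity ~ c0 + c1*electronic + c2*steric + c3*hydrophobic
# All names and numbers are illustrative assumptions, not published data.


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_mlr(descriptors, activities):
    """Return [c0, c1, ..., cn] minimizing the squared error (normal equations)."""
    Z = [[1.0] + list(row) for row in descriptors]  # prepend an intercept column
    k = len(Z[0])
    ZtZ = [[sum(z[a] * z[b] for z in Z) for b in range(k)] for a in range(k)]
    ZtP = [sum(z[a] * p for z, p in zip(Z, activities)) for a in range(k)]
    return solve(ZtZ, ZtP)


# Hypothetical congeneric series: (electronic, steric, hydrophobic) per compound.
descriptors = [
    (0.00, 1.0, 0.5), (0.23, 1.5, 1.2), (-0.17, 2.0, 0.9), (0.78, 1.2, 0.1),
    (0.06, 2.5, 1.8), (-0.27, 1.8, 0.3), (0.54, 2.2, 1.5), (0.45, 3.0, 0.7),
]
activities = [4.1, 5.0, 4.3, 4.9, 5.2, 3.6, 5.8, 4.7]  # made-up activity values

coefficients = fit_mlr(descriptors, activities)
print("intercept and descriptor coefficients:", coefficients)
```

With eight compounds and three descriptors, this toy example sits at the small scale the paragraph above attributes to traditional QSAR; the normal-equation route shown here is chosen for brevity and is not meant for the large data sets this chapter goes on to discuss.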
This chapter concentrates on recent trends and developments in QSAR methodology, which are characterized by the growing size of the data sets subjected to QSAR analysis, the use of multiple descriptors of chemical structure, the application of both linear and, especially, nonlinear optimization algorithms applicable to multidimensional data modeling, a growing emphasis on rigorous model validation, and the application of QSAR models as virtual screening tools in database mining and chemical library design. We begin by presenting a unified concept of QSAR, emphasizing common aspects of different QSAR methodologies. We then consider some popular approaches to the derivation of molecular descriptors and optimization algorithms in the context of three important components of any QSAR investigation: model development, model validation, and model utility. We conclude with several remarks on the present status and future developments in this exciting research discipline.

1.1 A Unified Concept of QSAR

An inexperienced user, or sometimes even an avid practitioner, of QSAR could easily be confused by the multitude of methodologies and naming conventions used in QSAR studies. Two-dimensional (2D) and three-dimensional (3D) QSAR, variable selection and artificial neural network methods, comparative molecular field analysis (CoMFA), and binary QSAR are examples of terms that may appear to describe totally independent approaches that cannot even be compared to each other. In fact, any QSAR method can be generally defined as the application of mathematical and statistical methods to the problem of finding empirical relationships (QSAR models) of the form $P_i = \hat{k}(D_1, D_2, \ldots, D_n)$, where $P_i$ are biological activities (or other properties of interest) of molecules, $D_1, D_2, \ldots, D_n$ are calculated (or, sometimes, experimentally measured) structural properties (molecular descriptors) of compounds, and $\hat{k}$ is some empirically established mathematical transformation that should be applied to descriptors to calculate the property values for all molecules. The relationship between values of descriptors D and target properties P can be linear [e.g., multiple linear regression (MLR) as in the Hansch QSAR approach], where the target property can be calculated directly from the descriptor values, or nonlinear (such as artificial neural networks or classification QSAR methods), where descriptor values are used in characterizing chemical similarity between molecules, which in turn is used to predict compound activity. In general, each compound can be represented by a point in a multidimensional space, in which descriptors $D_1, D_2, \ldots, D_n$ serve as independent coordinates of the compound. The goal of QSAR modeling is to establish a trend in the descriptor values that correlates, in a linear or nonlinear fashion, with the trend in biological activity. All QSAR approaches imply, directly or indirectly, a simple similarity principle, which for a long time has provided a foundation for experimental medicinal chemistry: compounds with similar structures are expected to have similar biological activities. This implies that points representing compounds with similar activities in multidimensional descriptor space should be geometrically close to each other, and vice versa.

Structure Id    Target Property (EC50, Ki, etc.)    Structural Properties (descriptors)
Comp. 1         P1                                  D11   D12   ...   D1n
Comp. 2         P2                                  D21   D22   ...   D2n
...             ...                                 ...
Comp. m         Pm                                  Dm1   Dm2   ...   Dmn

Figure 2.2. Standard QSAR table is a general starting point of any QSAR approach.

Despite formal differences between various methodologies, any QSAR method is based on a QSAR table, which can be generalized, as shown in Fig. 2.2. To initiate a QSAR study, this table must include some identifiers of chemical structures (e.g., company's ID numbers, first column of the table in Fig. 2.2), reliably measured values of biological activity [or any other target property of interest (e.g., solubility, metabolic transformation rate, etc.;
second column)], and calculated values of molecular descriptors in all remaining columns (sometimes, experimentally determined physical properties of compounds can be used as descriptors as well).

The differences between various QSAR methodologies can be understood in terms of the types of target property values, the types of descriptors, and the optimization algorithms used to relate descriptors to the target properties. The target property values can be defined as activity classes [i.e., active or inactive, frequently encoded numerically for the purpose of the subsequent analysis as one (for active) or zero (for inactive)] or as a continuous range of values; the corresponding methods of data analysis are referred to as classification or continuous property QSAR, respectively. Descriptors can be generated from various representations of molecules (e.g., 2D chemical graphs or 3D molecular geometries), giving rise to the terms 2D- or 3D-QSAR, respectively. Finally, the types of optimization algorithms used in QSAR model development lead to the definitions of linear versus nonlinear QSAR methods.

In some cases, the types of biological data, the choice of descriptors, and the class of optimization methods are closely related and mutually inclusive. For instance, multiple linear regression can be applied only when a relatively small number of molecular descriptors are used (at least five to six times smaller than the total number of compounds) and the target property is characterized by a continuous range of values. The use of a large number of descriptors makes MLR inapplicable because of a high chance of spurious correlation (16) and requires the use of partial least squares or nonlinear optimization techniques. However, in general, for any given data set a user could choose between various types of descriptors and various optimization schemes, combining them in a practically mix-and-match mode, to arrive at statistically significant QSAR models in a variety of ways. This situation is in essence analogous to molecular mechanics calculations (17), where different force fields and differently derived parameters are developed by different groups, although the common goal is to compute (unique) optimized geometries of molecules from their chemical composition and coordinates of all atoms. Thus, in general, all QSAR models can be universally compared in terms of their statistical significance and, most important, their ability to predict accurately biological activities (or other target properties) of molecules not included in the training set (cf. molecular mechanics, where different methods are ultimately compared by their ability to reproduce experimental molecular geometries). This concept of statistical robustness and predictive ability as universal characteristics of any QSAR model, independent of the particulars of individual approaches, should be kept in mind as we consider examples of QSAR tools, their applications, and pitfalls in the subsequent sections of this chapter.

1.2 The Taxonomy of QSAR Approaches

Many different approaches to QSAR have been developed since Hansch's seminal work. As briefly discussed above, the major differences between these methods can be analyzed from two viewpoints: (1) the types of structural parameters that are used to characterize molecular identities, starting from different representations of molecules, from simple chemical formulas to three-dimensional conformations; and (2) the mathematical procedure that is employed to obtain the quantitative relationship between these structural parameters and biological activity.

On the basis of the origin of molecular descriptors used in calculations, QSAR methods can be divided into three groups. One group is based on a relatively small number (usually many times smaller than the number of compounds in a data set) of physicochemical properties and parameters describing, for example, hydrophobic, steric, and electrostatic effects. Usually, these descriptors are used as independent variables in multiple regression approaches (18). In the literature, these methods are typically referred to as Hansch analysis (8). These types of descriptors and the corresponding linear optimization methods used in traditional QSAR analyses are discussed extensively in the chapter by Selassie (7) and therefore are not reviewed here.

More recent methods are based on quantitative characteristics of molecular graphs (molecular topological descriptors). Because
molecular graphs or structural formulas are "two-dimensional," these methods are referred to as 2D-QSAR. Most of the 2D-QSAR methods are based on graph theoretical indices, which have been extensively studied by Randic (19) and Kier and Hall (20-22). They include, for example, molecular connectivity indices (19, 20), molecular shape indices (23, 24), topological (25) and electrotopological state indices (26-29), and atom-pair descriptors (30, 31). Sometimes, topological descriptors are also combined with physicochemical properties of molecules. Although these structural indices represent different aspects of molecular structures, and, what is important for QSAR, different structures provide numerically different values of indices, their physicochemical meaning is frequently unclear. The successful applications of topological indices combined with multiple linear regression (MLR) analysis have been summarized by Kier and Hall (20, 21, 28).

The third group of methods is based on descriptors derived from spatial (three-dimensional) representation of molecular structures. Correspondingly, these methods are referred to as three-dimensional or 3D-QSAR; they have become increasingly popular with the development of fast and accurate computational methods for generating 3D conformations and alignments of chemical structures. The early examples of 3D-QSAR include molecular shape analysis (MSA) (32), distance geometry (33, 34), and Voronoi techniques (35). The first method uses shape descriptors and multiple linear regression analysis, whereas the latter methods apply atomic refractivity as structural descriptors and the solution of mathematical inequalities to obtain the quantitative relationships. These two methods have been applied to the study of structure-activity relationships of many data sets by Hopfinger (e.g., Refs. 36, 37) and Crippen (e.g., Refs. 38, 39), respectively.

Perhaps the most popular example of 3D-QSAR is the comparative molecular field analysis (CoMFA), developed by Cramer et al. (40), which has elegantly combined the power of 3D molecular modeling and the partial least-squares (PLS) optimization technique (41, 42) and found wide applications in medicinal chemistry and toxicity analysis (see below). Most 3D-QSAR methods require 3D alignment of all molecules according to a pharmacophore model or based on ligand docking to a receptor-binding site. Descriptors in the case of CoMFA (40, 43) and CoMFA-like methods such as COMBINE (44), CoMSIA (45), and QSiAR (46) represent electrostatic, steric, and hydrophobic field values (to name but a few examples) in the grid points surrounding molecules.

Finally, QSAR methods can also be classified by the type of the correlation methods used in model development. Linear methods include linear regression or MLR, PLS (41, 42, 47), or principal component regression (PCR), whereas nonlinear methods can be exemplified by k-Nearest Neighbors (kNN) (48, 49) and artificial neural network (50) methods. An example of the linear methods is provided by the ADAPT system, which employs topological indices as well as other calculable structural parameters (e.g., steric and quantum mechanical parameters), and the MLR method for QSAR analysis. It has been extensively applied to QSAR/QSPR studies in analytical chemistry, toxicity analysis, and other biological activity prediction (51-54). Parameters derived from various experiments through chemometric methods have also been used in the study of peptide QSAR (55), where PLS analysis was employed. The latter technique has been used almost exclusively in 3D-QSAR, where the number of descriptors characterizing molecular fields may exceed the number of compounds by orders of magnitude.

There has been a great deal of interest, especially more recently, in the use of data mining methods to extract the information from large and/or chemically inhomogeneous data sets. Examples of these methods include pattern recognition (56, 57), automated structure evaluation (58, 59), neural network (60-62), and machine learning (63-65). Recent trends in QSAR studies also include developing optimal QSAR models through variable selection, that is, by selecting a subset of available descriptors in either MLR, PLS, or nonlinear classification or artificial neural networks (ANN) analysis as applied either in 2D- (66-72) or in 3D-QSAR (73). These methods employ either generalized simulated annealing
(67), or genetic algorithms (68), or evolutionary algorithms (69-72) as optimization tools. The effectiveness and convergence of these algorithms are strongly affected by the choice of a fitting function, which drives the optimization process (70-72). It has been demonstrated that optimization combined with variable selection effectively improves QSAR models as compared to those without variable selection. For example, GOLPE (74) was developed through the use of chemometric principles and $q^2$-GRS (75) was developed on the basis of independent CoMFA analysis of small areas of CoMFA descriptor space, to address the issue of region selection. Both of these methods have been shown to improve QSAR models compared to the original CoMFA technique.

Different QSAR methods have their own strengths and weaknesses. For example, 3D-QSAR methods generally result in diagrams of important molecular fields that can be easily interpreted in terms of specific steric and electrostatic interactions important for the binding of ligands to their receptor. However, the need to align structures in 3D, which is time-consuming and subjective, precludes the use of 3D-QSAR techniques for the analysis of large data sets. On the other hand, 2D-QSAR methods are much faster and more amenable to automation because they require no conformational search and structural alignment. Thus, 2D methods are best suited for the analysis of large numbers of compounds and computational screening of molecular databases; however, the interpretation of the resulting models in familiar chemical terms is frequently difficult, if not impossible.

The generality of the QSAR modeling approach as a drug discovery tool, irrespective of descriptor types or optimization algorithms, can be best demonstrated in the context of inverse QSAR, which can be defined as designing or discovering molecular structures with a desired property on the basis of QSAR models (76-78). In practical terms, inverse QSAR also includes searching for molecules with a desired target property in chemical databases or virtual chemical libraries. These considerations emphasize the universal importance of establishing QSAR model robustness and predictive ability as opposed to concentrating on explanatory power, which has been a characteristic feature of many traditional QSAR approaches.

2 MULTIPLE DESCRIPTORS OF MOLECULAR STRUCTURE

It has been said frequently that there are three keys to the success of any QSAR model building exercise: descriptors, descriptors, and descriptors. Many different molecular representations have been proposed, exemplified by Hansch-type parameters (2), topological indices (19, 79), quantum mechanical descriptors (80), molecular shapes (32, 81), molecular fields (40), atomic counts (82), 2D fragments (83-85), 3D fragments (86-88), molecular eigenvalues (89), molecular multipole moments (90), E-state fields (28), molecular fragment-based hash codes (91, 92), and molecular holograms (93). A recent review by Livingstone provides an excellent survey of various 2D and 3D descriptors, along with some associated diversity and similarity functions (9). Various physicochemical parameters such as the partition coefficient, molar refractivity, and quantum mechanical quantities such as highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energies have been used to represent molecular identities in early QSAR studies by the use of linear and multiple linear regression. However, these descriptors are not suited for the analysis of large numbers of molecules, either because of the lack of physicochemical parameters for compounds yet to be synthesized or because of the computational expenses required by quantum mechanical methods. Recent years have seen the application of various topological descriptors that are usually derived from either 2D or 3D molecular structural information based on graph theory or molecular topology (20-22, 94). These descriptors are generated on the basis of the molecular connectivity, 3D molecular topography, and molecular field properties.

2.1 Topological Descriptors

Two widely applied examples of 2D molecular descriptors are molecular connectivity indices
(MCI) and atom-pair (AP) descriptors. Molecular connectivity indices, $\chi$, were first formulated by Randic (19) and subsequently generalized and extended by Kier and Hall (20-22). The fundamentals and applications of molecular connectivity indices have been thoroughly reviewed (22, 28). The popular MolConnZ software (95) affords the computation of a wide range of topological indices of molecular structure. These indices include (but are not limited to) the following descriptors: simple and valence path, cluster, path/cluster and chain molecular connectivity indices, kappa molecular shape indices, topological and electrotopological state indices, differential connectivity indices, the graph's radius and diameter, Wiener and Platt indices, Shannon and Bonchev-Trinajstic information indices, counts of different vertices, and counts of paths and edges between different kinds of vertices (19, 20, 96-100).

Overall, MolConnZ (95) produces over 400 different descriptors. Most of these descriptors characterize chemical structure, but several depend on the arbitrary numbering of atoms in a molecule and are introduced solely for bookkeeping purposes. In a typical QSAR study, only about one-half of all possible MolConnZ descriptors are eventually used, after deleting descriptors with zero value or zero variance. Figure 2.3 provides a summary of these molecular descriptors and presents some algorithms used in their derivation.

The idea of using atom pairs as molecular features in structure-activity studies was first proposed by Carhart et al. (84). AP descriptors are defined by their atom types and topological distance bins. An AP is a substructure defined by two atom types and the shortest path separation (or graph distance) between the atoms. The graph distance is defined as the smallest number of atoms along the path connecting two atoms in a molecular structure. The general form of an atom-pair descriptor is as follows:

atom type i - (distance) - atom type j

where atom chemical types are typically defined by the user. For example, 15 atom types can be defined by use of the SYBYL mol2 format (101) as follows: (1) negative charge center (NCC); (2) positive charge center (PCC); (3) hydrogen bond acceptor (HA); (4) hydrogen bond donor (HD); (5) aromatic ring center (ARC); (6) nitrogen atoms (N); (7) oxygen atoms (O); (8) sulfur atoms (S); (9) phosphorus atoms (P); (10) fluorine atoms (FL); (11) chlorine, bromine, iodine atoms (HAL); (12) carbon atoms (C); (13) all other elements (OE); (14) triple bond center (TBC); and (15) double bond center (DBC). The total number of pairwise combinations of all 15 atom types (including pairs of identical types) is therefore 120. Furthermore, distance bins should be defined to discriminate between identical atom pairs separated by different graph distances and therefore representing different molecular substructures. Thus, 15 distance bins can be introduced in the interval from graph distance zero (i.e., zero atoms separating an atom pair) to 14 and greater. In this format, a total of 1800 (120 × 15) AP descriptors can be generated for any molecular structure. An example of an atom-pair descriptor is shown in Fig. 2.4. Frequently, as applied to particular data sets, many of the theoretically possible AP descriptors have zero value (implying that certain atom types or atom pairs are absent in molecular structures). For instance, in our recent studies of 48 anticonvulsant agents, only 273 descriptors with nonzero value and nonzero variance were generated (102).

2.2 3D Descriptors

The rapid increase in structural three-dimensional (3D) information of bioorganic molecules (103, 104), coupled with the development of fast methods for 3D structure generation [e.g., CONCORD (105, 106) and CORINA (107)] and alignment [e.g., Active Analog Approach (43, 108)], has led to the development of 3D structural descriptors and associated 3D-QSAR methods. Many 3D-QSAR methods (considered below) make use of so-called molecular field descriptors. To calculate these descriptors, steric and electrostatic fields of all molecules are sampled with a probe atom, usually an sp3 carbon bearing a +1 charge, on a rectangular grid that encompasses structurally aligned molecules. The values of both van der Waals and electrostatic interactions between the probe atom and all
atoms of each molecule are calculated at every lattice point by use of the force field equation described above and entered into the CoMFA QSAR table (Fig. 2.5), which typically contains thousands of columns. Additional molecular field descriptors such as HINT (Hydropathic INTeraction) descriptors (109) could improve the CoMFA model. A PLS algorithm coupled with leave-one-out (LOO) cross-validation is typically used to arrive at statistically significant CoMFA models.

Figure 2.3. Examples of topological descriptors frequently used in QSAR studies. (Panels: hydrogen-depleted molecular graph and vertex degrees; extended connectivity indices; connectivity indices, including molecular connectivity indices and Zagreb group indices; overall connectivity indices.)

Figure 2.4. Example of an AP descriptor: two atom types, aliphatic nitrogen and aliphatic sulfur, separated by the shortest chemical graph path of seven.
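As a rough illustration of the procedure named above, the sketch below fits a single-response PLS model (PLS1, using the NIPALS iteration) and scores it by leave-one-out cross-validation. It is a simplified stand-in, not the implementation used in CoMFA software: the function names (`pls1_fit`, `pls1_predict`, `loo_q2`), the tiny descriptor table, and the activity values are hypothetical, and the cross-validated $q^2$ is taken here as 1 - PRESS/SS, a common definition that is assumed rather than quoted from this chapter.

```python
# PLS1 (NIPALS) with leave-one-out cross-validation, pure Python.
# Illustrative sketch only; data and function names are hypothetical.


def pls1_fit(X, y, n_comp):
    """Fit PLS1 on mean-centered data; return (x_mean, y_mean, components)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    R = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # deflated X
    yc = [v - y_mean for v in y]
    components = []
    for _ in range(n_comp):
        # Weight vector: direction in descriptor space most covariant with y.
        w = [sum(R[i][j] * yc[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(R[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(R[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        # y-loading; y need not be deflated because the scores are orthogonal.
        q = sum(yc[i] * t[i] for i in range(n)) / tt
        # Deflate X so that the next score vector is orthogonal to this one.
        R = [[R[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one descriptor vector x."""
    x_mean, y_mean, components = model
    r = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, load, q in components:
        t = sum(r[j] * w[j] for j in range(len(r)))
        y_hat += q * t
        r = [r[j] - t * load[j] for j in range(len(r))]
    return y_hat


def loo_q2(X, y, n_comp):
    """Leave-one-out q2 = 1 - PRESS / SS, with SS taken about the mean of y."""
    predictions = []
    for k in range(len(X)):
        X_train, y_train = X[:k] + X[k + 1:], y[:k] + y[k + 1:]
        model = pls1_fit(X_train, y_train, n_comp)
        predictions.append(pls1_predict(model, X[k]))
    y_bar = sum(y) / len(y)
    press = sum((obs - pred) ** 2 for obs, pred in zip(y, predictions))
    ss = sum((obs - y_bar) ** 2 for obs in y)
    return 1.0 - press / ss


# Hypothetical table: 6 compounds x 4 field-type descriptors, made-up activities.
X = [
    [0.2, 1.1, -0.5, 0.9], [0.4, 0.9, -0.1, 1.4], [0.1, 1.6, -0.7, 0.3],
    [0.8, 0.4, 0.3, 1.9], [0.6, 0.7, 0.0, 1.6], [0.3, 1.3, -0.4, 0.8],
]
y = [5.1, 6.0, 4.4, 6.8, 6.3, 5.2]

for n_comp in (1, 2):
    print(n_comp, "component(s): q2 =", round(loo_q2(X, y, n_comp), 3))
```

A real CoMFA table has thousands of field columns rather than four, which is why a latent-variable method such as PLS is used in place of MLR. One common practice, assumed here rather than taken from the text, is to repeat the LOO loop for increasing numbers of components and keep the count that gives the highest $q^2$.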
Figure 2.5. Process of steric and electrostatic descriptor generation in CoMFA. Note that this process results in a familiar QSAR table (cf. Fig. 2.2). PLS is used as a standard analytical technique in CoMFA. (The table shown lists a biological activity column, e.g., 5.1 for Cpd1 and 6.8 for Cpd2, followed by steric field columns S001 to S998 and electrostatic field columns E001 to E998.)

One of the most attractive features of the CoMFA and CoMFA-like methods is that, because of the nature of molecular field descriptors, these approaches yield models that are relatively easy to interpret in chemical terms. The famous CoMFA contour plots, which are obtained as a result of any successful CoMFA study, tell chemists in rather plain terms how changes in the compounds' size or charge distribution as a result of chemical modification correlate with the binding constant or activity. These observations may immediately suggest to a chemist possible ways to modify molecules to increase their potencies. However, as demonstrated in the next section, these predictions should be accepted with caution, and only after sufficient work has been done to prove the statistical significance and predictive ability of the models.

By analogy with 2D atom-pair descriptors (Fig. 2.4), 3D AP descriptors can also be defined through the use of similar atom types and atom pairs and 3D molecular topography; in this case, a physical distance between atom types is used in place of chemical graph distance. The distance between two "atoms" is measured and then assigned into one or two distance bins. Typically, the width of each distance bin is chosen as 1.0 Å. Because it is also designed to let the adjacent bins have 10% overlap with each other, the actual length of each distance bin is 1.2 Å. Any distance located in the overlap region is assigned to both bins. This "fuzzy distance" concept is adopted to alleviate the possible unfavorable boundary effects of the distance bins. For example, with strict boundary conditions, a distance of 2.05 Å will be assigned only to bin No. 2, but it can be reasonably argued that it is almost as close to the upper half of bin No. 1 as to bin No. 2. With fuzzy boundary conditions, 2.05 Å belongs to both bin No. 1 and bin No. 2, allowing
a possible match to either. All the distances greater than 20 Å are assigned into the last bin.

3 QSAR MODELING APPROACHES

3.1 3D-QSAR

Two original 3D-QSAR methods, CoMFA (40) and GRID (110), were developed almost simultaneously in the mid- to late-1980s (9). Since its introduction, the CoMFA approach has rapidly become one of the most popular methods of QSAR. Over the years, this approach has been applied to a wide variety of receptor and enzyme ligands [many reviews appeared in a recent monograph (10)]. Undoubtedly, the further development of this and related methods is of great importance and interest to many scientists working in the area of rational drug design.

CoMFA methodology is based on the assumption that because, in most cases, the drug-receptor interactions are noncovalent, the changes in the biological activities or binding affinities of sample compounds correlate with changes in the steric and electrostatic fields of these molecules. In a standard CoMFA procedure, all molecules under investigation are first structurally aligned, and the steric and electrostatic fields around them are sampled with probe atoms, usually sp3 carbon with a +1 charge, on a rectangular grid that encompasses aligned molecules. The results of the field evaluation in every grid point for every molecule in the data set are placed in the CoMFA QSAR table, which therefore contains thousands of columns (Fig. 2.5). The analysis of this table by means of standard multiple regression is practically impossible; however, the application of special multivariate statistical analysis routines, such as PLS analysis and LOO cross-validation, ensures the statistical significance of the final CoMFA equation. The outcome from this procedure is a cross-vali-

[...]

ties, respectively. The summations in Equation 2.1 are performed over all compounds, which are used to build a model for the training set. The statistical meaning of the $q^2$ is different from that of the conventional $r^2$: a $q^2$ value greater than 0.3 is often considered significant (111).

Despite obviously successful and growing application of CoMFA in molecular design, several problems intrinsic to this methodology have persisted. Studies revealed that CoMFA results can be extremely sensitive to a number of factors, such as alignment rules, overall orientation, lattice placement, step size, and probe atom type (40, 75, 112-114). The problem of three-dimensional alignment has been the most notorious of these. Even with the development of automated or semiautomated alignment protocols such as the Active Analog Approach (108, 115) or DISCO (116) and the opportunity to use, in some cases, the structural information about the target receptor (112, 117) to align molecules, in general there is no standard recipe as to how to align all molecules under consideration in a unique and unambiguous fashion. A QSAR analysis of 60 acetylcholinesterase inhibitors (117) is particularly illustrative with respect to this point. In that study, the combination of structure-based alignment and CoMFA was employed to obtain a QSAR model for 60 chemically diverse inhibitors of acetylcholinesterase (AChE). The great structural diversity of the AChE inhibitors, ranging from choline to decamethonium, made it practically impossible to structurally align all the inhibitors in any unbiased way and generate a unique three-dimensional pharmacophore. X-ray crystallographic analysis of AChE from Torpedo californica (EC 3.1.1.7) (118), followed by X-ray determination of the complexes of the enzyme with three structurally diverse inhibitors, tacrine, edrophonium, and decamethonium (119), provided crucial information with respect to the
dated correlation coefficient R 2 (8), which is orientation of these inhibitors in the active
calculated according to the formula site of the enzyme. The crystallographic
data indicated that each of the three inhibi-
tors had a unique binding orientation in the
active site of the enzyme (Fig. 2.6). Their
natural structural alignment would probably
where y,, ii,and are the actual, estimated, never have been predicted by any of the exist-
and averaged (over the entire data set) activi- ing automated algorithms for ligand align-
[...] by the researcher's imagination [...] of the ligand chemical structure [...] consideration demonstrates the [...] of generating a unique and [...] alignment in 3D-QSAR studies [...] interpretable and predictive models.

Figure 2.6. Superposition of three inhibitors of [...] active site of the enzyme based on crystal structures of enzyme-inhibitor complexes. [...]sly, no common pharmacophore can [...] these molecules.

[...] alignment problem is the main [...] ambiguity in obtaining and analyzing [...] results, especially in the case of [...] diverse compounds. However, it [...]m that, even if the structural [...] fixed, the resulting q² value could [...]ive to the orientation of rigidly [...] molecules on the user terminals (75), [...] explained as follows.

The grid orientation in CoMFA is fixed in the coordinate system of the computer; thus, every time the orientation of the molecular aggregate is changed, the size of the grid may change but not its orientation. The orientation of the assembled molecules therefore affects the placement of probe atoms, which, in turn, influences the field sampling process. This leads to the variability of the q² values, mostly attributable to the reasons outlined earlier. The effect of variability of q² as a function of molecular aggregate orientation was more pronounced in the case of structurally diverse molecules (e.g., cephalotaxine esters and 5-HT1A receptor ligands) than in the case of much less structurally diverse molecules (e.g., HIV protease inhibitors) (75). This effect may be attributed to the fact that the pattern of probe atom placement with respect to the aligned molecules changes more dramatically when one changes the orientation of more structurally diverse molecules than it does when the data set is composed of structurally similar molecules.

In the conventional CoMFA implementation, the steric and electrostatic fields, which theoretically form a continuum, are sampled on a fairly coarse grid. As a result, these fields are represented inadequately, and the results are not strictly reproducible. Intuitively, decreasing the grid spacing may increase the adequacy of sampling, as was suggested by Cramer et al. (120). Indeed, it was shown that decreasing the grid spacing from 2.0 to 1.0 Å minimized the fluctuation in the observed q² values (75). Most probably, the reason for this phenomenon is that the decrease in grid spacing increases the number of probe atoms, which in turn should raise the probability of placing the probe atoms in a region where the steric and electrostatic field changes can be best correlated with biological activity. However, as was noticed by Cramer et al. (120), the increase in the number of probe atoms also increases the noise in PLS analysis and leads to a less statistically significant q² (121).

An important feature of conventional CoMFA routine is that it assumes equal sampling and a priori equal importance of all lattice points for PLS analysis, whereas the final CoMFA result actually emphasizes the limited areas of three-dimensional space as important
for biological activity. Indeed, the deficiencies of conventional CoMFA routine mentioned earlier may be effectively dealt with by eliminating from the analyses those areas of three-dimensional space where changes in steric and electrostatic fields do not correlate with changes in biological activity. The q²-GRS routine was devised (75) to eliminate those areas from the analysis based on the (low) value of the q² obtained for such regions individually. The major feature of this routine is that it optimizes the region selection for the final PLS analysis. In this regard, it is intellectually analogous to the GOLPE approach (74).

3D-QSAR remains an active area of research and method development. Several recent approaches such as CoMSIA (45), QSiAR (46), and GRIND (122) address the most notorious CoMFA problems dealing with the grid artifacts. However, it should be kept in mind that 3D-QSAR modeling is a difficult process. It is reasonably successful when underlying molecules are relatively rigid and similar, so that the identification of the 3D pharmacophore is straightforward. With the increased complexity and flexibility of molecules and a possibility of multiple mechanisms of binding with the receptor, the derivation of unambiguous pharmacophore and unique alignment is sometimes practically impossible (as shown above in the case of AChE inhibitors), and extreme care is important in trying to obtain reproducible and validated QSAR models.

3.2 The Descriptor Pharmacophore Concept and Variable Selection QSAR

The term pharmacophore, introduced by Ehrlich in the early 1900s (123), originally referred to the molecular framework that carries (phoros) the essential features responsible for a drug's (pharmacon) activity. Nowadays, this term has almost the opposite meaning as applied to three-dimensional (3D) molecular structure. A 3D pharmacophore is defined as a collection of particular chemical features (functional groups) and their spatial arrangement, which define pharmacological specificity of a series of compounds (124). The pharmacophore concept assumes that structurally diverse molecules bind to their receptor site in a similar way, with their pharmacophoric elements interacting with the same functional groups of the receptor.

The pharmacophore concept plays a very important role in guiding the drug discovery process. Pharmacophore models help medicinal chemists gain an insight into the key interactions between ligand and receptor when the receptor structure has not been determined experimentally. A pharmacophore can be used as a basis for the alignment rules in 3D-QSAR analysis for the lead compound optimization (125). Furthermore, a pharmacophore can be directly used as the search query for 3D database mining, which is a common and efficient approach for discovery of lead compounds (126).

Pharmacophore identification refers to the computational way of identifying the essential 3D structural features and configurations that are responsible for the biological activity of a series of compounds. It is computationally intensive, requiring searching two huge spaces: the available conformations for each compound and the possible correspondence (alignment) between different compounds. A number of approaches and computer programs have been specifically developed for pharmacophore identification including, for example, Active Analog Approach, AAA (108, 127, 128), Ensemble distance geometry (129), DISCO (116), Chem-X (130), Catalyst/Hypo (131, 132), Catalyst/HipHop (133, 134), and Apex-3D (135).

An obvious parallel can be established between the identification of descriptors contributing the most to the correlation with biological activity, and search for pharmacophoric elements, which are mainly responsible for the specificity of drug action. Indeed, individual pharmacophoric elements are typically identified in the course of experimental structure-activity studies. Considering molecules as a collection of substructures, pharmacophoric elements can also be viewed as specific chemical features selected from all chemical fragments present in a molecular data set. Thus, the selection of specific pharmacophoric features responsible for biological activity is directly analogous to the selection of specific chemical descriptors contributing to the most explanatory QSAR model. Frequently, the
QSAR modeling that involves descriptor (feature) selection is referred to as variable selection QSAR.

This consideration emphasizes the analogy between pharmacophore identification and variable selection QSAR. On the basis of this analogy, we now expand the notion of chemical pharmacophore to that of the more general descriptor pharmacophore. We shall define descriptor pharmacophore as a special subset of molecular descriptors (of any nature, not only chemical functional groups) optimized in the process of variable selection QSAR, to achieve the most significant correlation between descriptor values and biological activity.

Similar to the common areas of application of chemical pharmacophores, descriptor pharmacophores can be applied for database mining. First, a preconstructed QSAR model can be used as a means of screening compounds from existing databases (or virtual libraries) for high predicted biological activity. Alternatively, variables selected by QSAR optimization can be used for similarity searches to improve the performance of the rational library design or database mining methods. The advantage of this approach for database mining is that it affords not only the compound selection but also the quantitative prediction of their activity.

3.2.1 Linear Models. Variable selection approaches can be applied in combination with both linear and nonlinear optimization algorithms. Exhaustive analysis of all possible combinations of descriptor subsets to find a specific subset of variables that affords the best correlation with the target property is practically impossible because of the combinatorial nature of this problem. Thus, stochastic sampling approaches such as genetic or evolutionary algorithms (GA or EA) or simulated annealing (SA) are employed. To illustrate one such application we shall consider the GA-PLS method, which was implemented as follows (136).

Step 1. Multiple descriptors such as molecular connectivity indices or atom pair descriptors (cf. Section 2.1) are generated initially for every compound in a data set.

Step 2. An initial population of 100 different random combinations of subsets of these descriptors (parents) is generated as follows. Each parent is described by a string of random binary numbers (i.e., one or zero), with the length (total number of digits) equal to the total number of descriptors selected for each data set. The value of one in each string implies that the corresponding descriptor is included for the parent, and the value of zero implies that the descriptor is excluded.

Step 3. For every random combination of descriptors (i.e., every parent), a QSAR equation is generated for the training data set by use of the PLS algorithm (41). Thus, for each parent a q² value is obtained, and some function of q² is used as a fitness function to guide GA.

Step 4. Two parents are selected randomly and subjected to a crossover (i.e., the exchange of the equal length substrings), which produces two offspring. Each offspring is subjected to a random single-point mutation, that is, a randomly selected one (or zero) is changed to zero (or one) and the fitness of each offspring is evaluated as described above (cf. Step 3).

Step 5. If the resulting offspring are characterized by a higher value of the fitness function, then they replace the parents; otherwise, the parents are kept.

Step 6. Steps 3-5 are repeated until a predefined convergence criterion is achieved. For the convergence criterion one can use the difference between the maximum and minimum values of the fitness function. Calculations are terminated when this difference falls below a certain threshold (e.g., 0.02).

In summary, each parent in this method represents a QSAR equation with randomly chosen variables, and the purpose of the calculation is to evolve from the initial population of the QSAR equations to the population with the highest average value of the fitness function. In the course of the GA-PLS process, the initial number of members of the population (100) is maintained while the average value of the fitness function for the whole population converges to a high number. The best model is characterized by the highest value of the fitness function as well as by specific descriptor selection (descriptor pharmacophore) that affords such a model.
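The genetic-algorithm procedure of Steps 2-6 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the published implementation (136): ordinary least squares (`numpy.linalg.lstsq`) is swapped in for the PLS algorithm, q² itself (Equation 2.1, leave-one-out) serves as the fitness function, and the function names `loo_q2` and `ga_select` and the `max_iter` and `seed` parameters are hypothetical. The descriptor matrix `X` (compounds in rows, descriptors in columns) is assumed to be already computed (Step 1).

```python
import numpy as np

def loo_q2(X, y):
    """Leave-one-out q2 (Equation 2.1) for a least-squares model.
    Assumption: OLS stands in for PLS to keep the sketch short."""
    n = len(y)
    if X.shape[1] == 0:
        return -np.inf                      # empty descriptor subset: worst fitness
    Xb = np.column_stack([np.ones(n), X])   # add an intercept column
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i            # leave compound i out
        coef, *_ = np.linalg.lstsq(Xb[keep], y[keep], rcond=None)
        press += (y[i] - Xb[i] @ coef) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def ga_select(X, y, pop_size=100, max_iter=2000, tol=0.02, seed=0):
    """Evolve binary descriptor masks; returns (best_mask, best_q2)."""
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]                     # must be at least 2
    # Step 2: random binary strings, True = descriptor included
    pop = rng.integers(0, 2, size=(pop_size, n_desc)).astype(bool)
    # Step 3: fitness of every parent
    fit = np.array([loo_q2(X[:, m], y) for m in pop])
    for _ in range(max_iter):
        # Step 6: stop when max and min fitness are within the threshold
        if fit.max() - fit.min() < tol:
            break
        # Step 4: crossover of two random parents at a random cut point
        i, j = rng.choice(pop_size, size=2, replace=False)
        cut = rng.integers(1, n_desc)
        kids = [np.concatenate([pop[i][:cut], pop[j][cut:]]),
                np.concatenate([pop[j][:cut], pop[i][cut:]])]
        for parent, kid in zip((i, j), kids):
            flip = rng.integers(n_desc)     # single-point mutation
            kid[flip] = ~kid[flip]
            f = loo_q2(X[:, kid], y)
            # Step 5: an offspring replaces its parent only if it is fitter
            if f > fit[parent]:
                pop[parent], fit[parent] = kid, f
    best = int(np.argmax(fit))
    return pop[best], fit[best]
```

Calling `ga_select(X, y)` returns the boolean mask of selected descriptors (the descriptor pharmacophore in the terminology above) together with its q². The population size stays constant throughout because offspring only ever replace their own parents.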
3.2.2 Nonlinear Models. Most of the QSAR approaches assume the existence of a linear relationship between a biological activity and molecular descriptors. However, the fast collection of structural and biological data, as a consequence of the recent development of combinatorial chemistry and high throughput screening technologies, has challenged traditional QSAR techniques. First, 3D methods may be computationally too expensive for the analysis of a large volume of data; and in some cases, an automated and unambiguous alignment of molecular structures is not achievable. Second, although existing 2D techniques are computationally efficient, the assumption of linearity in the SAR may not hold true, especially when a large number of structurally diverse molecules are included in the analysis.

These considerations provide an impetus for the development of fast, nonlinear, variable selection QSAR methods that can avoid the aforementioned problems of linear QSAR. Several nonlinear QSAR methods have been proposed in recent years. Most of these methods are based on either artificial neural network (ANN) (50, 61, 137-142) or machine learning techniques (65, 143-145). Given that optimization of many parameters is involved in these techniques, the speed of the analysis is relatively slow. More recently, Hirst reported a simple and fast nonlinear QSAR method (146), in which the activity surface was generated from the activities of training set compounds based on some predefined mathematical function.

For illustration, we shall consider here one of the nonlinear variable selection methods that adopts a k-Nearest Neighbor (kNN) principle to QSAR [kNN-QSAR (49)]. Formally, this method implements the active analog principle that lies in the foundation of the modern medicinal chemistry. The kNN-QSAR method employs multiple topological (2D) or topographical (3D) descriptors of chemical structures and predicts biological activity of any compound as the average activity of k most similar molecules. This method can be used to analyze the structure-activity relationships (SAR) of a large number of compounds where a nonlinear SAR may predominate.

In principle, the kNN technique is a conceptually simple, nonlinear approach to pattern-recognition problems (147). In this method, an unknown pattern is classified according to the majority of the class labels of its k nearest neighbors of the training set in the descriptor space. Many variations of the kNN method have been proposed in the past and new and fast algorithms have continued to appear in recent years (148, 149). The applications of the kNN principle in chemistry have been summarized by Strouf (150). In the area of biology, Raymer et al. have successfully applied a kNN pattern-recognition technique with simultaneous feature selection and classification in the analysis of water distribution in protein structures (151). In the area of QSPR, Basak et al. have applied this principle, combined with principal component analysis and graph theoretical indices, in the estimation of physicochemical properties of organic compounds (152-155).

The assumptions underlying the kNN-QSAR method are as follows. First, structurally similar compounds should have similar biological activities, and the activity of a compound can be predicted (or estimated) simply as the average of the activities of similar compounds. Second, the perception of structural similarity is relative and should always be considered in the context of a particular biological target. Given that the physicochemical characteristics of the receptor-binding site vary from one target to another, the structural features that can best explain the observed biological similarities between compounds are different for different biological endpoints. These critical structural features can be defined as the descriptor pharmacophore (DP) for the underlying biological activity. Thus, one of the tasks of building a kNN-QSAR model is to identify the best DP. This is achieved by the "bioactivity-driven" variable selection, that is, by selecting a subset of molecular descriptors that afford a highly predictive kNN-QSAR model. Because the number of all possible combinations of descriptors is huge, an exhaustive search of these combinations is not possible. Thus, a stochastic optimization algorithm (i.e., simulated annealing) has been adopted for an efficient sampling of the combinatorial space. Figure 2.7 shows the overall flowchart of the kNN-QSAR method, which involves the following steps.

1. Select a subset of n descriptors randomly (n is a number between 1 and the total number of available descriptors) as a hypothetical descriptor pharmacophore (HDP).

2. Validate this HDP by a standard cross-validation procedure, which generates the cross-validated R² (or q²) value for the kNN-QSAR model built by use of this HDP. The standard leave-one-out procedure has been implemented as follows: (i) Eliminate a compound from the training set. (ii) Calculate the activity of the eliminated compound, which is treated as an unknown, as the average activity of the k most similar compounds found in the remaining molecules (k is set to 1 initially). The similarities between compounds are calculated using only the selected descriptors (i.e., the current trial HDP) instead of the whole set of descriptors. (iii) Repeat this procedure until every compound in the training set has been eliminated and predicted once. (iv) Calculate the cross-validated R² (or q²) value (cf. Equation 2.1). (v) Repeat calculations for k = 2, 3, 4, ..., n. The upper limit of k is the total number of compounds in the data set; however, the best value is found empirically between 1 and 5. The k that leads to the best q² value is chosen for the current kNN-QSAR model.

3. Repeat steps 1 and 2, the procedure of generating trial HDPs and calculating corresponding q² values. The goal is to find the best HDP that maximizes the q² value of the corresponding kNN-QSAR model. This process is driven by a generalized simulated annealing by use of q² as the objective function.

Figure 2.7. Flowchart of the kNN method (49). Boxes, in order: randomly select a subset of descriptors (a hypothetical descriptor pharmacophore, HDP); leave out a compound; find its k nearest neighbors among N compounds in the HDP space; predict the activity of the eliminated compound by weighted kNN; calculate the predictive ability (q²) of the model; select the QSAR model with the highest q².

4 VALIDATION OF QSAR MODELS

One of the most important characteristics of QSAR models is their predictive power. The latter can be defined as the ability of a model to predict accurately the target property (e.g., biological activity) of compounds that were not used for model development. The typical problem of QSAR modeling is that at the time of
model development a researcher has, essentially, only training set molecules, so predictive ability can be characterized only by statistical characteristics of the training set model, and not by true external validation. Recent research demonstrates that external validation must be made, indeed, a mandatory part of model development. This goal can be achieved by a division of an experimental SAR data set into the training and test sets, which are used for model development and validation, respectively.

Figure 2.8. Beware of q²! External R² (for the test set) presents no correlation with the "predictive" LOO q² (for the training set). (Adopted from Ref. 163.)

It has been shown that the more independent variables are involved in MLR QSAR analysis, the higher the probability of a chance correlation between predicted and observed activities, even if only a small portion of variables is included in the final QSAR equation (16). This conclusion is true not only for MLR QSAR, but also for any QSAR approach when the number of variables (descriptors) is comparable to or higher than the number of compounds in a data set. Thus, model validation is one of the most important aspects of QSAR analysis.

4.1 Beware of q²

To validate a QSAR model, most researchers apply the leave-one-out (LOO) or leave-some-out (LSO) cross-validation procedures. The outcome from this procedure is a cross-validated correlation coefficient R² (q²) (Equation 2.1). Frequently, q² is used as a criterion of both robustness and predictive ability of the model. Many authors consider high q² (for instance, q² > 0.5) as an indicator or even as the ultimate proof of the high predictive power of the QSAR model. They do not test the models for their ability to predict the activity of compounds of an external test set (i.e., compounds that have not been used in the QSAR model development). There are several examples of recent publications, in which the authors claim that their models have high predictive ability without validating them by use of an external test set (156-160). Some authors validate their models by the use of only one or two compounds that were not used in QSAR model development (161, 162) and still claim that their models are highly predictive. In contrast with such expectations, it has been shown that if a test set with known values of biological activities is available for prediction, there exists no correlation between LOO cross-validated q² and correlation coefficient R² between the predicted and observed activities for the test set [Fig. 2.8; (46, 163)].

4.2 Rational Selection of Training and Test Sets

As discussed earlier, to obtain a reliable (validated) QSAR model, an available data set
should be divided into the training and test sets. Ideally, this division must be performed such that points representing both training and test set are distributed within the whole descriptor space occupied by the entire data set, and each point of the test set is close to at least one point of the training set. This approach ensures that the similarity principle can be employed for the activity prediction of the test set. Unfortunately, as we shall see below, this condition cannot always be satisfied.

Many authors use external test sets for validation of QSAR models, but do not provide any rationale as to how and why certain compounds were chosen for the test set (164, 165). One of the most widely used methods for dividing a data set into training and test sets is a mere random selection (166, 167). Some authors assign whole structural subgroups of molecules to the training set or the test set (168, 169). Another frequently used approach is based on the activity sampling. The whole range of activities is divided into bins, and compounds belonging to each bin are randomly (or in some regular way) assigned to the training set or test set (170, 171). These methods (166, 170, 171) cannot guarantee that the training set compounds represent the entire descriptor space of the original data set, and that each compound point of the test set is close to at least one point of the training set.

In several publications, the division of a data set into training and test sets is performed by use of the Kohonen's Self-Organizing Map (SOM) (172). Representative points falling into the same areas of the SOM are randomly selected for the training and test sets (173, 174). SOM preserves the closeness between points (points that are close to each other in the multidimensional descriptor space are close to each other on the map). Therefore, it is anticipated that the training and test sets must be scattered within the whole area occupied by representative points in the original descriptor space, and that each point of the test set is close to at least one point of the training set. The drawback of this method is that the quantitative methods of prediction use exact values of distances between representative points; because SOM is a nonlinear projection method, the distances between points in the map are distorted.

The division of a data set into the training and test sets can be performed by the use of various clustering techniques. In Burden and Winkler (175) and Burden et al. (176) the K-means clustering algorithm (177) was used, and from each cluster one compound for the training set was randomly selected. In Potter and Matter (178), to select a representative subset from a data set, hierarchical clustering and the maximum dissimilarity method (179-181) were used. The authors showed that both methods choose representative subsets of compounds much better than the random selection. Compounds selected through use of the maximum dissimilarity method were used as training sets in 3D-QSAR studies, with all remaining compounds composing the test set. In Wu et al. (166) the Kennard-Stone (182-184) method, which is similar to the maximum dissimilarity method, was applied to the classification of NIR spectra and QSAR analysis. The drawbacks of clustering methods are that different clusters contain different numbers of points and have different densities of representative points. Therefore, the closeness of each point of the test set to at least one point of the training set is not guaranteed. The maximum dissimilarity and Kennard-Stone methods guarantee that the points of the training set are distributed more or less evenly within the whole area occupied by representative points, and the condition of closeness of the test set points to the training set points is satisfied. The maximum distance between training and test set points in these methods does not exceed the radius of the probe sphere.

To select a representative subset of samples from the whole data set, factorial designs (185, 186) and D-optimal designs (187) were used (166, 173, 188). Factorial designs presume that different sample properties (such as substituent groups at certain positions) are divided into groups. The training set includes one representative for each combination of properties. For a diverse data set this approach is impractical, and fractional factorial designs are used, in which only a part of all combinations is included into the training set. Generally, this approach does not guarantee the closeness of the test set points to the training set points in the descriptor space. D-optimal design algorithms select samples that
maximize the |X'X| determinant, where X is the information (variance-covariance) matrix of independent variables (descriptors) (189, 190). The points maximizing the |X'X| determinant are spanned across the whole area occupied by representative points. They can be used as a training set, and the points not selected then are used as the test set (166, 173).
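As an illustration of the distance-based selection schemes discussed in this section, the following Python sketch implements a Kennard-Stone-style selection of a training set, with all remaining compounds assigned to the test set. It is a minimal sketch under stated assumptions rather than the procedure used in any of the cited studies: Euclidean distance in the descriptor space is assumed, the two most distant compounds seed the selection, and the function name `kennard_stone` and its parameters are hypothetical.

```python
import numpy as np

def kennard_stone(X, n_train):
    """Pick n_train (>= 2) rows of X that are spread evenly over the
    descriptor space; returns (train_indices, test_indices)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Full matrix of pairwise Euclidean distances between compounds
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # Seed the training set with the two most distant compounds
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        # For each remaining compound, distance to its nearest selected one
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        # Add the compound that is farthest from everything already selected
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining
```

Because each new training compound is the one farthest from the current training set, every compound left for the test set ends up no farther from its nearest training compound than the last pick was, which is the closeness property emphasized above.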
In Wu et al. (166) four methods of sample The lack of the correlation between q2 and R 2
selection (random, SOM, Kennard-Stone de- was noted in Kubinyi et al. (461, Novellino et
sign, and D-optimal design) were compared. al. (192), Norinder (193), and in our recent
The best models were built when Kennard- publication (163), where we demonstrated
Stone and D-optimal designs were used. SOM that all of the above-mentioned criteria are
was better than random selection, and D-opti- necessary to adequately assess the predictive
mal design was slightly better than the ran- ability of a QSAR model. We suggest (163)that
dom selection. the external test set must contain at least five
compounds, representing the whole range of
both descriptor and activities of compounds
4.3 Guiding Principles of Safe QSAR included into the training set.
A widely used approach to establish the model
robustness is so-called y-randomization (randomization of response, i.e., in our case, activities) (191). It consists of repeating the calculation procedure with randomized activities and then assessing the probability of the resultant statistics. Frequently, it is used along with cross-validation. Models obtained for the data set with randomized activities are expected to have low values of q^2; otherwise, the original model should be considered insignificant. We suggest that the y-randomization test is a mandatory component of model validation.

Several authors have suggested that the only way to estimate the true predictive power of a QSAR model is to compare the predicted and observed activities of a (sufficiently large) external test set of compounds that were not used in the model development (46, 163, 192-194). To estimate the predictive power of a QSAR model, we recommend the use of the following statistical characteristics of the test set (163): (i) correlation coefficient R between the predicted and observed activities; (ii) coefficients of determination (195) (predicted vs. observed activities R_0^2, and observed vs. predicted activities R'_0^2); (iii) slopes k and k' of the regression lines through the origin. We consider a QSAR model predictive, if the following conditions are satisfied (163):

5 QSAR MODELS AS VIRTUAL SCREENING TOOLS

5.1 Data Mining and SAR Analysis

Data mining has been of interest to researchers in machine learning, pattern recognition, artificial intelligence, database statistics, and so forth for many years, and has been widely applied in science, business, and government. Now, chemoinformaticians have also started to plunge into this field because of the increased quantity of data in the drug discovery process. Data mining can be defined as the process of discovering valid, novel, understandable, and potentially useful patterns in data (196, 197). Data mining is an interactive and iterative, multiple-step process, involving the decisions made by the user. It may include data collection, data cleaning, data engineering, algorithm engineering, algorithm running, result evaluation, and knowledge utilization (198, 199).

Data mining methods can be generally divided into two types, unsupervised and supervised. Whereas unsupervised methods seek informative patterns, which directly display the interesting relationship among the data, supervised methods discover predictive patterns, which can be used later to predict one or more attributes from the rest.
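The external test-set statistics listed above (R, the through-origin coefficients of determination R_0^2 and R'_0^2, and the slopes k and k') can be computed for the predictions of any supervised model. The sketch below is a minimal Python illustration, not the procedure of reference 163: the function names and the sample activities are hypothetical, the through-origin coefficient of determination uses one common textbook definition (1 - SS_res/SS_tot) that may differ in detail from the cited convention, the choice of which regression direction is the unprimed one is an assumption, and no acceptance thresholds are hard-coded (they should be taken from reference 163).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_through_origin(x, y):
    """Least-squares fit of y = slope * x (no intercept).

    Returns (slope, r0_sq). r0_sq = 1 - SS_res / SS_tot is an assumed,
    textbook-style coefficient of determination for the through-origin line.
    """
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    mean_y = sum(y) / len(y)
    ss_res = sum((b - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, 1.0 - ss_res / ss_tot


def external_validation(observed, predicted):
    """Collect the test-set statistics for one model's predictions."""
    r = pearson_r(predicted, observed)
    # Assumption: the unprimed pair regresses observed on predicted,
    # the primed pair regresses predicted on observed.
    k, r0_sq = fit_through_origin(predicted, observed)
    k_prime, r0_sq_prime = fit_through_origin(observed, predicted)
    return {
        "R": r,
        "R_sq": r * r,
        "k": k,
        "R0_sq": r0_sq,
        "k_prime": k_prime,
        "R0_sq_prime": r0_sq_prime,
    }


if __name__ == "__main__":
    # Hypothetical activities for a five-compound external test set.
    observed = [5.1, 6.0, 6.8, 7.4, 8.2]
    predicted = [5.4, 5.8, 7.0, 7.1, 8.5]
    for name, value in external_validation(observed, predicted).items():
        print(f"{name}: {value:.3f}")
```

Keeping the statistics separate from any pass/fail decision lets the same function serve whichever acceptance conditions a study adopts.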
A wide variety of supervised data mining methods have been applied for analyzing structure-activity data sets, besides the traditional linear regression methods. Most of them are nonlinear and nonparametric and require no statistical assumptions. Decision tree and rule induction methods, such as ID3 (200), CART (201), and FIRM (202-204), usually use univariate splits to generate a model in the form of a tree or propositional logic. The inferred model is easy to comprehend, but the approximation power may be significantly restricted by a particular tree or rule representation. Inductive logic programming methods, such as GOLEM (64) and PROGOL (65), are designed to induce a model from the more flexible representation of first-order predicate logic. However, this generality comes at the price of significant computational demands. Nonlinear regression and classification methods, such as various neural networks (60-62), train a model by fitting linear and nonlinear combinations of basis functions to the combinations of the input variables. They may be powerful in terms of approximation, but they are statistically poorly characterized, slow (205), and difficult to interpret in chemical terms. Example-based methods, such as nearest-neighbor methods (147), use representative examples from the database as an approximate model and predict new samples on the basis of the properties of the most similar examples in the model. They are asymptotically powerful for approximating properties, but also difficult to interpret. Furthermore, their performance is strongly dependent on a well-defined distance metric to evaluate distances between data points.

Data mining of chemical databases is still at a very early stage. Nevertheless, as a result of the data explosion in the pharmaceutical industry, it is expected that data mining techniques will play an increasingly important role in the drug discovery process. Future studies may include, for example, the definition of chemical space, the validation of various algorithms (206), and the representation of extremely large virtual databases (207).

5.2 Virtual Screening

Although combinatorial chemistry and HTS have offered medicinal chemists a much broader range of possibilities for lead discovery and optimization, the number of chemical compounds that can be reasonably synthesized, which is sometimes called "virtual chemistry space," is still far beyond today's capability of chemical synthesis and biological assay. Therefore, medicinal chemists continue to face the same problem as before: Which compounds should be chosen for the next round of synthesis and testing? For chemoinformaticians, the task is to develop and utilize various computer programs to evaluate a very large number of chemical compounds and recommend the most promising ones for bench medicinal chemists. This process can be called virtual screening (208) or chemical database searching. A large number of computational methods exist for virtual screening, but which one is chosen will depend on the information available and the task at hand in practice.

A substructure search will typically be undertaken if a lead compound has been found. The search query will retrieve all the structures in a database that contain the substructures present in the lead compound that are believed to be important for activity (209). According to graph theory, it is equivalent to searching a series of topological graphs for the existence of a subgraph isomorphism with a specified query graph. Subgraph isomorphism is an NP-complete problem (210), which means that no algorithm is known whose worst-case time requirement does not rise exponentially with the size of the input. However, various backtracking algorithms (211-213) and partitioning algorithms (214-217) have been developed since the 1950s to reduce the average time required for chemical substructure searching. Today, almost all chemical database software includes substructure searching.

A similarity search provides a way forward by retrieving the structures that are similar, but not identical, to a lead compound (94). Therefore, it overcomes some limitations of substructure search, for example, not requiring specific knowledge about the substructures responsible for activity, and being able to rank the output structures according to the overall similarity. The search query usually involves a set of descriptors that collectively specify the whole structure of the lead compound. This set of descriptors is compared with the corresponding set of descriptors for
each compound in the database, and then a measure of similarity is calculated between them. There is a wide variety of molecular descriptors for similarity searching (cf. Section 2). No single set of molecular descriptors has been found to be the best choice in all cases. The present trend in descriptor selection is to use combined descriptors of many different types. The similarity coefficients that are often used for measuring the similarity between two structures include Manhattan distance, Euclidean distance, Soergel distance, Tanimoto coefficient, Dice coefficient, Cosine coefficient, and so forth (218), and again no clear-cut winner has been found among them (219). Virtual screening based on QSAR models can serve as a powerful approach to the design of targeted chemical libraries, as illustrated in the following section.

5.3 Rational Library Design by Use of QSAR

As discussed earlier, combinatorial chemical synthesis and high throughput screening have significantly increased the speed of the drug discovery process (220-222). However, it remains impossible to synthesize all of the library compounds in a reasonably short period of time. For instance, 3000^3 (2.7 × 10^10) compounds can be synthesized from a molecular scaffold with three different substitution positions when each of the positions has 3000 different substituents. If a chemist could synthesize 1000 compounds per week, 27 million weeks (~0.5 million years) would be required to synthesize all these compounds. Furthermore, many of these compounds can be structurally similar to each other, thus making redundant the chemical information contained in the library. There is a need for rational library design (i.e., rational selection of a subset of available building blocks for combinatorial chemical synthesis), so that a maximum amount of information can be obtained while a minimum number of compounds are synthesized and tested. Similarly, there is a closely related task in computational database mining, that is, rational selection of a subset of compounds from commercially available or proprietary databases for biological testing.

Thus, in many practical cases, the exhaustive synthesis and evaluation of combinatorial libraries is prohibitively expensive, time-consuming, or redundant (223). Modern rational approaches to the design of combinatorial libraries have been explored in a recent monograph (224). Theoretical analysis of available experimental information about the biological target or pharmacological compounds capable of interacting with the target can significantly enhance the rational design of targeted chemical libraries. In many cases, the number of compounds with known biological activity is sufficiently large to develop viable QSAR models for such data sets. These models can be used as a means of selecting virtual library compounds (or actual compounds from existing databases) with (high) predicted biological activity. Alternatively, if a variable selection method has been employed in developing a QSAR model, the use of only the selected variables can improve the performance of rational library design or database mining methods that are based on similarity to a probe. Using only the selected variables in a similarity search in the descriptor space is analogous to the more traditional use of conventional chemical pharmacophores in database mining.

QSAR models can be employed for rational design of targeted chemical libraries and database mining by predicting biologically active structures in virtual or actual chemical libraries (225-227). To illustrate this approach, we consider the design of a pentapeptide combinatorial library with bradykinin activity by use of a QSAR model derived for a small bradykinin peptide data set. Figure 2.9 shows the schematic diagram illustrating the targeted pentapeptide combinatorial library design by use of the FOCUS-2D method (225, 226). The algorithm includes the description, evaluation, and optimization steps.

Figure 2.9. Flowchart of the library design approach by FOCUS-2D. [Flowchart not reproduced. Recoverable labels: building blocks Ba, Bb, Bc, Bd, ..., Bf; Generate and Encode; Evaluate (QSAR prediction); Select; Analyze.]

To identify potentially active compounds in the virtual library, FOCUS-2D employs stochastic optimization methods such as SA (228, 229) and GA (230-232). The latter algorithm was used for targeted pentapeptide library design as follows. Initially, a population of 100 peptides is randomly generated and encoded by use of topological indices or amino acid-dependent physicochemical descriptors. The fitness of each peptide is evaluated by its biological activity predicted from a preconstructed QSAR equation (see below). Two parent peptides are chosen by use of the roulette wheel selection method (i.e., parents with high fitness are more likely to be selected). Two offspring peptides are generated by a crossover (i.e., two randomly chosen peptides exchange their fragments) and mutations (i.e., a randomly chosen amino acid in an offspring is changed to any of the 19 remaining amino acids). The fitness of the offspring peptides is then evaluated and compared with that of the parent peptides, and the two lowest scoring peptides are eliminated. This process is repeated 2000 times to evolve the population.

Design of a Targeted Library with Bradykinin (BK) Potentiating Activity. The results obtained with FOCUS-2D and a QSAR-based prediction are shown in Figure 2.10. The position-dependent frequency distributions of amino acids in the highest scoring pentapeptides are shown before (Fig. 2.10a) and after (Fig. 2.10b,c) FOCUS-2D. To evaluate the efficiency of stochastic sampling, the entire pentapeptide library (which includes as many as 3.2 million molecules) was also generated and subjected to evaluation by use of the same QSAR model, and the results are shown in Fig. 2.10c. Apparently, the results after FOCUS-2D and the exhaustive search were very similar to each other. FOCUS-2D selected the following amino acids: E, I, K, L, M, Q, R, V, and W. Interestingly, these selected amino acids included most of those found in the two experimentally most active pentapeptides, VEWAK and VKWAP (excluded from the training set for the QSAR model development). Furthermore, the actual spatial positions of these amino acids were correctly identified: the first and fourth positions for V; the second and fifth positions for E; the third position for W; and the second and fifth positions for K. More detailed analysis of these results (cf. Fig. 2.10b,c) may suggest which residues should be preferably chosen for each position in the pentapeptide to achieve a limited size library with high predicted bradykinin activity.

Figure 2.10. Rational selection of building blocks for library design by use of FOCUS-2D and a QSAR model for activity prediction: (a) initial population; (b) final population after FOCUS-2D; and (c) final population after the exhaustive search. [Plots not reproduced. Horizontal axis: amino acid (A C D E F G H I K L M N P Q R S T V W Y); legible legend entries: 1st AA, 2nd AA, 3rd AA, 4th AA.]

6 CONCLUSIONS

In this chapter, we have reviewed recent and developing trends in the field of QSAR. We have provided common terminology and presented a unified concept of the QSAR approach. We have emphasized that, regardless of the origin of molecular descriptors, any QSAR modeling exercise starts from constructing a two-dimensional data array (Fig. 2.2), which lists molecular IDs, values of the target (or dependent) property of each compound, and values of descriptors (independent variables) for each compound. We have considered various protocols employed by QSAR practitioners to develop quantitative models of biological activity by the use of chemical descriptors and linear or nonlinear optimization techniques. We have particularly emphasized that the true power of any QSAR model comes from its statistical significance and the model's ability to predict accurately biological properties of chemical compounds both in the training and, most important, in the test sets. One of the important research challenges in QSPR modeling remains finding descriptor types, correlation approaches, and adequate statistical characteristics of the training set only, which may ensure high predictive power of the models.

In conclusion, we strongly advocate rigorous validation of QSAR models before their practical application or interpretation. The practical guidelines for the development of statistically robust and predictive QSAR models can be summarized as follows:

1. Establish an SAR database through the use of reliable quantitative measurements of the target property and a preferred set of molecular descriptors.
2. Divide the underlying data set into training and test sets through the use of diversity sampling algorithms.
3. Develop training set models through the use of available QSAR methods or commercial software. Characterize these models with internal validation parameters, as discussed in this chapter, and define the applicability domain for each model.
4. Validate training set models through the use of an external test set and calculate the external validation parameters, as discussed in this chapter. Ideally, repeat the
procedure of training and test selection and external validation several times to identify the QSAR model for the smallest training set that affords adequate prediction power for the biggest test set.
5. Finally, explore and exploit validated QSAR models for possible mechanistic interpretation and prediction.

In the modern age of medicinal chemistry, QSAR modeling remains one of the most important instruments of computer-aided drug design. Skillful application of various methodologies discussed in this chapter will afford validated QSAR models, which should continue to enrich and facilitate the experimental process of drug discovery and development.

REFERENCES

1. C. Hansch, R. M. Muir, T. Fujita, P. P. Maloney, E. Geiger, and M. Streich, J. Am. Chem. Soc., 85, 2817 (1963).
2. T. Fujita, J. Iwasa, and C. Hansch, J. Am. Chem. Soc., 86, 5175 (1964).
3. L. P. Hammett, Chem. Rev., 17, 125 (1935).
4. C. Hansch and A. Leo in S. R. Heller, Ed., Exploring QSAR: Fundamentals and Applications in Chemistry and Biology, American Chemical Society, Washington, DC, 1995.
5. C. Hansch, A. Leo, and D. Hoekman in S. R. Heller, Ed., Exploring QSAR: Hydrophobic, Electronic, and Steric Constants, American Chemical Society, Washington, DC, 1995.
6. A. Verloop, W. Hoogenstraaten, and J. Tipker in E. J. Ariens, Ed., Drug Design, Vol. VII, Academic Press, New York, 1976, 165 pp.
7. C. Selassie, this volume, Chapter 1.
8. H. Kubinyi in R. Mannhold, P. Krogsgaard-Larsen, and H. Timmerman, Eds., Methods and Principles in Medicinal Chemistry, Vol. 1, VCH, New York, 1993.
9. D. Livingstone, Data Analysis for Chemists: Applications to QSAR and Chemical Product Design, Oxford University Press, Oxford, UK, 1995.
10. H. Kubinyi, G. Folkers, and Y. Martin, Eds., 3D QSAR in Drug Design, Vols. 2 and 3, Kluwer/ESCOM, Dordrecht, The Netherlands, 1998.
11. M. Karelson, Molecular Descriptors in QSAR/QSPR, Wiley-Interscience, New York, 2000.
12. D. J. Livingstone, J. Chem. Inf. Comput. Sci., 40, 195-209 (2000).
13. Chemical Abstracts Service (CAS), Columbus, OH. May be accessed at http://www.cas.org
14. D. S. Tan, M. A. Foley, M. D. Shair, and S. L. Schreiber, J. Am. Chem. Soc., 120, 8565-8566 (1998).
15. J. Drews, Science, 287, 1960-1964 (2000).
16. J. G. Topliss and R. P. Edwards, J. Med. Chem., 22, 1238 (1979).
17. U. Burkert and N. L. Allinger, Molecular Mechanics, American Chemical Society, Washington, DC, 1982.
18. C. Hansch and T. Fujita, J. Am. Chem. Soc., 86, 1616-1626 (1964).
19. M. Randic, J. Am. Chem. Soc., 97, 6609-6615 (1975).
20. L. B. Kier and L. H. Hall, Molecular Connectivity in Chemistry and Drug Research, Academic Press, New York, 1976.
21. L. B. Kier and L. H. Hall, Molecular Connectivity in Structure-Activity Analysis, Research Studies Press, Chichester, UK, 1986.
22. L. B. Kier and L. H. Hall in K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry II, VCH, Weinheim/New York, 1991, pp. 367-422.
23. L. B. Kier, Quant. Struct.-Act. Relat., 4, 109-116 (1985).
24. L. B. Kier, Quant. Struct.-Act. Relat., 6, 8-12 (1987).
25. L. H. Hall and L. B. Kier, Quant. Struct.-Act. Relat., 9, 115-131 (1990).
26. L. H. Hall, B. K. Mohney, and L. B. Kier, Quant. Struct.-Act. Relat., 10, 43-51 (1991).
27. L. H. Hall, B. K. Mohney, and L. B. Kier, J. Chem. Inf. Comput. Sci., 31, 76-82 (1991).
28. L. B. Kier and L. H. Hall, Molecular Structure Description: The Electrotopological State, Academic Press, Orlando, FL, 1999.
29. G. E. Kellogg, L. B. Kier, P. Gaillard, and L. H. Hall, J. Comput.-Aided Mol. Des., 10, 513-520 (1996).
30. R. P. Sheridan, R. B. Nachbar, and B. L. Bush, J. Comput.-Aided Mol. Des., 8, 323-340 (1994).
31. H. Matter, J. Med. Chem., 40, 1219-1229 (1997).
32. A. J. Hopfinger, J. Am. Chem. Soc., 102, 7196 (1980).
33. G. M. Crippen, J. Med. Chem., 22, 988-997 (1979).
34. G. M. Crippen, J. Med. Chem., 23, 599-606 (1980).
35. L. G. Boulu and G. M. Crippen, J. Comput. Chem., 10, 673 (1989).
36. U. Holzgrabe and A. J. Hopfinger, J. Chem. Inf. Comput. Sci., 36, 1018 (1996).
37. A. J. Hopfinger, B. J. Burke, and W. J. Dunn, J. Med. Chem., 37, 3768 (1994).
38. S. Srivastava and G. M. Crippen, J. Med. Chem., 36, 3572 (1993).
39. M. P. Bradley and G. M. Crippen, J. Med. Chem., 36, 3171 (1993).
40. R. D. Cramer III, D. E. Patterson, and J. D. Bunce, J. Am. Chem. Soc., 110, 5959-5967 (1988).
41. S. Wold, A. Ruhe, H. Wold, and W. J. Dunn III, SIAM J. Sci. Stat. Comput., 5, 735-743 (1984).
42. P. Geladi and B. R. Kowalski, Anal. Chim. Acta, 185, 1-17 (1986).
43. G. R. Marshall and R. D. Cramer III, Trends Pharmacol. Sci., 9, 285-289 (1988).
44. C. Pérez, M. Pastor, A. R. Ortiz, and F. Gago, J. Med. Chem., 41, 836-852 (1998).
45. G. Klebe in H. Kubinyi, G. Folkers, and Y. C. Martin, Eds., 3D QSAR in Drug Design, Vol. 3, Kluwer/ESCOM, Dordrecht, The Netherlands, 1998, pp. 87-104.
46. H. Kubinyi, F. A. Hamprecht, and T. Mietzner, J. Med. Chem., 41, 2553-2564 (1998).
47. S. Wold in H. van de Waterbeemd, Ed., Chemometrics Methods in Molecular Design, VCH, Weinheim/New York, 1995, pp. 195-218.
48. B. Hoffman, S. J. Cho, W. Zheng, S. Wyrick, D. E. Nichols, R. B. Mailman, and A. Tropsha, J. Med. Chem., 42, 3217-3226 (1999).
49. W. Zheng and A. Tropsha, J. Chem. Inf. Comput. Sci., 40, 185-194 (2000).
50. Ajay, J. Med. Chem., 36, 3565-3571 (1993).
51. L. S. Anker and P. C. Jurs, Anal. Chem., 62, 2676 (1990).
52. P. C. Jurs, J. W. Ball, and L. S. Anker, J. Chem. Inf. Comput. Sci., 32, 272 (1992).
53. T. M. Nelson and P. C. Jurs, J. Chem. Inf. Comput. Sci., 34, 601 (1994).
54. D. T. Stanton and P. C. Jurs, J. Chem. Inf. Comput. Sci., 32, 109 (1992).
55. S. Hellberg, M. Sjostrom, B. Skagerberg, and S. Wold, J. Med. Chem., 30, 1126-1135 (1987).
56. B. R. Kowalski and C. F. Bender, J. Am. Chem. Soc., 96, 916-918 (1974).
57. K. C. Chu, R. J. Feldmann, N. B. Shapiro, G. F. Hazard, and R. I. Geran, J. Med. Chem., 18, 539-545 (1975).
58. G. Klopman, J. Am. Chem. Soc., 106, 7315-7321 (1984).
59. G. Klopman, Quant. Struct.-Act. Relat., 11, 176-184 (1992).
60. T. Aoyama, Y. Suzuki, and H. Ichikawa, J. Med. Chem., 33, 2583-2590 (1990).
61. S.-S. So and W. G. Richards, J. Med. Chem., 35, 3201-3207 (1992).
62. F. R. Burden, B. S. Rosewarne, and D. A. Winkler, Chemom. Intell. Lab. Syst., 38, 127-137 (1997).
63. G. Bolis, L. Di Pace, and F. Fabrocini, J. Comput.-Aided Mol. Des., 5, 617-628 (1991).
64. R. D. King, S. H. Muggleton, R. A. Lewis, and M. J. E. Sternberg, Proc. Natl. Acad. Sci. USA, 89, 11322-11326 (1992).
65. R. D. King, S. H. Muggleton, A. Srinivasan, and M. J. E. Sternberg, Proc. Natl. Acad. Sci. USA, 93, 438-442 (1996).
66. S. Clementi and S. Wold in H. van de Waterbeemd, Ed., Chemometrics Methods in Molecular Design, VCH, Weinheim/New York, 1995, pp. 319-338.
67. J. M. Sutter, S. L. Dixon, and P. C. Jurs, J. Chem. Inf. Comput. Sci., 35, 77 (1995).
68. D. Rogers and A. J. Hopfinger, J. Chem. Inf. Comput. Sci., 34, 854-866 (1994).
69. H. Kubinyi, Quant. Struct.-Act. Relat., 13, 285-294 (1994).
70. H. Kubinyi, Quant. Struct.-Act. Relat., 13, 393-401 (1994).
71. B. T. Luke, J. Chem. Inf. Comput. Sci., 34, 1279-1287 (1994).
72. S.-S. So and M. Karplus, J. Med. Chem., 39, 1521-1530 (1996).
73. K. Hasegawa, T. Kimura, and K. Funatsu, J. Chem. Inf. Comput. Sci., 39, 112-120 (1999).
74. M. Baroni, G. Costantino, G. Cruciani, D. Riganelli, R. Valigi, and S. Clementi, Quant. Struct.-Act. Relat., 12, 9-20 (1993).
75. S. J. Cho and A. Tropsha, J. Med. Chem., 38, 1060-1066 (1995).
76. L. B. Kier, L. H. Hall, and J. W. Frazer, J. Chem. Inf. Comput. Sci., 33, 143 (1993).
77. L. H. Hall, L. B. Kier, and J. W. Frazer, J. Chem. Inf. Comput. Sci., 33, 148 (1993).
78. L. H. Hall, R. S. Dailey, and L. B. Kier, J. Chem. Inf. Comput. Sci., 33, 598 (1993).
79. L. H. Hall and L. B. Kier in K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry II, VCH, Weinheim/New York, 1991, pp. 367-422.
80. A. K. Debnath, R. L. Lopez de Compadre, G. Debnath, A. J. Shusterman, and C. Hansch, J. Med. Chem., 34, 786-797 (1991).
81. A. N. Jain, K. Koile, and D. Chapman, J. Med. Chem., 37, 2315-2327 (1994).
82. F. R. Burden, Quant. Struct.-Act. Relat., 15, 7-11 (1996).
83. P. G. Dittmar, N. A. Farmer, W. Fisanick, R. C. Haines, and J. Mockus, J. Chem. Inf. Comput. Sci., 23, 93-102 (1983).
84. R. E. Carhart, D. H. Smith, and R. Venkataraghavan, J. Chem. Inf. Comput. Sci., 25, 64-73 (1985).
85. R. Nilakantan, N. Bauman, J. S. Dixon, and R. Venkataraghavan, J. Chem. Inf. Comput. Sci., 27, 82-85 (1987).
86. C. A. Pepperrell and P. Willett, J. Comput.-Aided Mol. Des., 5, 455-474 (1991).
87. R. Nilakantan, N. Bauman, and R. Venkataraghavan, J. Chem. Inf. Comput. Sci., 33, 79-85 (1993).
88. R. P. Sheridan, M. D. Miller, D. J. Underwood, and S. K. Kearsley, J. Chem. Inf. Comput. Sci., 36, 128-136 (1996).
89. F. R. Burden, Quant. Struct.-Act. Relat., 16, 309-314 (1997).
90. B. D. Silverman and D. E. Platt, J. Med. Chem., 39, 2129-2140 (1996).
91. D. A. Winkler, F. R. Burden, and A. Watkins, Quant. Struct.-Act. Relat., 17, 14-19 (1998).
92. R. D. Brown and Y. C. Martin, J. Chem. Inf. Comput. Sci., 37, 1-9 (1997).
93. D. A. Winkler and F. R. Burden, Quant. Struct.-Act. Relat., 17, 224-231 (1998).
94. G. M. Downs and P. Willett in K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, Vol. 7, VCH, Weinheim/New York, 1996, pp. 1-65.
95. Molconn-Z version 3.5, Hall Associates Consulting, Quincy, MA.
96. M. Petitjean, J. Chem. Inf. Comput. Sci., 32, 331-337 (1992).
97. H. Wiener, J. Am. Chem. Soc., 69, 17 (1947).
98. J. R. Platt, J. Phys. Chem., 56, 328 (1952).
99. C. Shannon and W. Weaver, Mathematical Theory of Communication, University of Illinois, Urbana, 1949.
100. D. Bonchev, O. Mekenyan, and N. Trinajstic, J. Comput. Chem., 2, 127-148 (1981).
101. The program Sybyl is available from Tripos Associates, St. Louis, MO.
102. M. Shen, A. LeTiran, Y. Xiao, H. Kohn, and A. Tropsha, J. Med. Chem., 45, 2811-2823 (2002).
103. F. H. Allen, J. E. Davies, J. J. Galloy, O. Johnson, O. Kennard, C. F. Macrae, E. M. Mitchell, G. F. Mitchell, J. M. Smith, and D. G. Watson, J. Chem. Inf. Comput. Sci., 31, 187-204 (1991).
104. F. H. Allen, S. Bellard, M. D. Brice, B. A. Cartwright, A. Doubleday, H. Higgs, T. Hummelink, B. G. Hummelink-Peters, O. Kennard, W. D. S. Motherwell, J. R. Rodgers, and D. G. Watson, Acta Crystallogr. Sect. B, B35, 2331-2339 (1979).
105. A. Rusinko III, J. M. Skell, R. Balducci, C. M. McGarity, and R. S. Pearlman, Concord, A Program for the Rapid Generation of High Quality Approximate 3-Dimensional Molecular Structures, The University of Texas at Austin and Tripos Associates, St. Louis, MO, 1988.
106. R. S. Pearlman, Chem. Des. Aut. News, 2, 1-6 (1987).
107. J. Gasteiger, C. Rudolph, and J. Sadowski, Tetrahedron Comput. Methodol., 3, 537-547 (1990).
108. G. R. Marshall, C. D. Barry, H. E. Bosshard, R. A. Dammkoehler, and D. A. Dunn in E. C. Olson and R. E. Christoffersen, Eds., Computer-Assisted Drug Design, Vol. 112, American Chemical Society, Washington, DC, 1979, pp. 205-226.
109. G. E. Kellogg, S. F. Semus, and D. J. Abraham, J. Comput.-Aided Mol. Des., 5, 545-552 (1991).
110. P. J. Goodford, J. Med. Chem., 28, 849-857 (1985).
111. A. Agarwal, P. P. Pearson, E. W. Taylor, H. B. Li, T. Dahlgren, M. Herslof, Y. Yang, G. Lambert, D. L. Nelson, J. W. Regan, and A. R. Martin, J. Med. Chem., 36, 4006-4014 (1993).
112. C. L. Waller, T. I. Oprea, A. Giolitti, and G. R. Marshall, J. Med. Chem., 36, 4152-4160 (1993).
113. A. K. Debnath, C. Hansch, K. H. Kim, and Y. C. Martin, J. Med. Chem., 36, 1007-1016 (1993).
114. M. Y. Brusniak, R. S. Pearlman, K. A. Neve, and R. E. Wilcox, J. Med. Chem., 39, 850-859 (1996).
115. Y. C. Martin, Methods Enzymol., 203, 587-613 (1991).
116. Y. C. Martin, M. G. Bures, E. A. Danaher, J. DeLazzer, I. Lico, and P. A. Pavlik, J. Comput.-Aided Mol. Des., 7, 83-102 (1993).
117. S. J. Cho, M. G. Serrano, J. Bier, and A. Tropsha, J. Med. Chem., 39, 5064-5071 (1996).
118. J. L. Sussman, M. Harel, F. Frolow, C. Oefner, A. Goldman, L. Toker, and I. Silman, Science, 253, 872-879 (1991).
119. M. Harel, I. Schalk, L. Ehret-Sabatier, F. Bouet, M. Goeldner, C. Hirth, P. H. Axelsen, I. Silman, and J. L. Sussman, Proc. Natl. Acad. Sci. USA, 90, 9031-9035 (1993).
120. R. D. Cramer III, S. A. DePriest, D. E. Patterson, and P. Hecht in H. Kubinyi, Ed., 3D QSAR in Drug Design: Theory, Methods, and Applications, ESCOM Scientific, Leiden, The Netherlands, 1993, pp. 443-485.
121. M. Baroni, G. Costantino, G. Cruciani, D. Riganelli, R. Valigi, and S. Clementi, Quant. Struct.-Act. Relat., 12, 9-20 (1993).
122. M. Pastor, G. Cruciani, I. McLay, S. Pickett, and S. Clementi, J. Med. Chem., 43, 3233-3243 (2000).
123. P. Ehrlich, Dtsch. Chem. Ges., 42, 17 (1909).
124. C. Humblet and G. R. Marshall, Annu. Rep. Med. Chem., 15, 267-276 (1980).
125. S. A. DePriest, D. Mayer, C. B. Naylor, and G. R. Marshall, J. Am. Chem. Soc., 115, 5372-5384 (1993).
126. S. Wang, D. W. Zaharevitz, R. Sharma, V. E. Marquez, N. E. Lewin, L. Du, P. M. Blumberg, and G. W. A. Milne, J. Med. Chem., 37, 4479-4489 (1994).
127. I. Motoc, R. A. Dammkoehler, and G. R. Marshall, Mathematics and Computational Concepts in Chemistry, Ellis Horwood, Chichester, UK, 1985, pp. 222-251.
128. D. Mayer, C. B. Naylor, I. Motoc, and G. R. Marshall, J. Comput.-Aided Mol. Des., 1, 3-16 (1987).
129. R. P. Sheridan, R. Nilakantan, J. S. Dixon, and R. Venkataraghavan, J. Med. Chem., 29, 899-906 (1986).
130. G. W. A. Milne, M. C. Nicklaus, J. S. Driscoll, S. Wang, and D. Zaharevitz, J. Chem. Inf. Comput. Sci., 34, 1219-1224 (1994).
131. Catalyst/Hypo Tutorial, version 2.0, BioCAD Corp., Mountain View, CA, 1993.
132. P. W. Sprague, Perspect. Drug Discov. Des., 3, 1-20 (1995).
133. D. Barnum, J. Greene, A. Smellie, and P. Sprague, J. Chem. Inf. Comput. Sci., 36, 563-571 (1996).
134. HipHop Tutorial, version 2.3, Molecular Simulation Inc., Sunnyvale, CA, 1995.
135. V. Golender and B. Vesterman, Network Science (http://www.netsci.org/Science/Compchem/feature09.html).
136. (a) Available from the author's WWW home page at http://mmlin1.pha.unc.edu/~jin/QSAR/ (b) A. Tropsha, S. J. Cho, and W. Zheng in A. L. Parrill and M. R. Reddy, Eds., Rational Drug Design: Novel Methodology and Practical Applications, ACS Symposium Series 719, 1999, pp. 198-211.
137. T. A. Andrea and H. Kalayeh, J. Med. Chem., 34, 2824-2836 (1991).
138. J. D. Hirst, R. D. King, and M. J. Sternberg, J. Comput.-Aided Mol. Des., 8, 405-420 (1994).
139. J. D. Hirst, R. D. King, and M. J. Sternberg, J. Comput.-Aided Mol. Des., 8, 421-432 (1994).
140. I. V. Tetko, V. Yu. Tanchuk, N. P. Chentsova, S. V. Antonenko, G. I. Poda, V. P. Kukhar, and A. I. Luik, J. Med. Chem., 37, 2520-2526 (1994).
141. D. T. Manallack, D. D. Ellis, and D. J. Livingstone, J. Med. Chem., 37, 3758-3767 (1994).
142. D. J. Maddalena and G. A. Johnston, J. Med. Chem., 38, 715-724 (1995).
143. G. Bolis, L. Pace, and F. A. Fabrocini, J. Comput.-Aided Mol. Des., 5, 617-628 (1991).
144. R. D. King, S. Muggleton, R. A. Lewis, and M. J. Sternberg, Proc. Natl. Acad. Sci. USA, 89, 11322-11326 (1992).
145. A. N. Jain, T. G. Dietterich, R. H. Lathrop, D. Chapman, R. E. Critchlow Jr., B. E. Bauer, T. A. Webster, and T. Lozano-Perez, J. Comput.-Aided Mol. Des., 8, 635-652 (1994).
146. J. D. Hirst, J. Med. Chem., 39, 3526-3532 (1996).
147. V. S. Rose, J. Wood, and H. J. H. MacFie in H. van de Waterbeemd, Ed., Advanced Computer-Assisted Techniques in Drug Discovery, VCH, Weinheim/New York, 1995, pp. 228-242.
148. Y. Hamamoto, S. Uchimura, and S. Tomita, IEEE Trans. Pattern Anal. Machine Intell., 19, 73-79 (1997).
149. A. Djouadi and E. Bouktache, IEEE Trans. Pattern Anal. Machine Intell., 19, 277-282 (1997).
150. O. Strouf, Chemical Pattern Recognition, Research Studies Press, Chichester, UK, 1986.
151. M. L. Raymer, P. C. Sanschagrin, W. F. Punch, S. Venkataraman, E. D. Goodman, and L. A. Kuhn, J. Mol. Biol., 265, 445-464 (1997).
152. S. C. Basak and G. D. Grunwald, SAR QSAR Environ. Res., 3, 265-277 (1995).
153. S. C. Basak, S. Bertelsen, and G. D. Grunwald, Toxicol. Lett., 79, 239-250 (1995).
154. S. C. Basak and G. D. Grunwald, Chemosphere, 31, 2529-2546 (1995).
155. S. C. Basak and G. D. Grunwald, New J. Chem., 19, 231 (1995).
156. X. Gironés, A. Gallegos, and C.-D. Ramon, J. Chem. Inf. Comput. Sci., 40, 1400-1407 (2000).
157. B. Bordás, T. Kőmíves, Z. Szántó, and A. Lopata, J. Agric. Food Chem., 48, 926-931 (2000).
158. Y. Fan, L. M. Shi, K. W. Kohn, Y. Pommier, and J. N. Weinstein, J. Med. Chem., 44, 3254-3263 (2001).
159. M. Randic and S. C. Basak, J. Chem. Inf. Comput. Sci., 40, 899-905 (2000).
160. T. Suzuki, K. Ide, M. Ishida, and S. Shapiro, J. Chem. Inf. Comput. Sci., 41, 718-726 (2001).
161. M. Recanatini, A. Cavalli, F. Belluti, L. Piazzi, A. Rampa, A. Bisi, S. Gobbi, P. Valenti, V. Andrisano, M. Bartolini, and V. Cavrini, J. Med. Chem., 43, 2007-2018 (2000).
162. J. A. Morón, M. Campillo, V. Perez, M. Unzeta, and L. Pardo, J. Med. Chem., 43, 1684-1691 (2000).
163. A. Golbraikh and A. Tropsha, J. Mol. Graphics Model., 20, 269-276 (2002).
164. J. Huuskonen, J. Chem. Inf. Comput. Sci., 41, 425-429 (2001).
165. I. V. Tetko, V. V. Kovalishyn, and D. J. Livingstone, J. Med. Chem., 44, 2411-2420 (2001).
166. W. Wu, B. Walczak, D. L. Massart, S. Heuerding, F. Erni, I. R. Last, and K. A. Prebble, Chemom. Intell. Lab. Syst., 33, 35-46 (1996).
167. A. Yasri and D. Hartsough, J. Chem. Inf. Comput. Sci., 41, 1218-1227 (2001).
168. P. Bernard, D. B. Kireev, J. R. Chretien, P. L. Fortier, and L. Coppet, J. Comput.-Aided Mol. Des., 13, 355-371 (1999).
169. Y. Takeuchi, E. F. B. Shands, D. D. Beusen, and G. R. Marshall, J. Med. Chem., 41, 3609-3623 (1998).
175. F. R. Burden and D. A. Winkler, J. Med. Chem., 42, 3183-3187 (1999).
176. F. R. Burden, M. G. Ford, D. C. Whitley, and D. A. Winkler, J. Chem. Inf. Comput. Sci., 40, 1423-1430 (2000).
177. M. J. Adams, Chemometrics in Analytical Spectroscopy, The Royal Society of Chemistry, London, 1995.
178. T. Potter and H. Matter, J. Med. Chem., 41, 478-488 (1998).
179. M. Lajiness, M. A. Johnson, and G. M. Maggiora in J. L. Fauchere, Ed., Quantitative Structure-Activity Relationships in Drug Design, Alan R. Liss, New York, 1989, pp. 173-176.
180. R. Taylor, J. Chem. Inf. Comput. Sci., 35, 59-67 (1995).
181. M. Snarey, N. K. Terrett, P. Willett, and D. J. Wilton, J. Mol. Graphics Model., 15, 372-385 (1997).
182. R. W. Kennard and L. A. Stone, Technometrics, 11, 137-148 (1969).
183. B. Bourguignon, P. F. Deaguiar, K. Thorre, and D. L. Massart, J. Chromatogr. Sci., 32, 144-152 (1994).
184. B. Bourguignon, P. F. Deaguiar, M. S. Khots, and D. L. Massart, Anal. Chem., 66, 893-904 (1994).
185. S. Hellberg, L. Eriksson, J. Jonsson, F. Lindgren, M. Sjostrom, B. Skagerberg, S. Wold, and P. Andrews, Int. J. Pept. Protein Res., 37, 414-424 (1991).
186. L. Eriksson and E. Johansson, Chemom. Intell. Lab. Syst., 34, 1-19 (1996).
187. R. Carlson, Design and Optimization in Organic Synthesis, Elsevier, Amsterdam/New York, 1992.
188. E. J. Martin and R. E. Critchlow, J. Comb. Chem., 1, 32-45 (1999).
189. A. Miller and N.-K. Nguyen, Appl. Stat., 43, 669-678 (1994).
190. T. J. Mitchell, Technometrics, 42, 48-54 (2000).
191. S. Wold and L. Eriksson in H. van de Water-
170. G. V . Kauffmanand P. C. Jurs, J. Chem. Inf. beemd, Ed., Chemometrics Methods i n Molec-
Comput. Sci., 41, 1553-1560 (2001). ular Design,VCH, WeinheimINewYork, 1995,
pp. 309-318.
171. B. E. Mattioni and P. C. Jurs, J. Chem. Inf. 192. E. Novellino, C. Fattorusso, and G. Greco,
Comput. Sci., 42,94-102 (2002). Pharm. Acta Helv., 70, 149-154 (1995).
172. J. Gasteiger and J. Zupan, Angew. Chem., 32, 193. U. Norinder, J. Chemom., 10,95-105 (1996).
503 (1993). 194. N. S. Zefirov and V . A. Palyulin, J. Chem. Inf.
173. Y . L. Loukas, J. Med. Chem., 44, 2772-2783 Comput. Sci., 41, 1022-1027 (2001).
(2001). 195. L. Sachs, Applied Statistics: A Handbook of
174. P. Bernard, M. Pintore, J.Y. Berthon, and J. R. Techniques, Springer-Verlag, BerlirdNew
Chretien, Eur. J. Med. Chem., 36,l-19 (2001). York, 1984.
196. U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, 214. E. H. Sussenguth, J. Chem. Doc., 5, 36-43

and R. Uthurusamy, Adavnces in Knowledge (1965).
Discovery and Data Mining, AAAI PressiThe 215. J. Figueras, J. Chem. Doc., 12,237-244 (1972).
MIT Press, Cambridge, MA, 1996. 216. J. R. Ullmann, J. Assoc. Comput. Mach., 23,
197. G. H. John, Enhancements to the Data Mining 31-42 (1976).
Process, Ph.D thesis, Stanford University, 217. A. Von Scholley, J. Chem. Inf. Comput. Sci.,
1997. 24,235-241 (1984).
198. U. M. Fayyad, G. Piatetsky-Shapiro, and P. 218. P. H. A. Sneath and R. R. Sokal, Numerical
Smyth, From Data Mining to Knowledge Dis- Taxonomy, Freeman, San Francisco, 1973.
covery, AAAI Press/The MIT Press, Cam-
219. P. Willett and V. A. Winterman, Quant.
bridge, MA, 1995.
Strut.-Act. Relat., 5, 18-25 (1986).
199. E. Simoudis, ZEEE Expert., 11, 26-33 (1996).
220. M. A. Gallop, R. W. Barret, W. J. Dower,
200. M. A. Razzak and R. C. Glen, J.Cornput.-Aided S. P. A. Fodor, and E. M. Gordon, J. Med.
Mol. Des., 6,349-383 (1992). Chem., 37,1233-1251 (1994).
201. R. D. King, J.D. Hirst, and M. J. E. Sternberg, 221. E. M. Gordon, R. W. Barret, W. J. Dower,
Appl. Artif. Zntell., 9, 213-234 (1994). S. P. A. Fodor, and M. A. Gallop, J. Med.
202. S. S. Young and D. M. Hawkins, J.Med. Chem., Chem., 37,1385-1401 (1994).
38,2784-2788 (1995). 222. W. A. Warr, J. Chem. Znf. Comput. Sci., 37,
203. S. S. Young and D. M. Hawkins, SAR QSAR 134-140 (1997).
Environ. Res., 8, 183-193 (1998). 223. R. P. Sheridan and S. K. Kearsley, J. Chem.
204. D. M. Hawkins, S. S. Young, and A. Rusinko, Znf. Comput. Sci., 35,310-320 (1995).
Quant. Struct.-Act. Relat., 16, 1-7 (1997). 224. A. K. Ghose and V. N. Viswanadhan, Eds.,
205. R. King, R. Henery, C. Feng, and A. Suther- Combinatorial Library Design and Evaluation
land in D. Michie, S. Muggleton, and F. Fu- for Drug Discovery: Principles, Methods, Soft-
rukawa, Eds., Machine Intelligence and Induc- ware Tools and Applications, Marcel Dekker,
tive Learning, Vol. 13, Oxford University New York, 2001.
Press, Oxford, UK, 1994. 225. W. Zheng, S. J. Cho, and A. Tropsha, J. Chem.
206. S. S. Young, M. Farmen, and A. Rusinko, Net- Znf. Comp. Sci., 38,251-258 (1998).
work Science (http://www.netsci.org/Science/ 226. S. J. Cho, W. Zheng, and A. Tropsha, J. Chem.
Screening/featureO9.html) Znf. Comp. Sci., 38,259-268 (1998).
207. J. M. Barnard and G. M. Downs, Perspect. 227. A. Tropsha, S. J. Cho, and W. Zheng in A. L.
Drug Discov. Des., 718, 13-30 (1997). Parrill and M. R. Reddy, Eds., Rational Drug
208. W. P. Walters, M. T. Stahl, and M. A. Murcko, Design: Novel Methodology and Practical Ap-
Drug Discov. Today, 3,160-178 (1998). plications, ACS Symposium Series 719, Arner-
209. J. M. Barnard, J. Chem. Znf. Comput. Sci., 33, ican Chemical Society, Washington, DC, 1999,
532-538 (1993). pp. 198-211.
210. S. A. Cook, Proceedings of the Third Annual 228. I. 0. Bohachevsky, M. E. Johnson, and M. L.
ACM Symposium on the Theory of Computing, Stein, Technometrics, 28,209-217 (1986).
ACM, New York, 1971, pp. 151-158. 229. J. H. Kalivas, J. M. Sutter, and N. Roberts,
211. L. C. Ray and R. A. Kirsch, Science, 126,814- Anal. Chem., 61,2024-2030 (1989).
819 (1957). 230. D. E. Goldberg, Genetic Algorithm in Search,
212. X. J u n and Z. Maosen, Tetrahedron Comput. Optimization, and Machine Learning, Addi-
Methodol., 2, 75-83 (1989). son-Wesley, Reading, MA, 1989.
213. A. Dengler and I. Ugi, Comput. Chem., 15, 231. J. H. Holland, Sci. Am., 267, 66-72 (1992).
103-107 (1991). 232. S. Forrest, Science, 261,872-878 (1993).
CHAPTER THREE
Molecular Modeling in Drug Design
GARLAND R. MARSHALL
Washington University
Center for Computational Biology
St. Louis, Missouri
DENISE D. BEUSEN
Tripos, Inc.
St. Louis, Missouri
Contents
1 Introduction
2 Background and Methods
  2.1 Molecular Mechanics
    2.1.1 Force Fields
    2.1.2 Electrostatics
      2.1.2.1 The Dielectric Problem and Solvation
      2.1.2.2 The "Hydrophobic" Effect
      2.1.2.3 Polarizability
    2.1.3 The Potential Surface
      2.1.3.1 Optimization
      2.1.3.2 Potential Smoothing
      2.1.3.3 Genetic Algorithm
    2.1.4 Systematic Search and Conformational Analysis
      2.1.4.1 Rigid Geometry Approximation
      2.1.4.2 Combinatorial Nature of the Problem
      2.1.4.3 Pruning the Combinatorial Tree
      2.1.4.4 Rigid Body Rotations
      2.1.4.5 The Concept and Exploitation of Rings
      2.1.4.6 Conformational Clustering and Families
      2.1.4.7 Conformational Analysis
      2.1.4.8 Other Implementations of Systematic Search
    2.1.5 Statistical Mechanics Foundation
    2.1.6 Molecular Dynamics
      2.1.6.1 Integration
      2.1.6.2 Temperature
      2.1.6.3 Pressure and Volume
    2.1.7 Monte Carlo Simulations
    2.1.8 Thermodynamic Cycle Integration
    2.1.9 Non-Boltzmann Sampling
  2.2 Quantum Mechanics: Applications in Molecular Mechanics
    2.2.1 Parameterization of Charge
      2.2.1.1 Atom-Centered Point Charges
      2.2.1.2 Methods to Reproduce the Molecular Electrostatic Potential (MEP)
    2.2.2 Parameter Derivation for Force Fields
    2.2.3 Modeling Chemical Reactions and Design of Transition-State Inhibitors
3 Known Receptors
  3.1 Definition of Site
  3.2 Characterization of Site
    3.2.1 Volume and Shape
    3.2.2 Hydrogen-Bonding and Other Group Binding Sites
    3.2.3 Electrostatic and Hydrophobic Fields
  3.3 Design of Ligands
    3.3.1 Visually Assisted Design
    3.3.2 Three-Dimensional Databases
    3.3.3 De Novo Design
    3.3.4 Docking
      3.3.4.1 Docking Methods
      3.3.4.2 Scoring Functions
      3.3.4.3 Search for the Correct Binding Mode
  3.4 Calculation of Affinity
    3.4.1 Components of Binding Affinity
    3.4.2 Binding Energetics and Comparisons
    3.4.3 Atom-Pair Interaction Potentials
    3.4.4 Simulations and the Thermodynamic Cycle
    3.4.5 Multiple Binding Modes
  3.5 Protein Structure Prediction
    3.5.1 Homology Modeling
    3.5.2 Inverse Folding and Threading
    3.5.3 Contact Matrix
4 Unknown Receptors
  4.1 Pharmacophore versus Binding-Site Models
    4.1.1 Pharmacophore Models
    4.1.2 Binding-Site Models
    4.1.3 Molecular Extensions
    4.1.4 Activity versus Affinity
  4.2 Searching for Similarity
    4.2.1 Simple Comparisons
    4.2.2 Visualization of Molecular Properties
  4.3 Molecular Comparisons
    4.3.1 Volume Mapping
    4.3.2 Field Effects
    4.3.3 Directionality
    4.3.4 Locus Maps
    4.3.5 Vector Maps and Conformational Mimicry
  4.4 Finding the Common Pattern
    4.4.1 Constrained Minimization
    4.4.2 Systematic Search and the Active Analog Approach
    4.4.3 Strategic Reductions of Computational Complexity
    4.4.4 Alternative Approaches
    4.4.5 Receptor Mapping
    4.4.6 Model Receptor Sites
    4.4.7 Assessment of Model Predictability
5 Conclusions
6 Acknowledgments

Burger's Medicinal Chemistry and Drug Discovery
Sixth Edition, Volume 1: Drug Discovery
Edited by Donald J. Abraham
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
1 INTRODUCTION

By historical imperative, the role of molecular modeling in drug design has been divided into two separate paradigms, one centered on the structure-activity problem that attempts to rationalize biological activity in the absence of detailed, three-dimensional structural information about the receptor, and the other focused on understanding the interactions seen in receptor-ligand complexes and using the known three-dimensional structure of the therapeutic target to design novel drugs. The rapid increase in relevant structural information, attributed to advances in molecular biology to generate the target proteins in adequate quantities for study, and the equally impressive gains in NMR (1-9) and crystallography (10, 11) to provide three-dimensional structures as well as identify leads, have stimulated the need for design tools, and the molecular modeling community is rapidly evolving useful approaches. The more common problem, however, is one in which the receptor can only be inferred from pharmacological studies and little, if any, structural information is
available to guide in modeling. Nevertheless, useful information to guide the design and synthesis of potential novel therapeutics can be developed from an analysis of structure-activity data in the three-dimensional framework provided by current molecular modeling techniques. Although most of the techniques and approaches described have broader application than shown, the examples chosen should be sufficient to illustrate their use. A number of reviews (12-18) of computer-aided drug design have relevant sections covering portions of this chapter with different perspectives and are recommended for a more complete overview.

2 BACKGROUND AND METHODS

2.1 Molecular Mechanics

Molecular mechanics (19) treats a molecule as a collection of atoms whose interactions can be described by Newtonian mechanics. Because the mass of the nuclei is much greater than the mass of the electrons, one can separate (the Born-Oppenheimer approximation) the Schrödinger equation into a product of two functions: one for electrons, one for nuclei. For the purposes of molecular mechanics, the electronic function, initially developed to interpret spectroscopic data, is ignored; that is, the charge distribution is assumed to remain constant during changes in the position of the nuclei. Because molecular mechanics is based on classical physics, it cannot provide information about the electronic properties of molecules under study, which are generally assumed fixed during the parameterization of the force field with experimental data.

A few words about the basics of molecular mechanics (19, 20) may provide the elements of understanding for what follows. This is not meant to be comprehensive, but rather a simple overview, to remind the reader of a few crucial points. For a comprehensive overview of molecular modeling, the reader is referred to the excellent text by Leach (21). The interactions between atoms are divided into bonded and nonbonded classes. Nonbonded forces between atoms are based on an attractive interaction that has a firm theoretical basis and varies as the inverse of the 6th power of the distance between the atoms. It is balanced by a repulsion between the electronic clouds as the atoms come close, and this interaction has been represented empirically by a variety of functional forms: exponential, 12th power, or 9th power of the distance between the atoms. The coefficients for these two interactions are parameterized for atom types, usually by element, so that the minimum of the combined functions corresponds to the sum of the experimental van der Waals radii for the two atoms.

In addition, bonded atoms are considered as a special case, with a "spring constant" determining the energy of deformation from experimental bond lengths. Atoms directly bonded to the same atom (one-three interactions) are eliminated from the van der Waals list and have a special energetic term relating the deviation from an ideal bond angle. Atoms having a one-four interaction define a torsional relation that is usually parameterized based on the types of the four connected atoms defining the torsion angle. The numerous combinations of atom types require an enormous number of parameters to be determined from theoretical (quantum mechanics) and/or experimental data. Simplified force fields in which the torsional parameters depend only on the atoms at the end of a bond have been developed, to give approximate geometries for further refinement by quantum mechanics.

2.1.1 Force Fields. The basic assumption underlying molecular mechanics is that classical physical concepts can be used to represent the forces between atoms. In other words, one can approximate the potential energy surface by the summation of a set of equations representing pairwise and multibody interactions. These equations represent forces between atoms related to bonded and nonbonded interactions. Pairwise interactions are often represented by a harmonic potential [$\tfrac{1}{2}K_b(b - b_0)^2$] that obeys Hooke's law (derived for a spring) for bonded atoms, restoring the bond distance to an equilibrium value $b_0$, and a van der Waals potential [$C_{12}(i,j)/r_{ij}^{12} - C_6(i,j)/r_{ij}^{6}$] for nonbonded atoms. Similarly, distortion from an equilibrium valence angle ($\theta_0$) describing the angle between three bonded atoms sharing a common atom is also penalized [$\tfrac{1}{2}K_\theta(\theta - \theta_0)^2$]. A third class of interaction dependent on the dihedral angle $\phi$ between four bonded atoms is the torsional potential {$K_\phi[1 + \cos(n\phi - \delta)]$} used to account for orbital delocalization and to compensate for other deficiencies in the force field. A harmonic term [$\tfrac{1}{2}K_\xi(\xi - \xi_0)^2$] is often introduced for dihedral angles $\xi$ that are relatively fixed, such as those in aromatic rings. Coulomb's law [$q_i q_j/(4\pi\varepsilon_0\varepsilon_r r_{ij})$] is the simplest approach to the contribution of electrostatics to the potential $V$:

$$V = \sum_{\text{bonds}} \tfrac{1}{2}K_b(b - b_0)^2 + \sum_{\text{angles}} \tfrac{1}{2}K_\theta(\theta - \theta_0)^2 + \sum_{\text{fixed dihedrals}} \tfrac{1}{2}K_\xi(\xi - \xi_0)^2 + \sum_{\text{dihedrals}} K_\phi[1 + \cos(n\phi - \delta)] + \sum_{i<j}\left[\frac{C_{12}(i,j)}{r_{ij}^{12}} - \frac{C_6(i,j)}{r_{ij}^{6}} + \frac{q_i q_j}{4\pi\varepsilon_0\varepsilon_r r_{ij}}\right]$$

A central issue is the number of different atom types that are used in a particular force field. There is always a compromise between increasing the number to allow for the inclusion of more environmental effects (i.e., local electronic interactions) and the increase in the number of parameters to be determined to adequately represent a new atom type. In general, the more subtypes of atoms (how many different kinds of nitrogen, for example), the less likely that the parameters for a particular application will be available in the force field. The extreme, of course, would be a special atom type for each kind of atomic environment in which the parameters were chosen, so that the calculated properties of each molecule would simply reproduce the experimental observations. One major assumption, therefore, is that the force constants (parameters) and equilibrium values of the equations are functions of a limited number of atom types and can be transferred from one molecular environment to another. This assumption holds reasonably well where one may be primarily interested in geometric issues, but is not so valid in molecular spectroscopy. This has led to the introduction of additional equations, the so-called "cross-terms," which allow additional parameters to account for correlations between bond lengths and bond angles [$K_{b\theta}(b - b_0)(\theta - \theta_0)$], dihedral angles and bond angles, and so forth. Because of the lack of adequate parameterization of the more complex force fields that are usually specialized to one kind of molecule (e.g., proteins or nucleic acids), more simplified force fields have gained some popularity because of their general applicability, despite limited accuracy.

Examples are the Tripos force field (22), the COSMIC force field (23), and that of White and Bovill (24), which uses only two atom types, those at the end of the bond, to parameterize the torsional potential rather than the four types of the atoms used to define the torsional angle. One has only to consider the number of combinations of 20 atom subtypes taken four at a time (160,000) versus two at a time (400) to understand the explosion of parameters that occurs with increased atom subtypes. The simplifying assumption in parameterization of the torsional potential reduces to some extent the quality of the results (25), but allows the use of the simplified force fields (22) in many situations where other force fields would lack appropriate parameters. The situation can become complicated, however. For example, the amide bond is normally represented by one set of parameters, whether the configuration is cis or trans. Experimental data are quite compelling that the electronic state is different between the two configurations, and different parameter sets should be used for accurate results (Fig. 3.1). Only AMBER/OPLS currently distinguishes between these two conformational states (26). Certainly, the limited parameterization of simplified force fields would not allow accurate prediction of spectra, which are more reflective of the dynamic behavior of the molecule.

Figure 3.1. Differences in OPLS charge distribution (top) between cis- and trans-isomers of amide bond and geometries (bottom) as calculated by ab initio parameterization (26).

Accurate estimates of energy may require accurate representation of the dynamics of molecules and justify derivation of the larger number of parameters. The new version (27) of the Allinger force field, MM3, has the objective of reproducing spectral data more accurately than MM2. Much of the chemistry remains to be incorporated into appropriate force fields. Only recently have adequate modifications been made to the force fields developed for organic molecules to include some metals (28-31). Carlsson (32, 33) recently developed a functional form that allows electronic d-orbitals of metals to be reasonably represented within molecular mechanics.

Because different force fields may use different mathematical representations of the forces between atoms and the details of their parameterization will in general differ also, it is unwise to use parameters derived for one force field to replace missing parameters in another. One often hears of a "balanced" parameter set that reproduces well the phenomena under consideration, but which is inadequate for other applications. A comparison by Burkert and Allinger (19) shows the different van der Waals (VDW) potentials used in several of the popular force fields, and the situation has not improved significantly in the intervening years. Because of other differences in parameters and functional forms of the equations used in the rest of the individual force fields, these quite different approaches to the VDW potential give excellent results when used in the correct combination. Indiscriminate combination of one part of a force field with another derived independently would lead to considerable divergence in the calculated results from those obtained by experimental observation.

The most extreme difference between force fields arises in the method by which the hydrogen bond is included. Because atoms involved in a hydrogen bond are often closer than the sum of their VDW radii, they must be handled in a special manner. Several force fields have special functional forms with angular dependency that not only have special VDW parameters, to ensure that the close approach of the atoms involved is calculated correctly, but also reproduce the angular distribution observed for hydrogen bonds. Hagler et al. (34) used an amide hydrogen with a zero VDW radius for hydrogen bonding and a slightly greater nitrogen radius to give a correct amide hydrogen bond distance. The charges on the atoms involved (including the amide hydrogen) are adjusted to give an appropriate balance of VDW repulsion and dipole attraction. Clearly, the method for handling the electrostatic interaction is an integral part of each force field and cannot be modified independently.

2.1.2 Electrostatics. The most difficult aspect of molecular mechanics is electrostatics (35-38). In most force fields, the electronic distribution surrounding each atom is treated as a monopole with a simple coulombic term for the interaction. The effect of the surrounding medium is generally treated with a continuum model by use of a dielectric constant. More
detailed approaches with distributed multipole representations of the electron distribution (39, 40) and/or efforts to deal with dielectric inhomogeneity through solution of the Poisson equation are clear improvements and have become routine in many studies. Other difficulties arise in dealing with macromolecular systems, given that the electrostatic interaction is long ranged ($1/r$) and the interactions cannot be arbitrarily terminated with distance. Electrostatic interactions range from those operating only at very short distances that are nonspecific (dispersive interactions, $r^{-6}$ dependency) to those operating at very long distances with a high degree of specificity (charge-charge interactions, $r^{-1}$ dependency).

Dispersive Interactions ($r^{-6}$). These are attributed to interaction of induced dipoles within the electron clouds as molecules come in proximity and are responsible for the attractive part of the nonbonded van der Waals interaction.

Dipole-Dipole Interactions ($r^{-3}$). Because of the nonsymmetrical distribution of electrons between atoms of different size and electronegativity, bonds have associated permanent dipoles. The interaction energy between two of these dipoles depends on their relative orientation. This is basically the interaction underlying the phenomenon of the hydrogen bond. Although some force field authors use a special hydrogen bonding potential with an orientation dependency, simple partial charge representations combined with appropriate VDW parameters can reproduce the effect as well (34).

Charge-Dipole Interactions ($r^{-2}$). A charge interacting with a permanent dipole can be handled simply by considering the charge interacting with the two charges at the poles of the dipole. Alternatively, if the distance between the poles of the dipole is small compared with that between the centers of the ion and the dipole, then the potential energy $\Phi$ can be approximated as

$$\Phi = e\mu\cos\theta / r^2$$

where $e$ is the charge of the ion, $\mu$ is the dipole moment, $\theta$ is the angle between the vector connecting the center of the dipole with the charge and the dipole orientation, and $r$ is the distance between the center of the ion and the center of the dipole.

Charge-Charge Interactions ($r^{-1}$). The energy of interaction between two charges $q_i$ and $q_j$ is given by Coulomb's law:

$$V = \frac{q_i q_j}{\varepsilon r_{ij}}$$

where $r_{ij}$ is the distance separating the charges and $\varepsilon$ is the dielectric constant of the medium. To evaluate atom-atom interactions using Coulomb's law, the concept of net atomic charge is invoked. This amounts to representing charge as a point, a monopole, and is an artificial construct. Nevertheless, this is the common method. Recent improvements in calculating an appropriate set of point charges, to accurately reproduce the molecular electrostatic potential derived by quantum calculations, have been reported (41).

In an effort to increase the quality of electrostatic representations, dipole and higher multipole moments have been used. There are advantages in these more accurate representations, with a relatively small computational increase attributed to the reductions in distances over which the higher moments have to be summed, although they do require additional effort in the derivation of the parameters for the higher moments themselves. A good example is the distributed multipole model of electrostatics derived for peptides. A review by Williams (42) discusses the problems of deriving a distributed multipole expansion of charge representation that accurately reproduces the molecular electrostatic potential derived from quantum calculations. Comparisons were made between atomic multipoles, bond dipole, and restricted bond dipole models. Williams finds that a model for the electrostatic potential based on bond dipoles supplemented with monopoles (for ions) and atomic dipoles (for lone pairs) is most useful. Dipole-dipole energy converges much faster than monopole-monopole energy. Molecular charge at any desired position in a molecule is not a physically measurable quantity; one can only calculate a delocalized electron probability distribution from quantum theory. Clearly, the more complex the representation, the more accurately one can approximate the quantum mechanical results, and the more realistic should be the results obtained. One complexity of electrostatics is the long distances over which interactions occur. Appropriate means of truncating the long-range forces to maintain the accuracy of simulations are necessary (43-45), and progress in better approximations has been reported (46). The difficulties with cutoff schemes were demonstrated (47, 48) by significant variations in the behavior of a 17-residue helical peptide simulated with explicit waters, using various electrostatic schemes, and by studies (49) of a pentapeptide in aqueous ionic solution (50). In both cases, the Ewald approximation in which periodicity is assumed (which allows summation over much longer distances) gave superior results (47-49).

2.1.2.1 The Dielectric Problem and Solvation. Although methods of localizing charge just described may give reasonable results, the use of Coulomb's law with a dielectric constant, a scaling factor related to the polarizability of the medium between the charges, is clearly of concern. The dielectric at the molecular level is neither homogeneous nor continuous, nor even well defined, and thus violates the basic assumption of Coulomb's law. Although the use of a low, uniform dielectric is more nearly correct in dynamical simulations where all solute and solvent atoms are explicitly included, a variety of comparisons of experimental data with the results of calculation by use of a simplified solvent model have led to the realization that much better approaches are needed. Initial efforts (51) led to the proposal of a variable dielectric (1/R or 1/4R). More recently, approaches that model the inhomogeneity of the dielectric at the interface between the solute and solvent by use of the Poisson-Boltzmann equation have shown considerable promise (52, 53). An alternative approach that uses the mirror charge approximation has been described by Schaefer and Froemmel (54). Excellent reviews (35-38) of the electrostatic problem have appeared, to which the reader is referred.

Much effort has been given to simple continuum models of solvation to explain the origin of solvent effects on conformational equilibria and reaction rates. The current status of such efforts, as well as simulations to rationalize solvation effects, has been reviewed by Richards et al. (55). There are two general approaches to the continuum models. The first is reaction field theory (Bell, Kirkwood, Onsager) that follows the classical treatment of Debye-Hückel. The solvent is considered in terms of charge distribution, polarizability, and dielectric constant. The solvation energy is determined simply by considering the solute as a point dipole that interacts with the induced charge distribution in the solvent (Onsager reaction field). An extension by Sinanoglu in the 1960s partitioned solvation energy into cavity formation, solvent-solute interaction, and the "free volume" of the solute. The logical extension of this approach is scaled-particle theory (56), in which the free energy of formation of a hard-sphere cavity of diameter $a_2$ in a hard-sphere solvent of diameter $a_1$ and number density $\rho$ is scaled to the exact solution for small cavity sizes. Alternatively, the virtual charge approach used a system of effective and virtual charges interacting in the gas phase. The Hamiltonian of the system is modified to include an imaginary particle, a "solvaton," with an opposite charge for each of the solute atoms, and solved by the SCF procedure. These continuum models have met with limited success (trends and relative effects of solvation can be predicted), although highly specific molecular interactions, such as those involving hydrogen-bonding groups, cannot be accommodated.

In the equation for calculating affinity of a drug for a receptor, the ligand is solvated either by the receptor or by the solvent. This competition means that accurate determination of the free energy of solvation is important in understanding differences in affinities. Solvation free energy ($G_{sol}$) can be approximated by three terms: $G_{cav}$, the formation of a cavity in the solvent to hold the solute; $G_{vdw}$ and $G_{pol}$, the interaction between solute and solvent divided between VDW and electrostatic forces, respectively:

$$G_{sol} = G_{cav} + G_{vdw} + G_{pol}$$
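The variable dielectric proposals mentioned earlier in this section can be made concrete with a small sketch. It assumes that "1/R" and "1/4R" denote an effective dielectric of ε = R and ε = 4R (R in Å), and it uses the conversion factor 332.0637 kcal·Å/(mol·e²) so that charges in units of e and distances in Å give energies in kcal/mol. The function names are illustrative only and do not come from any particular force field package.

```python
# Sketch: Coulomb energy with a uniform dielectric versus a
# distance-dependent dielectric (assumed form: epsilon = scale * r).

COULOMB_CONSTANT = 332.0637  # kcal*Angstrom/(mol*e^2)


def coulomb_constant_dielectric(q_i, q_j, r, epsilon=1.0):
    """Coulomb energy (kcal/mol) for point charges with a uniform dielectric."""
    if r <= 0.0:
        raise ValueError("distance must be positive")
    return COULOMB_CONSTANT * q_i * q_j / (epsilon * r)


def coulomb_distance_dependent(q_i, q_j, r, scale=1.0):
    """Coulomb energy (kcal/mol) with epsilon = scale * r.

    scale=1.0 corresponds to the assumed "1/R" model and scale=4.0 to "1/4R";
    the energy then decays as 1/r**2 instead of 1/r.
    """
    if r <= 0.0:
        raise ValueError("distance must be positive")
    return COULOMB_CONSTANT * q_i * q_j / (scale * r * r)


if __name__ == "__main__":
    # Opposite unit charges at increasing separation (Angstrom).
    for r in (3.0, 6.0, 12.0):
        uniform = coulomb_constant_dielectric(1.0, -1.0, r)
        eps_r = coulomb_distance_dependent(1.0, -1.0, r, scale=1.0)
        eps_4r = coulomb_distance_dependent(1.0, -1.0, r, scale=4.0)
        print(f"r={r:5.1f}  eps=1: {uniform:8.2f}  eps=r: {eps_r:8.2f}  eps=4r: {eps_4r:8.2f}")
```

The distance-dependent forms damp long-range interactions more strongly than a uniform dielectric does, which is the screening effect they were introduced to mimic; they remain a crude stand-in for the Poisson-Boltzmann treatments discussed in the text.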
There are four theoretical approaches to the problem:

1. Scaled Particle Theory (56)
The essence of the scaled particle theory is that formation of a cavity in a fluid requires work. The theory for hard spheres has been well developed from statistical mechanics, and the work, $W(R, \rho)$, can be calculated as follows:
where $y = \pi\rho\sigma_1^3/6$, $R = \sigma_2/\sigma_1$, $\sigma_2$ is the diameter of the hard-sphere solute, $\sigma_1$ is the diameter of the solvent, and $\rho$ is the number density of the fluid (N/V).
Because this theory includes no interaction between solvent and solute (i.e., only $G_{cav}$ is calculated), effective volumes for nonspherical compounds with interactive groups are normally calibrated from experiment. This is one way to deal with the energy of interaction between solvent and solute. For further discussion, see Pollack (57).

2. Charge Image (or Virtual Charge) Method (54) (Method for $G_{pol}$ calculation)
This model replaces the solute-continuum model with one in which a system of charges derived from the solute and virtual charges in the adjacent space interact in the gas phase. A set of mirror charges reflected at the dielectric boundary are created and used in the calculation of the electrostatics.

3. Boundary Element Method (58) (Method for $G_{pol}$ calculation)
In this approximation, the system is modeled by calculating the appropriate surface charges at the dielectric boundary. This is similar to fitting charges at atomic centers to reproduce the molecular electrostatic potential. For a quantum-mechanical equivalent, Tomasi et al. (59) introduced a charge distribution on the surface of a cavity of realistic shape to introduce the solvation term in the Hamiltonian of the solute. The charge distribution on the surface of the cavity depends on the solute's electric field, which is affected in turn by polarization from the cavity's surface. An iterative QM procedure is used to obtain the perturbation term. Cramer and Truhlar have developed AMSOL to include a solvent approximation in calculations of molecular systems. The approach has been calibrated by comparison of theoretical and experimental solvation free energies for numerous molecular species (60).

4. Poisson-Boltzmann Equation (53) (Method for $G_{pol}$ calculation)
Generalization of the Debye-Hückel theory leads directly to the Poisson-Boltzmann equation that describes the electrostatic potential of a field of charges with dielectric discontinuities. This equation has been solved analytically for spherical and elliptical cavities, but must be solved by finite-difference methods on a grid for more complicated systems. One exciting advance in this area is the development of an approximate equation for the reaction field acting on a macromolecular solute, attributed to the surrounding water and ions (61). By combining these equations with conventional molecular dynamics, solvation free energies were obtained similar to those with explicit solvent molecules, at little computational cost over vacuum simulations. This implies that a more nearly correct solution to the electrostatics problem might minimize the solvation problem.

Other approaches to evaluations of $G_{sol}$ have recently appeared in the literature. Still et al. (62) estimated $G_{cav} + G_{vdw}$ by the solvent-accessible surface area times 7.2 cal/mol/Å². $G_{pol}$ is estimated from the generalized Born equation. Effective solvation terms have been added (63,64) to molecular mechanics force fields to improve molecular dynamics simulations without the cost of modeling explicit solvent. Zauhar (65) combined the polarization-charge technique with molecular mechanics to effectively minimize a tripeptide in solvent.
One final refinement may be necessary in some situations: the inclusion of electric polarizability, for example, by inclusion of induced dipoles, or distributed polarizability (66) in the electrostatic representation of the model. Kuwajima and Warshel (67) recently examined the effects of this refinement in modeling crystal structures of polymorphs of ice. Such models including polarizability have been previously shown useful for predicting the properties of crystalline polymorphs of polymers by Sorensen et al. (68). Caldwell et al. (69) included implicit nonadditive polarization energies in water-ion outcomes, resulting in improved accuracy. At the semiempirical level of quantum theory, Cramer and Truhlar (70-73) added solvation and solvent effects on polarizability to AM1, with impressive agreement between experimental and calculated solvation energies (60). Rauhut et al. (74) also introduced an arbitrarily shaped cavity model by use of standard AM1 theory.
2.1.2.2 The "Hydrophobic" Effect. Water has been the nemesis of solvation modeling because of its rather unique thermodynamic properties, as reviewed by Frank (75) and Stillinger (76). The biochemical literature discusses at length "hydrophobic effects" (77). This effect is not "hydrophobic" at all because the enthalpic interaction of nonpolar solutes with water is favorable. This, however, is counterbalanced by an unfavorable entropic interaction that is interpreted as an induced structuring of the water by the nonpolar solute. Water interacts less well with the nonpolar solute than it does with itself because of the lack of hydrogen-bonding groups on the solute. This creates an interface similar to the air-water interface, with a resulting surface tension attributed to the organization of the hydrogen-bonded patterns available. This is the so-called iceberg formation around nonpolar solutes in water, first suggested by Frank and Evans. Studies by both molecular dynamics (78-80) and Monte Carlo simulations (81) support this interpretation (76), although there is still considerable controversy in interpretation of experimental data (82).
2.1.2.3 Polarizability. The traditional approaches in molecular mechanics have excluded the effects of charge on induced dipoles and multibody effects. This approximation becomes a serious limitation when dealing with charged systems and molecules like water that are highly polar. A recent paper (83) from the Kollman group described nonadditive many-body potential models to calculate ion solvation in polarizable water with good agreement with experimental observation. It was necessary to include a three-body potential (ion-water-water) in the molecular dynamics simulation of the ionic solution to obtain quantitative agreement with solvation enthalpies and coordination numbers. Inclusion of a bond-dipole model with polarizability in molecular dynamics simulations has given excellent agreement in predicting physical properties of polymers by Sorensen et al. (68).
A novel approach based on the concept of charge equilibration has been suggested by Rappe and Goddard (84) that allows the inclusion of polarizabilities in molecular dynamics calculations.
2.1.3 The Potential Surface. The set of equations that describe the sum of interactions between the ensemble of atoms under consideration is an analytical representation of the Born-Oppenheimer surface, which describes the energy of the molecule as a function of the atomic positions. Many important properties of the molecule can be derived by evaluation of this function and its derivatives. For example, setting the value of the first derivative to zero and solving for the coordinates of the atoms leads one to minima, maxima, and saddle points. Evaluation of the sign of the second derivative can determine which of the above have been found. It is a straightforward procedure to calculate the vibrational frequencies from the force constants by evaluation of the eigenvalues of the secular determinant (the mass-weighted matrix; see textbook on vibrational spectroscopy). Gradient methods for the location of energy minima and transition states are an essential part of any molecular modeling package. It is essential to remember, however, that minimization is an iterative method of geometrical optimization that is dependent on starting geometry, unless the potential surface contains only one minimum (a condition not found for any system of sufficient complexity to be of real interest).
The ability to locate both minima and transition points enables one to determine the minimum energy reaction path between any
two minima. In the case of flexible molecules, these minima could correspond to conformers and the reaction path would correspond to the most likely reaction coordinate. One could estimate the rate of transition by determination of the height of the transition states (the activation energy) between the minima. Elbers (85) developed a new protocol for the location of minima and transition states and applied it to the determination of reaction paths for the conformational transition of a tetrapeptide (86). Huston and Marshall (87) used this approach to map the reaction coordinates of the α- to 3₁₀-helical transition in model peptides.
Despite the limitations that curtail exact quantitative applications, molecular mechanics can provide three-dimensional insight as the geometric relations between molecules are adequately represented. Electrical field potentials can be calculated and compared to give a qualitative basis for rationalizing differences in activity. Molecular modeling and its graphical representation allow the medicinal chemist to explore the three-dimensional aspects of molecular recognition and to generate hypotheses that lead to design and synthesis of new ligands. The more accurate the representation of the potential surface of the molecular system under investigation, the more likely that the modeling studies will provide qualitatively correct solutions.
2.1.3.1 Optimization. The search for the optimal solution to a complex problem is common to many areas in science and engineering and does not have a general solution. Numerous approaches to this problem, which is generally referred to as optimization, have been used in chemistry: most commonly, distance geometry, molecular dynamics, stochastic methods such as Monte Carlo sampling, and systematic, or grid, search. Most rely on minimization, often combined with a stochastic search. Minimization algorithms have been thoroughly characterized with regard to their convergence properties but, in general, locate only the local minimum closest to the starting geometry of the system. A stochastic approach to starting geometries can be combined with minimization to find a subset of minima in the hope that the global minimum is contained within the subset and can readily be identified by its potential value compared with that of the other minima.
2.1.3.2 Potential Smoothing. One approach to global optimization that has shown promise is potential smoothing (88). This approach uses a mathematical transformation to smooth the multidimensional potential energy surface of a molecule, reducing the high frequency complexity of the surface and making it much easier to search for minimum energy conformations. This concept was first used to deform the conformational potential energy surface in the diffusion equation method (DEM) of Piela and coworkers (89). Search procedures will not confront multiple local minima on the deformed surface. If the procedure is reversed iteratively, then one can trace the path back into a region that lies near the global minimum of the undeformed potential surface. Ponder et al. (88, 90) improved the procedure for tracing back from one partially deformed surface to the next by including a local search procedure to limit detection of false minima.
One of the best known benchmark problems for conformational search involves the determination of the low energy conformations of the highly flexible cycloheptadecane (91, 92). This system continues to serve as a test for newly developed search methods (93). Although not a particularly large molecule, this system is a challenge because of its flexibility and the close energy spacing of the lower lying minima. Extensive analysis through a variety of search methods has located exactly 263 minima within 3.0 kcal/mole of the purported global minimum. The potential smoothing search (PSS) (88) was dramatically effective at locating many of the lowest energy structures for cycloheptadecane. Although the global minimum for cycloheptadecane was not located, the second lowest energy structure was located and differed by only 0.01 kcal/mole. Based on its MM2 vibrational frequencies, the global minimum is entropically disfavored relative to all of the minima located by the smoothing procedure. The PSS method was also applied to obtain the minimum energy conformation of the TM helix dimer of glycophorin A (GpA) (94), previously solved by solution NMR spectroscopy (95).
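The smoothing-and-trace-back idea of Section 2.1.3.2 can be illustrated with a minimal Python sketch. It is not the PSS procedure of Ponder et al.; it assumes a one-dimensional toy potential written as a cosine series (the coefficients in TERMS are illustrative, not taken from the text), for which the diffusion-equation deformation simply damps each term cos(kx) by exp(-k²t). All function names are hypothetical. Plain steepest descent on the original surface is trapped in the nearest local minimum, whereas minimizing on the smoothed surface and then reducing the smoothing level step by step traces the path back toward the global minimum.

```python
import math

# Toy one-dimensional "torsional" potential as a cosine series:
# U(x) = sum of a_k * cos(k * x). Coefficients are illustrative only.
TERMS = [(-1.0, 1), (-0.3, 5)]   # (amplitude a_k, frequency k)


def energy(x, t=0.0):
    """Potential after smoothing to level t; t = 0 is the original surface."""
    return sum(a * math.exp(-k * k * t) * math.cos(k * x) for a, k in TERMS)


def gradient(x, t=0.0):
    """Derivative of the smoothed potential with respect to x."""
    return sum(-a * k * math.exp(-k * k * t) * math.sin(k * x) for a, k in TERMS)


def minimize(x, t=0.0, step=0.05, iterations=2000):
    """Plain steepest descent on the surface smoothed to level t."""
    for _ in range(iterations):
        x -= step * gradient(x, t)
    return x


def smoothing_search(x, schedule=(1.0, 0.5, 0.25, 0.1, 0.05, 0.0)):
    """Minimize on a strongly smoothed surface, then trace back to t = 0."""
    for t in schedule:
        x = minimize(x, t)
    return x


start = 2.5
trapped = minimize(start)          # descent on the undeformed surface
traced = smoothing_search(start)   # descent with smoothing and trace-back

if __name__ == "__main__":
    print("direct minimization:", trapped, energy(trapped))
    print("smoothing search:   ", traced, energy(traced))
```

In this toy case the smoothed surface has a single minimum, so the trace-back is trivial; on a real molecular surface the minima can shift or split as the smoothing is reduced, which is why a local search at each stage was added in the work cited above.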
2.1.3.3 Genetic Algorithm. Another approach to global optimization is the genetic algorithm. This approach is based on biological evolution and is analogous to natural selection (96-98). In applications to computational chemistry, evolution on the computer has been shown to be an efficient approach to global optimization, although because of sampling issues, there is no guarantee that the global optimum has been found in any particular application (99).
2.1.3.3.1 Characteristics of the Genetic Algorithm. In analogy to natural selection, the parameters to be optimized are encoded in a bit string and strung together in a "chromosome." Each chromosome in the population represents a particular genotype or solution to the problem under consideration (i.e., a specific set of values for the parameters that determine the configuration of the system under study). The values of the parameters have to be decoded for the "fitness" of a particular genotype to be evaluated. Once the fitness of each chromosome in the population has been evaluated, then the more "fit" members are allowed to reproduce, mutate, or cross over with other members of the parent population to generate a new daughter population. This process is repeated until the fitness of the population converges, or until the available computer cycles are consumed.
2.1.3.3.2 Example of Conformational Analysis. The simplifying assumption of rigid geometry is used to reduce the computational complexity of the model problem of conformational analysis. The elimination of variables is rationalized based on the high energy cost associated with bond length distortions and the ability to accommodate bond angle deformations by a reduced set of van der Waals radii. To represent the conformation of a molecule, one needs only to specify the values of the torsional angles associated with rotatable bonds. One can assign a set number N of bits, 6 for example, to represent 2^N values for the torsional angles. Each set of 6 bits can be considered a "gene" and crossover allowed only at gene boundaries, if desired. Thus, the conformation of a molecule can be encoded as a set of torsional genes. The actual coordinates of the molecule corresponding to each genotype must be generated for the fitness function F, in this case internal energy, to be numerically evaluated by molecular mechanics. Each chromosome in the population is evaluated for its internal energy and a subset of the more fit selected for reproduction. The degree of limitation on reproductive fitness is analogous to the selective pressure brought to bear on a population (i.e., selection of the fittest). This is a parameter that can be varied in most GA programs and one must balance selective pressure against maintaining some variation in the population for evolution to occur (to avoid being trapped in a local minimum). The set of chromosomes to be reproduced can be based on some arbitrary criteria (the top 50%), all those with fitness at least half that of the most fit chromosome detected, or the fitness scaled in some way and chromosomes reproduced in proportion to their scaled fitness.
Given a subset of chromosomes to reproduce, several operations analogous to evolution are invoked. First is mutation, where a certain number of randomly selected bits are mutated from 0 to 1 or vice versa in the daughter chromosome. This would allow for changes in the settings of one or more torsional angles. A certain number of pairs of chromosomes are also selected for crossover and one or more locations between genes (if specified) are randomly selected and the two pieces derived from each parent chromosome swapped, to generate two or more novel chromosomes. This would allow for different subsets of conformations to be combined; this provides a mechanism for concerted changes or jumps over barriers to find minima that would be difficult to sample by mutation alone. This would appear to be the feature that provides the analogous behavior to simulated annealing in efficient searching of parameter space. In this case, however, the search is more directed by the selective pressure of increasing the "fitness" or facing elimination from the population. In other words, each new generation should have eliminated a significant portion of the less fit members of the previous generation and propagated those torsional values that generate good local conformational states.
2.1.3.3.3 Schema and the Building Block Hypothesis. Once a population of good local substates has been established, then crossover
can probe the combination of these subconformations that have positive interactions leading to more fit progeny. In the jargon of computer science, the subpattern of 1's and 0's giving a preferred subconformation would be a schema (or building block). According to the most accepted theory, the building block hypothesis, the genetic algorithm initially detects biases toward fitness in lower order (fewer identical bits) schemas and converges on this part of search space (the entire set of bit strings). By combining information from lower order schemas through crossovers, biases in higher order schemas are detected and propagated.
The strong convergence property of the genetic algorithm is a major attraction. Given sufficient members of the population and sufficient evolutionary time (number of generations), then one can expect convergence if the fitness function is based on the optimal combination of locally optimized substructures. Some fitness functions are termed "deceptive," in that low order schemas are not present in higher order schemas and their propagation slows detection of the more fit higher order schemas. Another problem arises when the population size is too small or the selection factor too high. Then, the genetic algorithm can magnify a small sampling error and prematurely converge in a local optimum.
2.1.3.3.4 Mutations and Encoding. There are different ways to encode binary numbers by bit strings and these can have some influence on the impact of mutation. Traditional binary encoding requires that all bits be changed for some cases if the digital value is to be simply incremented. This causes erratic behavior near an optimum, with mutations in higher order bits having more effect than those in lower order bits.
2.1.3.3.5 Crossovers and Encoding. In our example, we indicated that one might want to separate the bit string into genes corresponding to torsional angles because the gene has a coherent meaning in the context of the problem. If one restricts crossovers to the junctions between genes, then the coherence of the conformation of molecular fragments is preserved and one is more likely to make a successful crossover producing more fit offspring. There are methods such as random-key encoding (97) to generalize the process of crossovers without requiring customized crossover operators that are problem specific, although this is beyond the scope of this chapter.
2.1.3.3.6 Examples of Applications to Biochemical Problems. McGarrah and Judson (100) explored the impact of different parameter settings on the ability of the GA to explore the conformational space of cyclo(Gly6). Each residue was represented by four angles, each with a string of four bits (1/16 of range). A selection fraction of 50% was used, which eliminated the lower half in fitness from reproduction. Population sizes of 10, 50, and 100 were tested. Each group was divided into four niche populations with communication between groups. Local minimization was performed for each chromosome before evaluation. They concluded that it was of little use to examine a population size of less than 100 members for the 24 variables examined. As soon as convergence in the average is detected in a population, it should be cross-fertilized from another niche or GA evolution should terminate. It is a clear example of a hybrid approach, in which the GA does a rough search for minima and local minimization finds the closest local minimum.
Judson et al. (101) examined the use of a genetic algorithm to find low energy conformers of 72 small to medium organic molecules (1-12 rotatable bonds) whose crystal structures were known. They used the elitist strategy, in which the best individual from each generation is propagated without modification. A population size of 10 times the number of the nonring dihedral angles being varied was chosen. Each molecule was allowed to run for 10,000 energy evaluations, or until the population was bit converged. In a few cases, conformers with lower energies than those observed in the crystal structure were found. A comparison with CSEARCH in SYBYL (Tripos, Inc.) was made, but the differences in efficiencies found were not compelling. In only 9 of the 72 cases examined did the GA find that its best conformer had energy greater than that of the crystal structure, with the largest deviation being only 0.8 kcal/mol.
The GA approach has also been applied to the docking problem with dihydrofolate reductase, arabinose binding protein, and sialidase
(98). A typical run took minutes on a workstation and the predicted conformations agreed with those observed crystallographically in all cases. Meadows and Hajduk (102) used experimental constraints with a GA algorithm to dock biotin to streptavidin. Judson et al. (101) also reported docking of flexible molecules into the active sites of thermolysin, carboxypeptidase, and dihydrofolate reductase. In 9 of the 10 cases examined, the GA found conformations within 1.6 Å root-mean-square (rms) of the relaxed crystal conformation.
This approach has also been used in the PRO_LIGAND de novo design program (103) to optimize the structure of ligands for a binding site. A set of candidate structures was generated and then crossover between molecular fragments used to optimize the predicted binding mode. This is similar to the SPLICE program of Ho and Marshall (104) that evolves ligands with more favorable interactions with a given site.
Payne and Glen (105) studied several different aspects of molecular recognition with genetic algorithms. Conformations and orientations were determined which best fit constraints such as inter- or intramolecular distances, electrostatic surface potentials, or volume overlaps with up to 30 degrees of freedom.
2.1.4 Systematic Search and Conformational Analysis. Because of the convoluted nature of the potential energy surface of molecules, minimization usually leads to the nearest local minimum (106,107) and not the global minimum. In addition, many problems in structure-activity studies require geometric solutions that may not be at the global minimum of the isolated molecule. To scan the potential surface with some surety of completeness, systematic, or grid, search procedures have been developed. To understand the strengths and limitations of this approach, some of the algorithmic details must be considered. These are discussed in depth in a review by Beusen et al. (108).
2.1.4.1 Rigid Geometry Approximation. A simplifying assumption that is usually invoked to reduce the computational complexity of the problem through elimination of variables is that of rigid geometry. The rationale is based on the high energy cost associated with bond-length distortions and the ability to accommodate bond-angle deformations by a reduced set of VDW radii. This approach is compatible with problems where one is most interested in eliminating conformations that are energetically unlikely (i.e., sterically disallowed) because of VDW interactions, which cannot be relieved by bond-angle deformation. A successful application requires that one calibrate an appropriate set of VDW radii for the particular application area. Iijima et al. (109) calibrated such a set (Fig. 3.2) for peptide application by comparison with experimental crystallographic data from proteins and peptides.

Figure 3.2. Calibrated set of van der Waals radii for peptide backbone for use with rigid geometry approximation (109). Usual radii shown in parentheses. Carbonyl carbon not modified. [Label surviving from the figure: H 0.91 (1.08).]

2.1.4.2 Combinatorial Nature of the Problem. Using the rigid geometry assumption, one can analyze the combinatorial complexity of a simplified approach to the problem with some ease. Let us assume a molecule (Fig. 3.3) of N atoms with T torsional degrees of freedom (i.e., rotatable bonds). For each torsional degree of freedom T, explored at a given angular increment in degrees A, there are 360/A values to be examined for each T. This means that (360/A)^T sets of angles, each describing a unique conformation, need to be examined for steric conflict. For each conformer, the starting geometry will have to be modified by applying the appropriate transformation matrices to different subsets of atoms to generate the coordinates of the conformation. For each conformation, N(N - 1)/2 distance determinations will have to be calculated to a first approximation (this does not exclude bonded atoms and atoms bonded to the same atom from the check, which is necessary) and checked against the allowed sum of VDW radii for the two atoms involved. The number of VDW comparisons V is given by

$$V = \left(\frac{360}{A}\right)^{T} \frac{N(N-1)}{2}$$

It should be clear that the VDW comparisons are the rate-limiting step by their sheer number, and any algorithmic improvement that reduces the number of such checks or enhances the efficiency of performing such checks is of value.

Figure 3.3. Schematic diagram of molecule with N atoms and T rotatable bonds.

2.1.4.3 Pruning the Combinatorial Tree. From this simplified analysis, a systematic search of other than the smallest molecules at a coarse increment would appear daunting. A hybrid approach with a coarse grid search followed by minimization has been successfully used to locate minima. There are a number of algorithmic improvements over the "brute force" approach that enhance the applicability of the systematic search itself. To understand these improvements, some concepts need to be defined. First is the concept (110) of aggregate, a set of atoms whose relative positions are invariant to rotation of the T rotational degrees of freedom. n-Butane is divided into aggregates as an illustration (Fig. 3.4).
In this simple example, the atoms in an aggregate are all either directly bonded or have a 1-3 relationship (i.e., are related by a bond angle). Because of the rigid geometry approximation, their relative positions are fixed. Atoms contained within the same aggregate do not, therefore, have to be included in the set of those that undergo VDW checks for each conformation. For linear molecules, there are n - 1 bonds and the number of 1-3 interactions depends on the valence of the atom. This simplification leads to a reduction of the number of VDW checks by the factor N(N - 1)/2, which is multiplied by the number of conformations.

Figure 3.4. Decomposition of n-butane molecule into aggregates.

How can one reduce the number of conformations that have to be checked? Here the concept of construction becomes useful. One constructs the conformations in a stepwise fashion, starting with an initial aggregate and adding a second aggregate at a given torsional increment for the torsional variable T that is applied to the rotatable bond connecting the two. If any pair of atoms overlaps for that increment, then one can terminate the construction because no addition operation will relieve that steric overlap. In effect, one has truncated the combinatorial possibilities that would have included that subconformation; that is, one has pruned the combinatorial tree.
2.1.4.4 Rigid Body Rotations. If one constructs the molecule stepwise by the addition of aggregates, then one has two sets of atoms to consider. First are those in the partial molecule (set A), previously constructed, that have been found to be in a sterically allowed partial conformation. For each possible addition of the aggregate, the atoms of the aggregate (set B) must be checked against those in the partial molecule. If one uses the concept of a rigid body rotation, then one can describe the locus of possible positions of any atom in set B as a circle whose center lies on the axis of rotation T_i (the interconnecting bond) at a distance along the axis that can be calculated. The formula for a circle can be transformed to represent the possible distances between the atom b in set B and any atom a in set A, as shown in Fig. 3.5. An equation with scalar coefficients that describes the variable distance between two atoms as a function of a single torsional variable was derived (111), which has a discriminant whose evaluation can be used to determine whether atom a and atom b will:

• be in contact, despite changes in the value of the torsional rotation of the aggregate, which implies that the current partial conformation has to be discarded, given that there is no possible way to add the aggregate that is sterically allowed;
• never come in contact for any value of the torsional rotation, so that this pair of atoms can be removed from consideration regarding this aggregate; or
• come in contact for some values of the torsional rotation that can be calculated for that pair and that removes a segment of the torsional circle from consideration for other atom pairs. If all segments of the torsional circle are disallowed by combinations of the angular requirements of different atom pairs, then the partial conformation of the molecule is disallowed because further construction is not feasible. As a first approximation, this removes a degree of torsional freedom from the problem, reducing T to T - 1 torsional degrees of freedom. At a 10° torsional scan, an approximate reduction in computational complexity of a factor of 36 results.

Figure 3.5. Distance between atoms (1-7) and atom 10 separated by a single rotatable bond T can be described with a transformation of the equation of a circle describing the locus of atom 10 as bond T is rotated. Notice that distance D between any atom (1-7) and center of circle of rotation of atom 10 that is on axis of rotation is fixed, regardless of value of T.

2.1.4.5 The Concept and Exploitation of Rings. Realization that many of the relevant constraints in chemistry can be expressed as interatomic distances, VDW interactions, nuclear Overhauser effect constraints, and so forth allows use of the concept of a virtual ring in which the constraint forms the closure bond. Small rings up to six members can be solved analytically (112), so that one can search the torsional degrees of freedom associated with a constraint until only five remain and then solve the problem analytically (Fig. 3.6). The torsional angles for those degrees of freedom are no longer sampled on a grid, thus removing the problem of grid tyranny, in which valid conformations are missed by the choice of increment and starting conformation. This approach is then a hybrid because only part of the conformational space is searched with regular torsional increments. It is, however, much more efficient to solve a set of equations than search 5 torsional degrees of freedom.

Figure 3.6. Scheme for combining systematic search with analytical solution for closure. Bonds indicated by arrows were systematically scanned, whereas those indicated by A were analytically determined. Dotted bond can represent either chemical bond or experimental distance determination (NOE, etc.).
Figure 3.7. (a) Two-dimensional (Ramachandran) plot of energy vs. backbone torsional angles, Φ and Ψ, for N-acetyl-valine-methylamide. (b) Three-dimensional plot of energy vs. torsional angles, Φ, Ψ, and χ1, for N-acetyl-valine-methylamide.

2.1.4.6 Conformational Clustering and Families. In a congeneric series, the correspondence between torsional rotation variables is maintained as one compares molecules, and a direct comparison of the values allowed for one molecule with those allowed for another is meaningful. Two- (2D) or three-dimensional (3D) plots (Fig. 3.7) of torsional variables against energy often provide considerable insight into the difference in conformational flexibility between two molecules. Such a plot of the peptide backbone torsional angles Φ, Ψ is known as a Ramachandran plot. When more than three torsional variables become necessary to define the conformation of the molecule under consideration, then multiple plots become necessary to represent the variables. Unless special graphical functions are included in the software, then correlations between plots become difficult, given that each plot is a projection of a multidimensional space. One approach to this problem is to use cluster analysis programs to identify those values of the multidimensional variables that
= n = 96*
are adjacent in N-space. The clusters of con-

Figure 3.8. Cycloalkane rings and
number of local minima found by vari-
ous search strategies. n, number of con-
formers with MM2 (117); parentheses,
number of conformers with MM3 (117);
#, number of conformers within 25 kJ1
mol of global minima [MM2 (9211; *,
number of minima found within 3 kcall
mol of global minima (115).
energy minima. Mapping the energy surface of

formers that result have been referred to as - in isolation to determine the low
the ligand
families. A member of a family is capable of energy minima will, at the very least, provide a
being transformed into another conformer be- set of candidate conformations for consider-
longing to the same family without having to ation, or as starting points for further analy-
pass over an energy barrier; that is, the mem- ses. The problem of finding the global mini-
bers of a family exist within the same energy mum on a complicated potential surface is .
valley. common to many areas, and lacks a general
Because of the combinatorial nature of sys- solution. Minimization .~rocedureslocate the
tematic search, one is often faced with large closest local minimum depending on the start-
numbers of conformers that have to be ana- ing conformation. Several strategies have de-
lyzed. For some problems, energetic consider- veloped to map the potential surface and lo-
ations are appropriate and conformers can be cate minima. For an excellent overview of the
clustered with the closest local minimum, pro- different approaches, the reader is referred to
viding to a first approximation an estimate of the surveys by Leach (113) and by Burt and
the entropy associated with each minima by Greer (114). Stochastic methods such as
the number of conformers associated, in that Monte Carlo have been advocated (115) for
they can come from a grid search that approx- conformational analysis and their usefulness
imates the volume of the potential well. A sin- demonstrated on carbocyclic ring systems (91,
gle conformer, perhaps the one of lowest en- 115-121) (Fig. 3.8). Molecular dynamics can
ergy, can be used with appropriately adjusted be used to explore the potential energy sur-
error limits in further analyses as representa- face, often with simulated annealing to help
tive of the family. overcome activation-energy barriers, but ex-
2.1.4.7 Conformational Analysis. Although ploration is concentrated in local minima and
interaction with a receptor will certainly per- duplication of the surface explored is con-
turb the conformational energy surface of a trolled by Boltzmann's law. A systematic, or
flexible ligand, high affinity would suggest grid, search samples conformations in a regu-
that the ligand binds in a conformation that is lar fashion, at least in the parameter space
not exceptionally different from one of its low (usually torsional space) that is incremented.
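As a rough illustration of a systematic (grid) search, and of how the choice of increment and starting conformation decides which minima are seen, the sketch below scans a single torsion of a toy energy function. The cosine potential (minima at 60°, 180°, and 300° on a 10° grid) and all function names are illustrative assumptions, not a force field from the text.

```python
import math

def toy_torsion_energy(angle_deg):
    """Toy butane-like torsional energy in arbitrary units (an assumption)."""
    w = math.radians(angle_deg)
    return 1.5 * (1 + math.cos(3 * w)) + 0.5 * (1 + math.cos(w))

def scan(energy, increment_deg, start_deg=0):
    """Sample one full turn in regular increments; returns (angle, energy) pairs."""
    n = 360 // increment_deg
    return [((start_deg + i * increment_deg) % 360,
             energy(start_deg + i * increment_deg)) for i in range(n)]

def local_minima(points):
    """Grid points whose energy is lower than both periodic neighbors."""
    n = len(points)
    return [points[i] for i in range(n)
            if points[i][1] < points[i - 1][1]
            and points[i][1] < points[(i + 1) % n][1]]

fine = local_minima(scan(toy_torsion_energy, 10))   # 36 grid points
coarse = scan(toy_torsion_energy, 120)              # samples 0, 120, 240 only
```

With a 10° increment the scan locates all three wells. The 120° scan started at 0° lands only on eclipsed arrangements and never samples an energy below 3 units, whereas the same increment started at 60° hits every minimum.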
Comparisons of a variety of methods were made on cycloheptadecane by Saunders et al. (91) and it was concluded that the stochastic method was most efficient. In one of the few independent comparisons of the effectiveness of these procedures, Boehm et al. (122) studied the sampling properties on the model system caprylolactam, a nine-membered ring, and concluded that systematic search was both inefficient and ineffective at finding the minima found by the other methods when the number of conformers examined was limited.

2.1.4.8 Other Implementations of Systematic Search. Numerous other implementations of systematic, or grid, search programs exist in the literature and those with protein applications have been reviewed by Howard and Kollman (123), whereas those for small or medium sized molecules are included in the reviews by Burt and Greer (114) and by Leach (113). One of the more widely used programs in organic chemistry, MACROMODEL, has a search module (124) coupled to energy minimization for conformational analysis. MACROSEARCH has been developed by Beusen et al. (125) to generate the set of conformers consistent with experimental NMR data and used to determine the conformation of a 15-residue peptide antibiotic.

2.1.5 Statistical Mechanics Foundation (126). To understand the relationships between the simulation methods and the desired thermodynamic quantities, a short review of the major concepts of statistical mechanics may be in order. This is not meant to be comprehensive, but rather to remind the reader of the relevant ideas.

The set of configurations generated by a Monte Carlo simulation is what J. Willard Gibbs would call an "ensemble," assuming that the number of molecules in the simulation was large and the number of configurations was also large. This ensures that the possible arrangements of molecules that are energetically reasonable have been adequately sampled. One is often interested in the statistical weight W of a particular observable. For example, a particular conformation of a solute molecule, say, the staggered rotamer of ethane, could be compared with another conformer, the eclipsed rotamer, in a simulation by evaluation of the energy of the system using the appropriate force field. From physics,

with solvent. If more configurations of the surrounding solvent molecules of equivalent energy were available to the staggered than to the eclipsed, then the staggered would have a higher statistical weight. From the inscription on Boltzmann's tomb, we all recall that S = k ln W, where S is the entropy and k is Boltzmann's constant. Thus, we have a link between statistics and thermodynamics. W in this case would be the number of configurations associated with the particular conformation of ethane under consideration divided by the total number of configurations sampled. This would have to be weighted by their energy, of course, unless the distribution was already Boltzmann weighted, as happens when one uses the Metropolis algorithm (127).

Another way of stating this is that the probability $P_i$ of a particular configuration $N_i$ with energy $E_i$ is its Boltzmann probability divided by the sum of the Boltzmann probabilities of all the configurations or states:

\[ P_i = \frac{e^{-E_i/kT}}{\sum_j e^{-E_j/kT}} \]

The denominator in this equation has been given a special name, the partition function, often symbolized by Z, which is derived from the German Zustandssumme (sum over states). The successive terms in the partition function describe the partition of the configurations among the respective states available. One can express the thermodynamic state functions of an ideal gas in terms of the molecular partition function Z as follows:

\[ S = Nk \ln\frac{Z}{N} + \frac{U}{T} + Nk \]

where N is the number of molecules and U is the internal energy. From this and the assumption of an ideal gas, pV = NkT, the Gibbs free energy G = U - TS + pV leads to

\[ G = -NkT \ln\frac{Z}{N} \]

and similarly, the Helmholtz free energy A = U - TS leads to the expression

\[ A = -NkT \ln\frac{Z}{N} - NkT, \]
all of which may be more familiar if expressed in terms of enthalpy, H = U + pV.
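To make the link between energies and statistical weights concrete, the following minimal sketch evaluates Boltzmann probabilities and the sum over states for a short list of energies. The two energies used (0.0 and 2.9 kcal/mol, loosely standing in for staggered and eclipsed ethane) are illustrative assumptions, as are the function and variable names.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)

def boltzmann_probabilities(energies, temperature):
    """Return (probabilities, z) for states with energies in kcal/mol.

    z is the sum over states evaluated relative to the lowest energy,
    which leaves the probabilities unchanged and avoids overflow.
    """
    kT = K_B * temperature
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)
    return [w / z for w in weights], z

# Hypothetical two-state example at 298 K
probs, z = boltzmann_probabilities([0.0, 2.9], 298.0)
```

Under these assumed numbers, more than 99% of the weight falls on the lower-energy state, which is why high-energy configurations contribute so little to Boltzmann-weighted averages.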
In summary, by simulating a relevant statistical sample of the possible arrangements of molecules when interacting, one can derive the macroscopic thermodynamic properties by statistical analysis of the results. In this case, one is deriving the partition function not by theoretical analysis of the quantum states available to the molecule, but through simulation. In other words, the average properties are valid if the Monte Carlo or molecular dynamics trajectories are ergodic, that is, constructed such that the Boltzmann distribution law is in accord with the relative frequencies with which the different configurations are sampled. (An ergodic system is by definition one in which the time average of the system is the same as the ensemble average.) A basic concept in statistical mechanics is that the system will eventually sample all configurations, or microscopic states, consistent with the conditions (temperature, pressure, volume, other constraints) given sufficient time. In other words, a trajectory of sufficient length (in time) would sample configuration space.

2.1.6 Molecular Dynamics (37, 126, 128). Molecular dynamics is a deterministic process based on the simulation of molecular motion by solving Newton's equations of motion for each atom and incrementing the position and velocity of each atom by use of a small time increment. If a molecular mechanics force field of adequate parameterization is available for the molecular system of interest and the phenomenon under study occurs within the time scale of simulation, this technique offers an extremely powerful tool for dissecting the molecular nature of the phenomenon and the details of the forces contributing to the behavior of the system.

In this paradigm, atoms are essentially a collection of billiard balls, with classical mechanics determining their positions and velocities at any moment in time. As the position of one atom changes with respect to the others, the forces that it experiences also change. The forces on any particular atom can be calculated from

\[ F = ma = -\frac{dV}{dr} \]

where F is the force on the atom, m is the mass of the atom, a is the acceleration, V is the potential energy function, and r represents the cartesian coordinates of the atom. Using the first derivative of the analytical expression for the force field allows the calculation of the force felt on any atom as a function of the position of the other atoms.

2.1.6.1 Integration. In this simulation, we use numerical integration; that is, we choose a small time step (smaller than the period of the fastest local motion in the system) such that our simulation moves atoms in sufficiently small increments, so that the position of surrounding atoms does not change significantly per incremental move. In general, this means that the time increment is on the order of $10^{-15}$ s (1 femtosecond). This reflects the need to adequately represent atomic vibrations that have a time scale of $10^{-15}$ to $10^{-11}$ s. For each picosecond of simulation, we need to do 1000 iterations of the simulation. For each iteration, the force on each atom must be evaluated and its next position calculated. For simulations involving molecules in solvent, sufficient solvent molecules must be included, so that the distance from any atom in the solute to the boundary of the solvent is larger than the decay of the intermolecular interaction between the solute and solvent molecules. This requires several hundred solvent molecules for even small solutes, and the computations to do a single iteration are sufficiently large that simulations of more than several hundred picoseconds for proteins with explicit solvent are still rare. Efforts to increase the time step and thus allow for longer simulations without sacrificing the accuracy of the methodology are under investigation. Combination of normal mode calculations with explicit numerical integration allows time steps up to 50 ps for model systems (129). A similar approach has been shown effective by Schlick and Olson (130) in modeling supercoiling of DNA.

Let us attempt a rough trajectory through molecular dynamics. We have a system of N atoms obeying classical Newtonian mechanics. In such a system, we can represent the total energy $E_{tot}$ as the sum of kinetic energy $E_{kin}$ and potential energy $V_{pot}$:

\[ E_{tot}(t) = E_{kin}(t) + V_{pot}(t) \]

where the potential energy is a function of the coordinates, $V_{pot} = f(r_i)$ for atoms 1 to N, and $r_i$ represents the cartesian coordinates of atom i; and the kinetic energy depends on the motion of the atoms:

\[ E_{kin}(t) = \sum_{i=1}^{N} \tfrac{1}{2} M_i V_i(t)^2 \]

where $M_i$ is the mass of atom i and $V_i$ is the velocity of atom i.

The energy undergoes constant redistribution because of the movements of the atoms, resulting in changes in their positions on the potential surface and in their velocities. At each iteration ($t \to t + \Delta t$), an atom i moves to a new position [$r_i(t) \to r_i(t + \Delta t)$], and it experiences a new set of forces. The basic assumption is that the time step $\Delta t$ is sufficiently small that the position of atom i at $t + \Delta t$ can be linearly extrapolated from its velocity at time t and the acceleration resulting from the forces felt by atom i at time t. If $\Delta t$ is long enough for the atoms surrounding atom i to change their position so that the forces felt by atom i will change during $\Delta t$, then the approximation is not valid and the simulation will deviate from that observed with a shorter $\Delta t$. After each atom is moved, the forces on the first atom based on the new positions of the other N - 1 atoms can be recalculated and a new iteration begun. Several algorithms exist for numerical integration. The ones by Verlet and Gear are in common use, with the one by Verlet being computationally more efficient (126). A variant of the Verlet algorithm in common use is called the leapfrog algorithm. The calculation of the velocity is done at $t - \Delta t/2$, whereas the calculation of the force occurs at t to derive the new velocity at $t + \Delta t/2$. In other words,

\[ V_i(t + \Delta t/2) = V_i(t - \Delta t/2) + \frac{F_i(t)}{M_i}\,\Delta t \]

The atomic position of atom i is calculated by adding the incremental change in position, $V_i(t + \Delta t/2) \cdot \Delta t$, to the original position $r_i(t)$. By staggering the evaluation of the velocity and force calculations by $\Delta t/2$, an improvement in the simulation performance is obtained.

2.1.6.2 Temperature. For simulations that can be compared with experimental results, one must be able to control the temperature of the simulation. The temperature of a system is a function of the kinetic energy, $E_{kin}(t)$:

\[ T(t) = \frac{2\,E_{kin}(t)}{3Nk} \]

where k is Boltzmann's constant and 3N is the number of degrees of freedom of the N atoms.

One can perform molecular dynamics simulations at a constant temperature $T_c$ by scaling all atomic velocities $V_i(t)$ at each step by a factor derived from

\[ \frac{dT(t)}{dt} = \frac{T_c - T(t)}{\tau} \]

where $T_c$ is the desired temperature and $\tau$ is the relaxation time of the coupling.

2.1.6.3 Pressure and Volume. Depending on the simulation that one desires to accomplish, either the pressure or volume must be maintained constant. Constant volume is the easiest to perform because the boundaries of the system are maintained with all molecules confined within those boundaries and the pressure allowed to change during the simulation.

2.1.7 Monte Carlo Simulations. The Monte Carlo method (126) is based on statistical mechanics and generates sufficient different configurations of a system by computer simulation to allow the desired structural, statistical, and thermodynamic properties to be calculated as a weighted average of these properties over these configurations. The average value $\langle X \rangle$ of the property X can be calculated by the following formula:

\[ \langle X \rangle = \frac{\sum_{i=1}^{N} X_i\, e^{-E_i/kT}}{\sum_{i=1}^{N} e^{-E_i/kT}} \]

where N is the number of configurations, $E_i$ is the energy of configuration i, k is Boltzmann's constant, and T is temperature.

If we have sufficiently sampled the possible arrangements of molecules in the simulation and have an accurate method to calculate their energy E, then the above formula will give a Boltzmann weighted average of the property X.

In practice, one must compromise the number of molecules in the simulation and/or the number of configurations calculated to conserve computer cycles. Two essential techniques that are utilized are periodic boundary conditions and sampling algorithms, which we discuss separately.

Although it is important to minimize the number of molecules in either Monte Carlo or molecular dynamics simulations for computational convenience, surface effects at the interface between the simulated solvent and the surrounding vacuum could seriously distort the results. To approximate an "infinite" liquid, one can surround the box of molecules by simple translations to generate periodic images. Each atom in the central box has a set of related images in the virtual boxes surrounding the central one (Fig. 3.9). The energy calculations for pairwise interactions consider only the interaction of a molecule, or its "ghost," with any other molecule, but not both. In practice, this is accomplished by limiting pairwise interactions to distances less than one-half the length of the side of the box. Real concerns often arise regarding convergence of electrostatic terms because of their slow, inverse (1/r) dependence on distance.

Figure 3.9. Schematic diagram of simulation with periodic boundary conditions in which adjacent cells are generated by simple translations of coordinates.

For any large nontrivial system, the total number of possible configurations is beyond comprehension. Consider a set of protons in a magnetic field: the magnetic moments can be either aligned with or opposed to the magnetic field. For only 50 protons, there are $2^{50}$ combinations (about $10^{15}$), which is a large number. For a
small cyclic pentapeptide, there are potentially $36^{10}$ conformations if one considers a 10° scan of the torsional variables Φ, Ψ. Clearly, some of these are energetically unreasonable because the conformation requires overlap of two or more atoms in the structure. Monte Carlo simulations are successfully performed by sampling only a limited set of the energetically feasible conformations, say, $10^{6}$ out of $10^{100}$ theoretical possibilities. The reason for this success is that the Monte Carlo schemes sample those states that are statistically most important. One could sample all states, calculate the energy of each, and then Boltzmann-weight its contribution to the average. Alternatively, one can ignore those states that are energetically high so that they contribute little, if any, weight to the average, and concentrate on those of low energy. In other words, we look only where there are reasonable answers energetically. This is called importance sampling, which is the key to the Monte Carlo procedure.

One aspect shared by Monte Carlo methods and molecular dynamics is the ability to cross barriers. In the case of Monte Carlo, barrier crossing occurs both by random selection of variables and by acceptance of higher energy states on occasion. Both methods require an equilibration period to eliminate bias associated with the starting configuration. When one considers randomly filling a box with molecules with arbitrary choices for position and orientation, it should be obvious that most examples would result in high energy, especially if the density of such a simulation is made to resemble that of a liquid in which adjacent molecules are often in VDW contact. High energy configurations contribute very little to the properties we are trying to evaluate because they are Boltzmann weighted. It is, therefore, extremely inefficient to randomly calculate configurations. One needs procedures, often referred to as importance sampling, that selectively calculate configurations that will be representative of allowed states. In fact, if one can guarantee that the energy of the configurations actually has a Boltzmann distribution, then one can simply average the properties. In practice, this has been accomplished by an algorithm suggested by Metropolis et al. (127). One essentially uses a Markov process in which the current configuration becomes the basis for generating the next.

1. A molecule in the current configuration is chosen at random and its degrees of freedom randomly varied by small increments.
2. The energy of the new configuration is evaluated and compared with that of the starting configuration.
3. If the new energy is lower, the new configuration is accepted and becomes the basis for the next random perturbation.
4. If the energy is higher, E(new) > E(old), then a random number between 0 and 1 is generated and compared with exp{-[E(new) - E(old)]/kT}. If the number is less, then the configuration is accepted and the process continues by generating a new configuration. If the number is greater, then the configuration is rejected and the process resumes with the old configuration.

In this way, configurations of lower energy are accepted and the system eventually "minimizes" to sample the more highly populated lower energy configurations; at the same time, higher energy configurations are included but only in proportion to their Boltzmann distribution, which is clearly a function of the temperature of the simulation. Because the configurations occur with a probability depending on their energy and proportional to the Boltzmann distribution, one can simply average thermodynamic properties over this distribution of configurations,

\[ \langle X \rangle = \frac{1}{N} \sum_{i=1}^{N} X_i \]

where the sum covers the N configurations generated. Because one often does not know an appropriate starting configuration, the initial part of the run may be used to "minimize," or equilibrate the system, and only the latter part of the simulation analyzed once the configurational energy has stabilized.

Figure 3.10. Estimation of difference in affinity (ΔΔG) of the two anions Cl- and Br- for the cryptand SC24 [(a) structural formula; (b) schematic of complex formed with halide ion] as the parameters for Cl- are slowly mutated into those for Br- in water (dashed line) as well as in the complex (solid line). Used with permission (138).
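The Metropolis procedure (steps 1 to 4 above), followed by a simple average over the configurations kept after equilibration, can be sketched for a single coordinate as follows. The one-dimensional harmonic energy, the step size, the run lengths, and all names are illustrative assumptions; a real simulation would perturb molecular degrees of freedom and use a force field.

```python
import math
import random

def metropolis(energy, x0, n_steps, kT, max_step=1.0, n_equil=1000, seed=0):
    """Metropolis sampling of one coordinate x.

    Returns the configurations visited after an equilibration period.
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    for step in range(n_steps):
        x_new = x + rng.uniform(-max_step, max_step)   # step 1: small random change
        e_new = energy(x_new)                          # step 2: evaluate new energy
        # steps 3 and 4: accept if lower, or with Boltzmann probability if higher
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
        # after a rejection the old configuration is counted again
        if step >= n_equil:
            samples.append(x)
    return samples

def harmonic(x):
    """Toy energy function, E = x^2 / 2 (an assumption for illustration)."""
    return 0.5 * x * x

samples = metropolis(harmonic, x0=5.0, n_steps=100000, kT=1.0)
mean_x2 = sum(x * x for x in samples) / len(samples)   # plain average, no reweighting
```

For this toy well the exact Boltzmann average of x² is kT, so `mean_x2` should come out close to 1. Note that the run starts far from the minimum at x = 5 and the first `n_equil` steps are discarded, mirroring the equilibration period described in the text.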
A useful application has combined Monte Carlo sampling with variable temperatures (simulated annealing) to encourage barrier crossing to optimize the docking of ligands into active sites. Random displacements of rigid body translation and rotation (6 degrees of freedom) and of internal torsional rotations in a substrate within the binding site cavity were performed with Metropolis sampling and a temperature program. This procedure reproduced the crystallographically observed structure of the complex for several test cases (131).

2.1.8 Thermodynamic Cycle Integration (132-134). Thermodynamic cycle integration is an approach that allows calculation of the free energy difference between two states. In this method, one takes advantage of the state-function nature of a thermodynamic cycle and eliminates the paths of the simulation with long time constants (e.g., formation of a complex requiring diffusion). As an example, the difference in affinity of two ligands (L and M) for the same enzyme or receptor R is described by the following thermodynamic cycle:

\[
\begin{array}{ccc}
R + L & \xrightarrow{\;A_1\;} & RL \\
\downarrow A_3 & & \downarrow A_4 \\
R + M & \xrightarrow{\;A_2\;} & RM
\end{array}
\]

Because the thermodynamic values of the two states do not depend on the path between the states, one can write the following equation:

\[ \Delta\Delta A = \text{difference in affinity of L and M for R} = A_2 - A_1 = A_4 - A_3 \]

By simulating the mutation of L into M, paths A3 and A4, one can avoid the long simulation required for diffusion of the ligands, paths A1 and A2, into the receptor. One simply incrementally modifies the potential functions representing ligand L to those representing ligand M during the course of the simulation, making sure that the perturbations are introduced gradually and that the surrounding atoms have time to relax from the perturbation (Fig. 3.10). Either Monte Carlo (135) or molecular dynamics simulations can utilize this technique. Many interesting applications have appeared in the literature (132, 134, 136, 137). Their success appears directly related to sampling problems and minimal perturbation of the ligand to ensure equilibration.
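The idea of mutating one potential into another in small steps can be sketched with a toy model. The example below assumes that each state is a one-dimensional harmonic well and uses the exponential-averaging (free energy perturbation) estimate for each small step; neither the estimator nor the harmonic model is specified in the text, and the force constants standing in for "solvent" and "complex" are hypothetical. Harmonic wells are chosen only because the exact answer, (kT/2) ln(k_end/k_start), is known.

```python
import math
import random

def mutation_free_energy(k_start, k_end, kT=1.0, n_windows=10,
                         n_samples=20000, seed=1):
    """Free energy of gradually mutating U = k_start*x^2/2 into U = k_end*x^2/2.

    Each window uses dA = -kT ln < exp(-(U_next - U_cur)/kT) >_cur,
    averaged over configurations of the current potential.
    """
    rng = random.Random(seed)
    total = 0.0
    for w in range(n_windows):
        k_cur = k_start + (k_end - k_start) * w / n_windows
        k_next = k_start + (k_end - k_start) * (w + 1) / n_windows
        sigma = math.sqrt(kT / k_cur)   # exact Boltzmann sampling of a harmonic well
        acc = 0.0
        for _ in range(n_samples):
            x = rng.gauss(0.0, sigma)
            acc += math.exp(-0.5 * (k_next - k_cur) * x * x / kT)
        total += -kT * math.log(acc / n_samples)
    return total

# Toy cycle: L -> M free in solvent (taken as path A3) and in the complex (path A4)
a3 = mutation_free_energy(1.0, 4.0)           # hypothetical "solvent" wells
a4 = mutation_free_energy(2.0, 4.0, seed=2)   # hypothetical "complex" wells
dda = a4 - a3                                 # equals A2 - A1 by closure of the cycle
```

Splitting the mutation into ten windows keeps each perturbation small, which is the point made in the text about introducing the change gradually. In a real calculation the Gaussian draws would be replaced by Monte Carlo or molecular dynamics sampling of the full system.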
2.1.9 Non-Boltzmann Sampling. There are equivalent molecular dynamics and Monte Carlo procedures that allow one to sample regions of configuration space that are not minima, transition states, for example. One can generate a Monte Carlo trajectory for a system $E_w$ that has energetics similar to that of the Boltzmann system $E_0$, with sampling in the region associated with a transition barrier by subtracting a potential $V_w$ to reduce the barrier:

\[ E_w = E_0 - V_w \]

Alternatively, one may want to obtain meaningful statistics for a rare event without oversampling the lower energy states. This can be accomplished by adding a potential W, which is zero for the interesting class of configurations and very large for all others (Fig. 3.11):

\[ E_w = E_0 + W \]

Figure 3.11. Schematic diagrams of methods for modifying the potential surface to allow adequate sampling during simulations. (Horizontal axis of each panel: reaction coordinate.)

The details of these sampling procedures that allow one to focus on the aspect of the problem of interest are the subject of a review by Beveridge (133). Applications of this approach to determining conformational transitions in model peptides (137, 139, 140) are exemplified in the work of Elber's group on helix-coil (85, 86, 141), the Brooks group on turn-coil (142-146), and Huston and Marshall and Smythe et al. (147, 148) on helical transitions in peptides.
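A step the text leaves to the cited review is how averages for the unmodified system are recovered from a trajectory run on the modified surface. The short derivation below is a sketch of the standard reweighting identity, not a formula taken from the text. The notation is assumed: $E_0$ is the unmodified (Boltzmann) energy, $E_w = E_0 + W$ is the modified energy, and $\langle\cdot\rangle_0$ and $\langle\cdot\rangle_w$ denote averages over the two ensembles.

```latex
% Assumed notation: E_0 unmodified energy, E_w = E_0 + W modified energy
\begin{align}
e^{-E_0/kT} &= e^{-E_w/kT}\, e^{+W/kT} \\
\langle X \rangle_0
  &= \frac{\int X\, e^{-E_0/kT}\, d\tau}{\int e^{-E_0/kT}\, d\tau}
   = \frac{\int X\, e^{+W/kT}\, e^{-E_w/kT}\, d\tau}
          {\int e^{+W/kT}\, e^{-E_w/kT}\, d\tau}
   = \frac{\langle X\, e^{+W/kT} \rangle_w}{\langle e^{+W/kT} \rangle_w}
\end{align}
```

The same argument with a subtracted potential, $E_w = E_0 - V_w$, gives the weight $e^{-V_w/kT}$ in place of $e^{+W/kT}$.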
2.2 Quantum Mechanics: Applications in Molecular Mechanics

Detailed discussion of quantum mechanics (149) is clearly beyond the scope of this review, and its applications to molecular mechanics and modeling will be briefly summarized. Molecular mechanics is based on the laws of classical physics and deals with electronic interactions by highly simplified approximations such as Coulomb's law. All forces operating in intermolecular interactions are essentially electronic in nature. Any effort to quantitate those forces requires detailed information about the nuclear positions and the electron distribution of the molecules involved. At considerable computational cost, quantum mechanics provides information about both nuclear position and electronic distribution. Molecular mechanics is built on the assumption that electronic interactions can be adequately accounted for by parameterization. Although most of the systems of interest in biology are too large for the direct application of quantum mechanics, quantum mechanics has at least three essential roles to play in drug design (149): (1) charge approximations, (2) characterization of molecular electrostatic potentials, and (3) parameter development for molecular mechanics.

Figure 3.12. Different approaches to localization of charge used in electrostatic models. (a) Atom-centered monopole; (b) atom-centered dipole; and (c) atom-centered quadrupole. [Panel labels: (a) atom-centered charges and bond dipoles; (b) atom-centered dipoles; (c) atom-centered quadrupole.]

2.2.1 Parameterization of Charge. Estimates of charges in molecular mechanics can be derived, in general, by application of one of the many different quantum chemical approaches, either ab initio or semiempirical. Quantum mechanical methods are available for calculating the electron probability distributions for all the electrons in a molecule and then partitioning those distributions to yield representations for the net atomic charges of atoms in the molecule, either as atom-centered charges or as more complex distributed multipole models (39, 42) (Fig. 3.12).

2.2.1.1 Atom-Centered Point Charges. In the Mulliken population analysis, all the one-center charge on an atom is assigned to that atom, whereas the two-center charge is divided equally between the two atoms in the
overlap (even if the electronegativities of the Williams (42) derived a procedure to derive
two atoms are quite dissimilar). The sum is the best fit to a given MEP with a defined set of
the gross atomic population, and the net monopoles, dipoles, and so forth.
atomic charge is simply this plus the nuclear Typically, fragments of molecules of inter-
charge. The result is very sensitive to the basis est are analyzed by ab initio techniques to gen-
set (the number of atomic orbitals) used. De- erate their MEPs that are the reference for
spite poor fit of the molecular electrostatic po- parameterization of charge. Besler et al. (152)
tential derived with point charges to the ab reported fitting of atomic charges to the elec-
initio electrostatic potential, or that derived trostatic potentials calculated by the semiem-
from a distributed multipole analysis (150), pirical methods AM1 and MINDO. The
widespread use continues because they do re- MINDO charges derived by fitting the MEP
flect chemical trends and are reportedly com- can be linearly scaled to agree with results de-
patible with known electronegativities. In ad- rived from ab initio calculations. Among the
dition, this option is commonly available in motivations for semiempirical methods are
software packages. Unfortunately, poor repre- the facts that semiempirical methods using
sentation of the electric field surrounding the high quality basis sets often yield better re-
molecule results from use of atom-centered sults than ab initio techniques employing min-
monopole models (42), even when more care- imal basis sets, and the significant reduction
ful methods are used to distribute the charge. in computational time in moving from ab ini-
2.2.1.2 Methods to Reproduce the Molecu- tio to semiempirical calculations. Rauhut and
lar Electrostatic Potential (MEP). The electro- Clark (153) used the AM1 wave function to
static potential surrounding the molecule that develop a multicenter point-charge model in
is created by the nuclear and electronic charge which each hybrid natural atomic orbital is
distribution of the molecule is a dominant fea- represented by two charges located at the cen-
ture in molecular recognition. Williams re- troid of each lobe. Thus, up to nine charges (4
views (42) methods to calculate charge models orbitals and 1 core charge) are used to repre-
to accurately represent the MEP as calculated sent heavy atoms. Results using this approach
by ab initio methods by use of large basis sets. The choice between models (monopole, dipole, quadrupole, bond dipole, etc., Fig. 3.12) depends on the accuracy with which one desires to reproduce the MEP. This desire has to be balanced by the increased complexity of the model and its resulting computational costs when implemented in molecular mechanics.

The first problem is to select points where the MEP is to be evaluated and eventually fitted, the position of the shell outside the VDW radii of the atoms in the molecule, and the spacing of grid points on that shell. Sampling too close to the nuclei gives rise to anomalies because the potential around nuclei is always positive. Singh and Kollman (151) report the use of four surfaces at 1.4, 1.6, 1.8, and 2.0 times the VDW radii, with a density of one to five points per Å². This paradigm was reported to give an adequate sampling to which the fitted charges were fairly insensitive, at least at the higher values. An improved procedure, the restrained electrostatic potential fit (RESP), was developed by Bayly et al. (41) to enhance transferability of the resulting point charges.

affirm the observations that distributed charges are more successful than atom-centered charges in reproducing intermolecular interactions (154, 155).

2.2.2 Parameter Derivation for Force Fields. Because molecular mechanics is empirical, parameters are derived by iterative evaluation of computational results, such as molecular geometry (bond lengths, bond angles, dihedrals) and heats of formation, compared with experimental values (20). Lifson has coined the expression "consistent" for force fields in which structures, energies of formation, and vibrational spectra have all been used in parameterization by least-squares optimization. In the case of bond lengths, bond angles, and VDW parameters, crystallography has provided most of the essential experimental database. Major efforts (156) to derive general sets of parameters from quantum mechanical calculation have been made, especially for systems for which adequate experimental data are unavailable. Although quantum mechanics is certainly adequate for initial approxima-
3 Known Receptors
tions of parameters and essential for charge approximations, a detailed analysis indicates that in vacuo calculations neglect many-body effects and can be misleading. A major effort by Hehre (personal communication) to derive parameters for water from extensive ab initio calculations with large basis sets failed even to give a parameter set that reproduced the radial distribution for bulk water. Parameters derived from relevant experimental data in condensed phase (especially if available in the solvent of theoretical interest) are generally more capable of accurately predicting results because the many-body effects are implicitly included in the parameterization. The basic assumption is that these "effective" two-body potentials implicitly incorporate many-body interaction energies.

Jorgensen has parameterized by fitting properties of bulk liquids to Monte Carlo simulations to give the AMBER/OPLS force field (26, 157, 158). Conceptually, one is attracted by the use of liquids and their observable properties as constraints during the derivation of a force field that is destined to study the properties of solvated molecules.

2.2.3 Modeling Chemical Reactions and Design of Transition-State Inhibitors. In cases, such as enzyme reactions, where chemical transformations occur, quantum chemical methods must be used to deal with electronic changes in hybridization and bond cleavage (159, 160). Hybrid applications (161-163) in which the reaction core is modeled quantum mechanically and the rest by molecular mechanics would appear a viable option. Alternatively, the geometry of the transition state has been modeled by molecular mechanics, with force constants derived from ab initio calculations that predict with amazing accuracy the relative selectivity of reactions. Andrews and coworkers (164) pioneered modeling of transition states (165) of enzymatic reactions to design transition-state inhibitors.

3 KNOWN RECEPTORS

A significant challenge is the design of novel ligands for therapeutic targets in which the three-dimensional structure has been determined by either X-ray crystallography or NMR (12, 13, 166). The availability of the coordinates of all the atoms of the target suggests use of modeling of the site and interaction with prospective ligands. Qualitative information can be discerned by simple examination of complexes by the use of molecular graphics and improvement of known ligands made by searching for accessory binding interactions through ligand modification. This approach was pioneered by groups at Wellcome Research Laboratories (167-169) in designing analogs of 2,3-diphosphorylglycerate (Fig. 3.13), to modulate oxygen binding to hemoglobin, and at Burroughs-Wellcome (170), to enhance affinity of dihydrofolate reductase (DHFR) antagonists. When used in an iterative fashion, novel compounds with improved affinity result (166, 171, 172). Quantification of interactions and design of novel ligands require application of molecular and statistical mechanics to quantify the enthalpy and entropy of binding. In other words, experimental measurements reflect free energies of binding and both enthalpic and entropic contributions must be estimated for prediction of affinities as part of the design process. When combined with combinatorial chemistry and high throughput screening, rapid identification of therapeutic candidates is feasible, as witnessed in the case of factor Xa antagonists (173) or TAR RNA inhibitors as possible HIV drugs (174).

3.1 Definition of Site

The availability of three-dimensional structural information on a potential therapeutic target does not guarantee identification of the site of action of the substrate, or inhibitor, unless the structure of a relevant complex has been determined. In fact, conformational changes often occur during binding of ligands to enzymes that are not reflected in the three-dimensional structure of the enzyme alone. Illustrative examples are the major conformational changes seen (175, 176) in HIV protease on binding the inhibitor MVT-101 (Fig. 3.14) and the changes in domain orientation observed (177) in the complex of an anti-HIV peptide antibody with the peptide. Until the two β-strand flaps have been folded in, to complete the active site of HIV protease, many of
104 Molecular Modeling in Drug Design
Figure 3.13. Diphosphoglycerate (a) and analogs (b-d) designed to optimize interactions bound in
schematic model of hemoglobin. Used with permission (169).
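The point made in Section 3, that measured affinities reflect free energies and that enthalpic and entropic contributions must both be estimated, can be illustrated with the standard thermodynamic relations ΔG = ΔH - TΔS and ΔG = RT ln Kd. The following Python sketch is illustrative only: the function names and the numerical inputs are assumptions chosen for the example and are not taken from the cited references.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar, temp_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation
    constant expressed in mol/L (1 M standard state assumed)."""
    return R_KCAL * temp_k * math.log(kd_molar)


def delta_g_from_parts(delta_h, delta_s, temp_k=298.15):
    """Combine an enthalpy (kcal/mol) and an entropy (kcal/(mol*K))
    into a free energy: dG = dH - T*dS."""
    return delta_h - temp_k * delta_s


def kd_from_delta_g(delta_g, temp_k=298.15):
    """Invert dG = RT ln Kd to recover the dissociation constant."""
    return math.exp(delta_g / (R_KCAL * temp_k))


if __name__ == "__main__":
    # Hypothetical ligand with a nanomolar dissociation constant.
    print(round(delta_g_from_kd(1e-9), 2))
    # Hypothetical enthalpy/entropy pair at 300 K.
    print(delta_g_from_parts(-10.0, 0.005, 300.0))
```

One consequence worth keeping in mind during design: at room temperature a tenfold change in Kd corresponds to only about 1.4 kcal/mol in ΔG, so modest errors in either the enthalpic or the entropic estimate translate into large errors in predicted affinity.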
the important interactions for recognition in this proteolytic system have not been defined. In other cases of therapeutic targets, allosteric sites are involved in regulation of binding and cannot clearly be discerned from the crystal structure available. Here NMR offers a highly complementary approach where transfer and isotope-edited NOEs as well as magic angle spinning NMR on solid samples can help identify those residues of the therapeutic target (Fig. 3.15) involved in receptor interaction (178-180).

One significant concern of structure-based design is the dynamics of the target itself. How stable is the active site to modifications in the ligand? Are there alternative potential binding sites that could compete for the ligand? The geometrical identity of serine protease catalytic residues, for example, argues that the specificity essential for biological utility ensures a relatively rigid three-dimensional arrangement of functionality in the active site that determines molecular recognition and
Figure 3.14. Ribbon diagram of HIV-1 protease in the absence of inhibitor (a) and when bound to the
inhibitor MVT-101 (b). Diagrams based on crystal structures as reported by Miller et al. (175, 176).
discrimination. The active site has had no evolutionary pressure to optimize binding per se, but rather rates of interaction and discrimination among the limited repertoire of the biological milieu. One classic example (181) of difficulty in interpretation of binding as a result of ligand modification occurred when an analog designed to bind to a specific site on hemoglobin actually found a more appropriate site within the packed side chains of the protein molecule (Fig. 3.16). This example emphasizes the importance of protein dynamics. Alternate conformations of the protein that are easily accessible at room temperature may be difficult to characterize experimentally because of relatively low abundance and/or lack of resolution of the experimental techniques used. Computationally, they are problematic as well because of the complexity of the energy surface for a macromolecule.

3.2 Characterization of Site

3.2.1 Volume and Shape. Most substrate-enzyme or receptor-ligand interactions occur within pockets, or cavities, buried within pro-
Figure 3.15. Bound conformation of cyclosporin (a) as determined by NMR compared with solution
conformation (b) (178). Residues involved with interaction with cyclophilin are indicated on (a) in
bold.
teins. Inside these invaginations, a microenvironment is established that favors desolvation and binding of the ligand, despite the entropic cost of fixing the relative geometries of the two molecules. Knowledge of the three-dimensional structure of such cavities can assist the study of binding interactions and the design of novel ligands as potential therapeutics. Several algorithms to find, display, and characterize cavity-like regions of proteins as potential binding sites have been developed. Kuntz et al. (13, 183) described a program, DOCK, to explore the steric complementarity between ligands and receptors of known three-dimensional structure. Using the molecular surface of a receptor, a volumetric representation of the chosen binding cavity is approximated by use of a set of spheres of various sizes that have been mathematically "packed" within it (Fig. 3.17). The set of distances between the centers of the spheres serves as a compact representation of the shape of the cavity. The use of the relative distance paradigm allows comparison without the need for orientation of one shape with respect to the other. Potential ligands are characterized in a similar fashion by generating a set of spheres that mimic the shape of the ligand. Matching the distance matrix of the cavity with that of a potential ligand provides an efficient screen for selection of complementary shapes. Voorintholt et al. (184) used three-dimensional lattices to calculate density maps of proteins. In these maps, lattice points were assigned as a function of the distance to the nearest atom. This technique is effective in delineating regions of low density where channels and cavities exist. Ho and Marshall (185) implemented a search function in CAVITY to allow the investigator to isolate a single cavity of interest by specifying a seed point. From this seed point, the algorithm systematically explored the entire volume of the cavity, following its borders and effectively filling every crevice within it; that
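The lattice and seed-point ideas described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published CAVITY or density-map code: atoms are taken as hard spheres given as (x, y, z, radius) tuples, a lattice point counts as empty when it lies farther than the atom radius plus a probe radius from every atom, and the search is a plain breadth-first fill over six-connected lattice points. The function names, the probe radius, and the lattice spacing are all hypothetical choices.

```python
import math
from collections import deque
from itertools import product


def free_lattice_points(atoms, box_min, box_max, spacing=1.0, probe=1.4):
    """Return the set of lattice indices (i, j, k) that are not within
    (atom radius + probe) of any atom. atoms: list of (x, y, z, radius)."""
    counts = [int(math.floor((hi - lo) / spacing)) + 1
              for lo, hi in zip(box_min, box_max)]
    free = set()
    for idx in product(*(range(n) for n in counts)):
        point = [lo + i * spacing for lo, i in zip(box_min, idx)]
        if all(math.dist(point, atom[:3]) > atom[3] + probe for atom in atoms):
            free.add(idx)
    return free


# Offsets to the six face-sharing neighbours of a lattice point.
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def fill_from_seed(free, seed):
    """Breadth-first flood fill: collect every free lattice point that is
    connected to the seed through free neighbours."""
    if seed not in free:
        return set()
    seen = {seed}
    queue = deque([seed])
    while queue:
        i, j, k = queue.popleft()
        for di, dj, dk in NEIGHBOURS:
            nxt = (i + di, j + dj, k + dk)
            if nxt in free and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Note that this sketch makes no attempt to close the mouth of a cavity: if the pocket is open to solvent, the fill will spread into all connected empty space inside the bounding box, so a practical implementation needs some additional criterion to decide where the cavity ends.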
overlap (even if the electronegativities of the two atoms are quite dissimilar). The sum is the gross atomic population, and the net atomic charge is simply this plus the nuclear charge. The result is very sensitive to the basis set (the number of atomic orbitals) used. Despite poor fit of the molecular electrostatic potential derived with point charges to the ab initio electrostatic potential, or that derived from a distributed multipole analysis (150), widespread use continues because they do reflect chemical trends and are reportedly compatible with known electronegativities. In addition, this option is commonly available in software packages. Unfortunately, poor representation of the electric field surrounding the molecule results from use of atom-centered monopole models (42), even when more careful methods are used to distribute the charge.

2.2.1.2 Methods to Reproduce the Molecular Electrostatic Potential (MEP). The electrostatic potential surrounding the molecule that is created by the nuclear and electronic charge distribution of the molecule is a dominant feature in molecular recognition. Williams reviews (42) methods to calculate charge models to accurately represent the MEP as calculated

Williams (42) derived a procedure to derive the best fit to a given MEP with a defined set of monopoles, dipoles, and so forth.

Typically, fragments of molecules of interest are analyzed by ab initio techniques to generate their MEPs that are the reference for parameterization of charge. Besler et al. (152) reported fitting of atomic charges to the electrostatic potentials calculated by the semiempirical methods AM1 and MINDO. The MINDO charges derived by fitting the MEP can be linearly scaled to agree with results derived from ab initio calculations. Among the motivations for semiempirical methods are the facts that semiempirical methods using high quality basis sets often yield better results than ab initio techniques employing minimal basis sets, and the significant reduction in computational time in moving from ab initio to semiempirical calculations. Rauhut and Clark (153) used the AM1 wave function to develop a multicenter point-charge model in which each hybrid natural atomic orbital is represented by two charges located at the centroid of each lobe. Thus, up to nine charges (4 orbitals and 1 core charge) are used to represent heavy atoms. Results using this approach
and receptor (185). At every cavity-pocket interface point, the electrostatic potential of both the atoms forming the cavity and those of the binding ligand are calculated. A rough approximation of complementarity is computed by multiplying these potentials together. A favorable electrostatic interaction is produced when the electrostatic potentials are opposite in sign. Therefore, favorable interactions are indicated when the product of these values is a negative number. Likewise, unfavorable interactions are indicated when the product of these values is a positive number and the potential of the cavity and that of the binding ligand have the same sign. These products are then normalized, assigned a color, and displayed.

In a similar way, an estimate of the hydrophobic character of a segment of the surface can be quantitated and indicated through color coding. The ability to rapidly switch between these hydrophobic and electrostatic surface representations, to visually integrate the optimal complementarity between site and potential ligand to be designed, is helpful.

3.3 Design of Ligands

3.3.1 Visually Assisted Design. In the process of optimization of a lead, one needs to ascertain where modification is feasible. Although visualization of the excess space available in the active-site cavity by directly examining ligands is useful for locating selected regions where ligand modifications may be made, it is not well suited for fully characterizing the void that exists between the ligand and the receptor, the ligand-receptor gap region; information concerning the relative dimensions of free space is difficult to discern. To facilitate the display of this information, Ho and Marshall (185) developed another algorithm to color-code the cavity display by the ligand-receptor nearest atom gap distance. The actual VDW surface-to-surface distance (not center to center) between the ligand and enzyme atoms is calculated. When the ligand-receptor distances have been calculated at all cavity-pocket interface lattice points, a user-defined color-coding scale is implemented to generate the displays. This highlights those areas that are less well packed and available for ligand modification.

3.3.2 Three-Dimensional Databases. Medicinal chemists have recognized the potential of searching three-dimensional chemical databases to aid in the process of designing drugs for known, or hypothetical, receptor sites. Several databases are well known, such as the Cambridge Structural Database (CSD) (194). The crystal coordinates of proteins and other large macromolecules are deposited into the Brookhaven Protein Databank (195). The conformations present in crystallographic databases reflect low energy conformers that should be readily attainable in solution and in the receptor complex. The three-dimensional orientation of the key regions of the drug that are crucial for molecular recognition and binding is termed the pharmacophore. The investigator searches the three-dimensional database through a query for fragments that contain the pharmacophoric functional groups in the proper three-dimensional orientation. Using these fragments as "building blocks," completely novel structures may be constructed through assembly and pruning (196). Receptor sites are complex both in geometrical features and in their potential energy fields, and many diverse compounds can bind to the same protein by occupying various combinations of subsites. Noncrystallographic databases have been developed as well. One example is the three-dimensional database of structures from Chemical Abstracts generated through CONCORD (197-199) that contains over 700,000 entries. The use of such databases is most applicable when the binding of a particular ligand and its receptor is well understood in terms of functional group recognition, and a crystal structure of the complex is known (200). One approach to ligand design is to develop novel chemical architectures (i.e., scaffolds) that position the pharmacophoric groups, or their bioisosteres, in the correct three-dimensional arrangement.

Gund conceived the first prototypic program designed to search for molecules that match three-dimensional pharmacophoric patterns (201, 202). This program, MOLPAT, performed atom-by-atom searches to verify comparable interatomic distances between
pattern and candidate structures. Although rigorous, this approach was tedious and required optimization. Lesk (203) devised a method that used the geometric attributes of the query to screen potential candidates. Similarly, Jakes and Willett (204) proposed that screens based on interatomic distances and atom types could considerably augment search efficiency. Furthermore, Jakes et al. (205) showed that methods widely used in two-dimensional structure retrieval could be applied to three-dimensional searches, to remove the vast majority of compounds before more rigorous comparisons. This was validated in test searches against a subset of the CSD. This concept was furthered by Sheridan et al. (200), who included screens based on aromaticity, hybridization, connectivity, charge, position of lone pairs, and centers of mass of rings. To contain this wealth of information, an inverted bit map [the presence or absence of a feature is encoded as a 1 or 0 (bit) at a particular location in a "keyword"] was employed for highly efficient screening (hundreds of thousands of compounds in minutes).

Similar database searching methods have been incorporated into a number of current database searching systems. Programs such as CAVEAT (206), ALADDIN (Abbott) (207), 3DSEARCH (Lederle) (208), MACCS-3D (209), CHEM-X (210), UNITY (211), and others contain considerable functionality useful for such an approach. CAVEAT (206) is designed to assist a chemist in identifying cyclic structures that could serve as the foundation for novel compounds. In particular, it allows an investigator to rapidly search structural databases for compounds containing substituent bonds that satisfy a specific geometric relationship. ALADDIN (207), 3DSEARCH (208), MACCS-3D (209), and CHEM-X (210) are similar, in that geometric relationships between various user-defined atomic components can be used as a query to retrieve matching structures. Features have been included to allow the user to delineate molecular characteristics (atom type, bond angles, torsional constraints, etc.) to ensure the retrieval of relevant compounds. Additional constraints have been incorporated into 3DSEARCH (208) and ALADDIN (207), including the consideration of retrieved ligand-receptor volume complementarity. Furthermore, CHEM-X (210) performs a rule-based conformational search on each structure in the database to account for conformational flexibility. For a comprehensive review of three-dimensional chemical database searching, see Martin et al. (212, 213).

Pharmaceutical companies have developed three-dimensional databases for their compound files to help prioritize candidates for screening (210, 214). An essential component in such a system is a method for assessing similarity (212, 215). Because most compound databases were entered as two-dimensional structures, this has required conversion to a three-dimensional format. Programs have proved (197-199, 216) useful in generating plausible three-dimensional structures from the connectivity data, as reviewed by Sadowski and Gasteiger (217). Because of the inherent flexibility in most compounds, the use of a single conformation to represent the three-dimensional potential for interaction of a molecule is a clear limitation. Development of three-dimensional databases with a compact, coded representation of the conformational states available to each compound is a logical next step. Efficient use of such a database requires methods for evaluating three-dimensional similarities. In addition to identification of compounds that can present an appropriate three-dimensional pattern, compounds must also fit within the receptor cavity. Based on a shape-matching algorithm, Sheridan et al. (200) screened candidate compounds to select those whose volumes would fit within the combined volumes of known active compounds. Previously, this group used (218) the same algorithm to help identify potential ligands for papain and carbonic anhydrase, by screening compounds from the CSD. Screening of the active site of HIV protease identified (219) haloperidol (Fig. 3.20) as an inhibitor of the enzyme and provided a novel chemical lead for further investigation. Burt and Richards (220) introduced flexible fitting of molecules to a target structure, with assessment of molecular similarity as a means of dealing with the conformational problem.

The use of preliminary screens can eliminate the vast majority of compounds before more rigorous, and computationally demanding, pattern-matching comparisons (212, 213).
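The bit-map screen described above, in which the presence or absence of each feature is recorded as one bit of a per-compound "keyword", can be sketched as follows. This is a hedged illustration rather than the scheme of any cited program: the feature list, the compound identifiers, and the function names are hypothetical, and Python integers stand in for the fixed-width keywords a production system would use.

```python
# Hypothetical screen features; a real system would use many more.
FEATURES = [
    "aromatic_ring",
    "hbond_donor",
    "hbond_acceptor",
    "cation",
    "anion",
    "donor_acceptor_4_to_6_angstrom",
]

# Each feature is assigned one bit position in the keyword.
BIT = {name: 1 << position for position, name in enumerate(FEATURES)}


def make_key(present_features):
    """Encode a collection of feature names as an integer keyword."""
    key = 0
    for name in present_features:
        key |= BIT[name]
    return key


def passes_screen(compound_key, query_key):
    """True when every bit set in the query is also set in the compound."""
    return (compound_key & query_key) == query_key


def screen(database, query_key):
    """Return identifiers of compounds that survive the preliminary screen."""
    return [cid for cid, key in database.items() if passes_screen(key, query_key)]


if __name__ == "__main__":
    database = {
        "cmpd_A": make_key(["aromatic_ring", "hbond_donor", "hbond_acceptor"]),
        "cmpd_B": make_key(["aromatic_ring"]),
        "cmpd_C": make_key(["aromatic_ring", "hbond_donor", "cation"]),
    }
    query = make_key(["aromatic_ring", "hbond_donor"])
    print(screen(database, query))  # cmpd_A and cmpd_C survive; cmpd_B lacks a donor
```

Because the test is a single bitwise AND per compound, it is cheap enough to run over an entire database; compounds that pass are only candidates and still require the slower geometric comparison.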
Figure 3.20. Structure of bromperidol (top) found by DOCK program when used on active site of HIV-1 protease (219) compared with structure of JG-365 (bottom), a typical substrate-derived inhibitor.

This search strategy is indeed very quick and efficient; however, all retrieved compounds must contain every query component as defined in the preliminary screens. As the number and complexity of the query elements increase, one would anticipate fewer true hits, but a corresponding rise in the number of near-misses. If such near-misses could be recovered, effective ligands may simply arise from slight conformational modification to maximize receptor interactions. Furthermore, the retrieval and combinatorial assembly of numerous pharmacophore subcomponents would intuitively produce many more diverse structures than the quest for a single compound in the database incorporating the entire pharmacophore, that is, all requirements of the query. This suggests an approach that would retrieve compounds containing any combination of a minimum number of matching pharmacophoric elements.

Methods have been developed that employ this "divide-and-conquer" approach to ligand development. The active site is partitioned into subsites, each containing several pharmacophoric elements. Chemical fragments complementary to each subsite are then designed or retrieved from databases. Finally, fragments are linked to form aggregate ligands. The advantage of this approach is that ligand diversity can be tremendously augmented through the combinatorial assembly of numerous subcomponents. DeJarlais was perhaps the first to employ this philosophy in a novel application (220) of the program DOCK. This well-known program searches three-dimensional databases of ligands and determines potential binding modes of any that will fit within a target receptor (183). However, only a single, static conformation of each database structure is maintained, disregarding ligand flexibility. In DeJarlais' method, conformational flexibility was later introduced by dividing individual ligands into fragments overlapping at rotatable bonds. Each fragment was first docked separately into various receptor regions. Attempts were then made to reassemble the component parts into a legitimate structure. A current example of this approach is the program LUDI, written by Bohm (221, 222). In this program, a receptor volume of interest is scanned to determine subsites where hydrogen bonding or hydrophobic contact can occur. Small complementary molecules are then chosen from a database and positioned within these subsites to optimize binding energy. The process concludes with the selection of various bridging fragments to link subsets of small molecules.

Chau and Dean published a series of articles addressing whether small molecular fragments, with transferable properties, could be generated for further use in automated site-directed drug design (223-225). A program was developed to combinatorially generate all three-, four-, and five-atom fragments containing any geometrically allowed combination of H, C, N, O, F, and Cl. Aromatic fragments were produced as well. Searches of the Cambridge Structural Database (194) were performed to determine the most frequently occurring fragments. To utilize these fragments as components for ligand assembly, more data were necessary to better characterize them. They were analyzed, therefore, to statistically ascertain bond lengths from the CSD to provide some geometrical constraints for structure assembly. Finally, the transferability of atomic residual charges was studied by comparing charges generated for the atoms in each fragment with charges calculated for whole molecules containing the fragment.

Another approach, FOUNDATION (226), searches three-dimensional databases of chemical structures for a user-defined query
consisting of the coordinates of atoms and/or bonds. All possible structures that contain any combination of a user-specified minimum number of matching atoms and/or bonds are retrieved. Combinations of hits can be generated automatically by a companion program (104), SPLICE, which trims molecules found from the database to fit within the active site and then logically combines them by overlapping bonds to maximize their interactions with the site (Fig. 3.21). The addition of bridging fragments to those recovered from the database allows generation of many novel ligands for further evaluation.

Figure 3.21. Combination by SPLICE (104) of fragments that bind to different subsites of NADP binding site of DHFR to generate a more optimal ligand.

3.3.3 De Novo Design. Design of novel chemical structures that are capable of interacting with a receptor of known structure uses methodology that is much more robust, given that the geometric foundations of molecular sciences are much firmer than the thermodynamic ones. Techniques for the design of novel structures to interact with a known receptor site are becoming more available and show promise (227-229). It has become quite evident that much of a molecule acts simply as a scaffold to align the appropriate groups in the three-dimensional arrangement that is crucial for molecular recognition. By understanding the pattern for a particular receptor, one can transcend a given chemical series by replacing one scaffold with another of geometric equivalence. This offers a logical way to dramatically change the side-effect profile of the drug as well as its physical and metabolic attributes. Various software tools are already under development to assist the chemist in this design objective. Lewis and Dean described their approaches to molecular templates in a series of papers (230, 231). An alternative approach, BRIDGE (Dammkoehler et al., unpublished), is based on geometric generation of possible cyclic compounds as scaffolds, given constraints derived from the types of chemistry the chemist is willing to consider. Nishibata and Itai (232, 233) published a Monte Carlo approach to generating novel structures that fit a receptor cavity. Pearlman and Murko (234) combined a similar approach with molecular dynamics with illustrative applications to HIV protease and FK506 binding protein. CAVEAT is a program developed by Bartlett to find cyclic scaffolds (207) by searching the CSD (195) for the correct vectorial arrangement of appended groups.

All of these approaches attempt to help the chemist discover novel compounds that will be recognized at a given receptor. Van Drie et al. (207) described a program, ALADDIN, for the design or recognition of compounds that meet geometric, steric, or substructural criteria, and Bures et al. (235) described its successful application to the discovery of novel auxin transport inhibitors. As our knowledge base of receptors grows, such tools will prove increasingly useful. The ability to transcend the chemical structure of lead compounds, while retaining the desired activity, should dramatically improve the ability to design away undesirable side effects. Bohm developed the program LUDI (221, 222) to construct ligands for active sites with an empirical scoring function to evaluate their construction.

3.3.4 Docking. The search for the global minimum, or the complete set of low energy minima, on the free energy surface when two molecules come in contact is commonly referred to as the "docking" problem [(236); see also Leach (21)]. Any useful molecular docking program must be computationally efficient in determining the most favorable binding mode, sufficiently sensitive in its scoring function to discriminate between alternate binding modes and the correct mode, and robust enough to allow various ligand-receptor systems to be studied.

3.3.4.1 Docking Methods. In the case of two proteins of known structure that can be approximated as rigid bodies, there are 6 degrees of freedom, the relative position (x, y, and z coordinates), and relative orientation (roll, pitch, and yaw to use the aeronautical expressions) to be explored. Several very intelligent approaches to this problem have been developed. The first and most well known approach is the DOCK program (http://www.cmpharm.ucsf.edu/kuntz/dock.html) (183) that was developed to solve the ligand-receptor problem. This program uses abstract representations (a set of spheres) of the convex shape on the receptor to be filled and the concave ligand and matches them to generate plausible binding modes with complementary
surfaces. An example of the successful use of DOCK was the identification of 13 inhibitors of DHFR from P. carinii selected from the Fine Chemicals Directory. Of 40 compounds predicted to be active, these 13 showed IC50 values less than 150 micromolar. DOCK (13, 183) has been quite successful in finding noncongeneric molecules of the correct shape to interact with a receptor cavity (237-239). An overview of docking and scoring functions is available (240).

Another approach focusing on complementary surface maximization uses a grid representation of the surface in a series of slices. The slices from the target molecule are processed against the slices from the other molecules by use of a variant of the fast-Fourier transform (241-244) to identify those sections with the greatest complementarity. This approach has been incorporated and extended to electrostatic complementarity in FTDOCK (http://www.bmm.icnet.uk/ftdock/ftdock.html) by Gabb et al. (245). This approach is a relatively fast method for searching the 6 degrees of freedom and has reproduced the binding mode of several macromolecular complexes and is available in GRAMM (Global Range Molecular Matching, http://reco3.musc.edu/gramm/) that was judged the best when applied to identify the binding modes in a set of macromolecular complexes at the second (Fall, 1996) CASP evaluation of prediction methods.

Obviously, other degrees of freedom should be included to allow both molecules to undergo conformational changes (side-chain relaxation, at the very least, in the case of proteins). In many cases, the active site of the receptor is assumed to be rigid (rationalized on the basis of the specificity and affinity of the system) and a flexible ligand is docked. This limits the number of degrees of freedom to be explored. By simply generating a set of low energy conformers of the ligand and processing them sequentially with DOCK (220), one can sample on a low resolution scale; the flexible ligand problem can be addressed on the basis of shape complementarity.

FlexX is a program for flexibly docking ligands into binding sites, by use of an incremental construction algorithm that builds the ligands in the binding site (246). It starts by extracting a core fragment from the ligand. The algorithm is dependent on selection of an appropriate base fragment, requiring one that makes enough specific contacts with the protein that a definite preference for binding orientation can be determined. FlexX holds bond lengths and angles invariant, using the values of the input ligand. The core is used as the base to which low energy fragment conformers are added, with these conformers based on a statistical evaluation of fragments in the Cambridge Structural Database.

3.3.4.2 Scoring Functions (247-260). Three-dimensional quantitative structure-activity relationship (3D QSAR) approaches based on the use of training sets of structures with measured affinities are often used to generate a model with predictive powers. The limitation in such methodologies is the necessity for a robust training set of diverse chemical structures to encompass the domain of possible interactions with the therapeutic target. At the beginning of a project, or when three-dimensional information on a novel target first becomes available, such data on a diverse set of chemical ligands are usually not available. For this reason, one would like to capitalize as much as possible on the physical chemistry of the possible interactions between the ligand and its receptor when the structure of the receptor is available. Because of the need to prioritize synthesis in structure-based design efforts and prioritize compounds in combinatorial libraries for screening as well as predict the structure of protein complexes, an increased interest in scoring functions (i.e., empirical approaches to predict affinities) has emerged. Several early attempts and their reported predictive ability are cited next.

1. Bohm (221, 222) analyzed 45 protein-ligand complexes (affinity range = -9 to -76 kJ/mol) and found the following equation by multiple regression analysis:

$r^2 = 0.76$, $S = 7.9$, $q^2 = 0.696$, $S_{\mathrm{press}} = 9.3$ (2.2 kcal/mol)
2. Krystek et al. (261) analyzed 19 protein-ligand complexes in an update of the Novotny approach (262).

3. VALIDATE is a hybrid approach to predict the binding affinity of novel ligands for a receptor of known three-dimensional structure based on the calculation of several physicochemical properties of the ligand itself as well as a molecular mechanics analysis of the receptor-ligand complex (263). The properties of a diverse training set ($-\log K_i$ range = 2.47-14.00) of 51 crystalline complexes were analyzed by partial least squares (PLS) statistical methodology and neural network analysis to select a statistical model from a variety of parameters with the following properties:

$S_{\mathrm{press}} = 1.29$ (1.75 kcal/mol)

The true measure of any model rests in its ability to predict the affinity of new compounds. This would include the prediction of unique ligands bound to receptors that exist in the base set as well as the affinities of unique ligand/receptor complexes. Three separate test sets were compiled for this purpose. The first set consisted of 14 inhibitors that were obtained from crystalline receptor/ligand complexes. Neither ligands nor their receptor classes were included in this training set. Included were 2 DHFR, 2 penicillopepsin, 3 carboxypeptidase, 2 alpha-thrombin, and 2 trypsinogen inhibitors as well as 3 DNA-binding molecules. Prediction of binding affinities gave a PLS predictive $r^2 = 0.786$, with an absolute average error of 0.693 log units. The second test set consisted of 13 HIV protease inhibitors whose initial conformation and alignment were derived from the CoMFA analysis done by Waller et al. (264). The selection of the inhibitors was based on maintaining a good range of activity as well as using several inhibitors from the published test set. The PLS predictive $r^2$ value was 0.565, with an absolute average error of 0.694. The predictive $r^2$ value is considerably lower than that of the first test set, although this is attributed to the smaller range and distribution of activity in this set. The absolute average error is almost identical.

Although shape complementarity is an important consideration and shows correlation with the energy of interaction, it does not consider the electrostatics of the system (the relative positioning of hydrogen-bond donors and acceptors, etc.). More sophisticated energetic functions are often used to refine the candidate binding modes found by DOCK, or in the docking process itself. The assumption of rigid geometry for the receptor allows a preprocessing of the energetic contribution of the receptor to each grid point of a lattice constructed within the active site cavity (131, 265, 266). This allows a simple estimation of the energy of interaction of each atom in the ligand by finding the energy of the lattice points that are closest followed by interpolation. By increasing the efficiency of the scoring function, more candidate binding modes can be evaluated and, thus, one resembling the global minimum is more likely to be found. This assumes that the scoring function used is sufficiently accurate to discriminate between the correct binding mode and others, and the problem is simply one of sampling. Most scoring functions used, however, deal almost exclusively with the enthalpy of binding and ignore the entropy of binding. It should not be surprising, therefore, that the agreement between the predicted binding modes and those observed experimentally is not always perfect. As one is attempting to discriminate between alternate binding modes of the same complex, difficulties in estimating entropy and desolvation are minimal because many of the terms (solvation and entropy of isolated ligand and receptor) in the comparison cancel.

3.3.4.3 Search for the Correct Binding Mode (267-283). Just as there are many different approaches to the global minimization problem, most, if not all, have been applied to the docking problem. These include molecular dynamics, Monte Carlo sampling, systematic
search (284), the genetic algorithm (101, 102, 105, 285, 286), and straight derivative optimization with multiple starting geometries. A combination of MD/MC has been shown (287, 288) to be a fairly efficient method for determining the free energy surface in smaller host-guest systems (289). The combination of molecular dynamics to locally sample with Monte Carlo that allows for conformational transitions provides adequate sampling if sufficient computational resources are available.

Wasserman and Hodge (290) used molecular dynamics to dock thermolysin inhibitors to an approximate model of the enzyme, with flexibility in the active site (38 of 314 residues) and ligand and with the rest of the enzyme represented by a grid approximation. A solvation model was used to compensate for desolvation in complex formation. To get 22 of 25 runs to orient the hydroxamate function correctly, the hydroxamate oxygens of the starting conformation were initialized within 4 Å of the zinc. If they were allowed to vary to 8 Å, then only 3 of 24 runs placed the ligand correctly. Obviously, there is a serious sampling problem.

Desmet et al. (291) used a truncated (dead-end elimination) search procedure to bind flexible peptides to the MHC I receptor. The translational/rotational space covered 6636 relative orientations and each nonglycine/proline residue of the peptide had 47 main-chain conformers. Side chains had threefold rotations about their chi angles and 28 side chains of the receptor were allowed to rotate. Seventy-four low energy structures were obtained with an average rmsd of 1 Å. The lowest energy structure had an rmsd of 0.56 Å. Peptides up to 20 residues were docked with this procedure.

King et al. (292) used an empirical binding free-energy function when docking MVT-101 to HIV protease. Forty-nine translation/rotations were examined with the Ponder/Richards rotamer library. Only a limited number of rotamers for each amino acid were examined: Thr(2), Ile(3), Nle(3), Nle(3), Gln(6), and Arg(5). According to the authors, 2.24 x 10'' discrete states were examined. Sixty-four low energy structures with an average rmsd of 1.36 Å were found. If the CHARMM potential was used with the same protocol, then the average rmsd was increased to 1.68 Å.

The genetic algorithm has been used by several groups (101, 102, 105, 285, 286, 293) to optimize the scoring function used. Encoding of the conformation of the ligand by torsional degrees of freedom and generating increasingly more fit sets of progeny by mutation and crossover have proved to be an effective search strategy. In one example (285), a Gray-coded binary string was used for the three translations, three rotations, and bond rotations that specified the binding mode, and a two-point crossover operator was used in the GA. In the four examples of complexes with known crystal structures, the results of rigid-body docking with a straightforward application of the GA were not encouraging, in that the correct binding mode was identified in only two of the four test cases. Restraining the GA to search subdomains (different binding hypotheses) in a systematic manner corrected this problem. Only the ligand was allowed flexibility and the GA procedure was repeated. Several binding modes similar to that seen in the experimental complex were found in each example, but ones with the lowest energy did not necessarily have the lowest rms deviation from the experimental structure, pointing out deficiencies in the AMBER-like scoring function used.

Generally, no single scoring function can accurately predict the binding affinities for all types of ligands with all types of receptors. Consensus scoring (294, 295) is the simultaneous use of multiple different scoring functions to make virtual screening more predictive. CScore (Tripos, Inc.) is a consensus-scoring program that integrates several well-known scoring functions from the scientific literature. Each individual scoring function is used to predict the affinity of ligands in candidate complexes. CScore also creates a consensus column, containing integers that range from 0 to the total number of scoring functions. Each complex whose score exceeds the threshold for a particular function adds 1 to the value of the consensus; configurations below the threshold contribute a zero. Consensus columns can also be calculated from any combination of externally supplied indicators, so that key aspects of binding (e.g., the presence of a specific hydrogen bond) can be used to discriminate good configurations from bad ones. CScore can be used to rank multiple configurations of the same ligand docked with a receptor, or to rank selected configurations of different ligands docked to the same receptor.
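The consensus column described above can be illustrated with a minimal Python sketch. This is not CScore itself: the function names, the score values, and the thresholds are hypothetical, and the sketch assumes that a higher score is better for every scoring function, so that "exceeding the threshold" earns one vote.

```python
from typing import Dict, List


def consensus_column(scores: Dict[str, List[float]],
                     thresholds: Dict[str, float]) -> List[int]:
    """Return, for each complex, the number of scoring functions it passes.

    scores[name][i] is the score that function `name` assigns to complex i.
    Assumption: higher is better, so passing means score > threshold.
    """
    n_complexes = len(next(iter(scores.values())))
    consensus = [0] * n_complexes
    for name, values in scores.items():
        if len(values) != n_complexes:
            raise ValueError(f"{name}: expected {n_complexes} scores")
        for i, value in enumerate(values):
            if value > thresholds[name]:
                consensus[i] += 1  # this function votes for complex i
    return consensus


def rank_by_consensus(consensus: List[int]) -> List[int]:
    """Indices of complexes ordered from highest to lowest consensus."""
    return sorted(range(len(consensus)),
                  key=lambda i: consensus[i], reverse=True)


# Hypothetical scores for three docked configurations (illustrative only).
scores = {
    "function_a": [8.2, 5.1, 3.0],
    "function_b": [7.5, 6.9, 2.2],
    "function_c": [9.0, 4.0, 1.5],
}
thresholds = {"function_a": 6.0, "function_b": 6.0, "function_c": 6.0}

consensus = consensus_column(scores, thresholds)
ranking = rank_by_consensus(consensus)
```

Each entry of `consensus` lies between 0 and the number of scoring functions, as in the text. An externally supplied indicator (for example, the presence of a specific hydrogen bond) could be added as one more entry in `scores`, coded as 1.0 or 0.0 with a threshold of 0.5.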
These approaches implicitly assume that the observed receptor cavity has some physical stability (i.e., a static view) and is not significantly altered by binding of different ligands. Although there is no guarantee that this is true for any particular case under study, the specificity seen in biological systems argues that a receptor site has some functional significance in imposing its specific steric and electrostatic characteristics in the molecular recognition and selection process. One must always be prepared, however, for binding to sites other than that targeted, and possible exposure of cryptic sites that are not observed in the absence of the ligand (181). The current computational limits in molecular dynamics simulations restrict the chance of uncovering such alternative binding modes in our studies. If we can assume the binding mode of our candidate drug is nearly identical to that of a known compound, however, then we have a legitimate basis for thermodynamic perturbation calculations. Multiple or alternate binding modes remain a major fly in the ointment. Naruto et al. (284) demonstrated a systematic approach to the determination of productive binding modes for mechanism-based inhibitors (Fig. 3.22) that could select starting structures for complexes for molecular dynamics simulations. Combinations of methods, such as Monte Carlo or systematic search, to generate multiple starting configurations for simulations to improve sampling and thermodynamic reliability will increase as adequate computational power to support these hybrid approaches becomes more readily available.

Figure 3.22. Use of systematic search to explore possible binding modes of mechanism-based inhibitors of chymotrypsin (284) by rotation of six bonds (*), which orient carbonyl of substrate relative to hydroxyl (Du) of Ser-195.

Many technical limitations remain to be overcome before ligand design becomes reliable and routine. Many deficiencies in molecular mechanics previously cited remain that limit reliability. Adequate modeling of electrostatics remains elusive in many experimental systems of interest such as membranes. Newer derivations of force fields, such as MM3 (27, 296 and references therein), CHARMM (297, 298), AMBER/OPLS (157), ECEPP (299), and others (156, 300), are attempting to more accurately represent the experimental data, whereas others include a broader spectrum of chemistry such as metals (29-31, 301-305). Combinations of molecular mechanics with quantum chemistry (159, 160, 162, 306) are clearly necessary for problems in which chemical transformations are involved. Rather amazing agreement between calculation and experiment has been reported (165, 307) on the relative stabilities of transition-state structures, although there is some controversy (308) regarding this approach. In any case, this is another area of rapid growth as adequate computational resources become available. Riley et al. (309, 310) found an excellent correlation between the relative stabilities of conformers in manganese complexes of pentaazacrowns and their ability to catalyze the dismutation of superoxide.

3.4 Calculation of Affinity (260)

3.4.1 Components of Binding Affinity (255). The ability to calculate the affinity of prospective ligands based on the known three-dimensional structure of the therapeutic target would allow prioritization of synthetic targets. It would bring quantitation to the qualitative visualization of a potential ligand in the receptor site. Although this problem has been solved in principle, in practice, direct application of molecular mechanics has not yet
proved to be a reliable indicator. The reasons behind this difficulty become more obvious if one partitions the free energy of binding into a logical set of components.

For example, Williams (311-314) used a vancomycin-peptide complex (Fig. 3.23) as an experimental system in which to evaluate the various contributions to binding affinity. A similar analysis for antibody mutants was attempted by Novotny (262).

$$\Delta G = \Delta G_{\mathrm{trans+rot}} + \Delta G_{\mathrm{rotors}} + \Delta H_{\mathrm{conform}} + \sum \Delta G_{i} + \Delta G_{\mathrm{vdW}} + \Delta G_{\mathrm{H}}$$

where $\Delta G_{\mathrm{trans+rot}}$ is the free energy associated with translational and rotational freedom of the ligand. This has an adverse effect on binding of 50-70 kJ/mol (12-17 kcal/mol) at room temperature for ligands of 100-300 Da, assuming complete loss of relative translational and rotational freedom. $\Delta G_{\mathrm{rotors}}$ is the free energy associated with the number of rotational degrees of freedom frozen. This is 5-6 kJ/mol (1.2-1.6 kcal/mol) per rotatable bond, assuming complete loss of rotational freedom. $\Delta H_{\mathrm{conform}}$ is the strain energy introduced by complex formation (deformation in bond lengths, bond angles, torsional angles, etc. from solution states); $\sum \Delta G_{i}$ is the sum of interaction free energies between polar groups; $\Delta G_{\mathrm{vdW}}$ is the energy derived from enhanced van der Waals interactions in complex; and $\Delta G_{\mathrm{H}}$ is the free energy attributed to the hydrophobic effect (0.125 kJ/mol per Å² of hydrocarbon surface removed from solvent by complex formation).

Figure 3.23. Vancomycin-peptide complex used by Williams et al. (311-315) to investigate components of free energy of binding.

Through use of this analysis on the dipeptide-vancomycin system, estimates of the contribution of the hydrogen bonds to binding were made (312) that were considerably higher (-24 kJ/mol, -6 kcal/mol) than those derived experimentally. The most likely source of error is the assumption of complete loss of relative and internal entropy upon binding. In retrospect, Searle and Williams (313) examined the thermodynamics of sublimation of organic compounds without internal rotors, and showed that only 40-70% of theoretical entropy loss occurs on crystallization. This provides an estimate of the entropy loss to be expected on drug-receptor interaction. Applying this correction to the peptide-vancomycin system led (314) to a more conventional view of the hydrogen bond of between -2 and -8 kJ/mol (0.5-2.0 kcal/mol). Because several of the components in the binding energy estimate are directly related to the degree of order of the system (entropy), simulations in solvent may be necessary to quantitate the degree by which the relative motions of the ligand and
protein are quenched and the restriction on rotational degrees of freedom upon complexation. Åqvist (316, 317) developed the linear interaction energy (LIE) method for calculating the ligand-binding free energies from molecular dynamics simulations. Verkhivker et al. (318) developed a hierarchical computational approach to structure and affinity prediction in which dynamics is combined with a simplified, knowledge-based energy function. Despite the focus on short peptides interacting with the SH2 domain with exhaustive calorimetric determination of binding entropy, enthalpy, and heat capacity changes, the overall correlation between computed and experimental binding affinity remained rather modest.

3.4.2 Binding Energetics and Comparisons. Because of the difficulties in calculating binding free energies (see below), attempts to use $\Delta H$ as a means of correlation with binding affinities have often appeared in the literature, sometimes meeting with considerable success. These successes, however, are fortuitous and depend on simplifying assumptions as well as the well-known correlation (319) between $\Delta H$ and $\Delta G$, which has been suggested as an unusual property of the solvent water. A similar correlation has been observed in nonaqueous systems and relates to higher entropy loss associated with stronger enthalpic interactions (313). It is a common assumption with congeneric series that the desolvation energies and entropic effects will be approximately the same across members of the series. This, often tacit, assumption may hold for most of the series, but complex formation is dependent on the total energetics of the complex, and what may appear a relatively innocuous change in a substituent may trigger a different binding mode in which the ligand has reoriented. This will likely have an impact on desolvation as well as entropic effects, in that the interactions of the majority of the ligand have changed environment.

3.4.3 Atom-Pair Interaction Potentials. Affinities can be calculated based on ligand-receptor atom-pair interaction potentials that are statistical in nature rather than empirical. Muegge and Martin (320) derived these potentials from crystallographic data in the Protein Data Bank, drawing on hundreds or thousands of examples of each interaction type. Grzybowski et al. (321) combined a knowledge-based potential with a Monte Carlo growth algorithm that generated a very potent inhibitor of human carbonic anhydrase (322). The resulting equation for all the atom-pair interactions in a protein-ligand complex can yield free energies directly, given that solvation and entropic terms are treated implicitly.

3.4.4 Simulations and the Thermodynamic Cycle. Given a known structure of a drug-receptor complex with a measured affinity of the ligand, the thermodynamic cycle paradigm allows calculation of the difference in affinity ($\Delta\Delta G$) with a novel ligand. Bash et al. (136) successfully calculated the effect of changing a phosphoramidate group (P-NH) to a phosphate ester (P-O) in transition-state analog inhibitors of thermolysin (Fig. 3.24). The difference in free energy between a benzenesulfonamide and its p-chloro derivative as an inhibitor of carbonic anhydrase has been calculated (323) as well. This is similar to the original application to enzyme-ligand work on benzamidine inhibitors of trypsin, in which the mutation of a proton to a fluorine was calculated (324). Hansen and Kollman (325) calculated differences in the free energy of binding of an inhibitor of adenosine deaminase as one changes a proton to a hydroxyl group by use of a model of the active site. Other examples (326-328) looked at the difference in binding of two stereoisomers of a transition-state inhibitor of HIV protease (Fig. 3.25) and the affinity of DHFR for methotrexate analogs (329). One obvious conclusion can be drawn: successful applications in the literature deal with relatively minor perturbations to a structure where there is less chance that the binding mode might be altered.

There is at least one example in the literature (330) in which the calculated affinity difference did not agree with the experimental data [binding of an antiviral agent to human rhinovirus HRV-14 and to a mutant virus in which a valine was mutated to a leucine (Fig. 3.26)]. Here a β-branched amino acid (Val) was converted into Leu, which lacks the isopropyl side chain adjacent to the peptide backbone besides the addition of a methyl group.
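The bookkeeping behind the thermodynamic cycle of Section 3.4.4 can be sketched in a few lines of Python. The sketch assumes the standard closed cycle, in which the relative binding free energy of two ligands equals the free energy of "mutating" one ligand into the other in the complex minus the same mutation for the free ligand in solvent. The function names and all numerical inputs are hypothetical and serve only to show the arithmetic; they are not taken from any of the studies cited.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def relative_binding_free_energy(dg_mutation_bound: float,
                                 dg_mutation_free: float) -> float:
    """Relative binding free energy (kcal/mol) of ligand 2 versus ligand 1.

    dg_mutation_bound: free energy of changing ligand 1 into ligand 2
                       while bound to the receptor.
    dg_mutation_free:  free energy of the same change for the unbound
                       ligand in solvent.
    Closing the cycle gives ddG_bind = dG_bound - dG_free.
    """
    return dg_mutation_bound - dg_mutation_free


def dissociation_constant_ratio(ddg_bind: float,
                                temperature: float = 298.15) -> float:
    """Ratio K_d(ligand 2) / K_d(ligand 1) implied by ddG_bind.

    Uses dG_bind = RT ln K_d, so the ratio is exp(ddG_bind / RT).
    A ratio below 1 means ligand 2 binds more tightly.
    """
    return math.exp(ddg_bind / (R_KCAL * temperature))


# Hypothetical simulation results (kcal/mol), for illustration only.
ddg = relative_binding_free_energy(dg_mutation_bound=-3.0,
                                   dg_mutation_free=-2.0)
ratio = dissociation_constant_ratio(ddg)
```

With these illustrative inputs the mutation is more favorable in the complex than in solvent by 1 kcal/mol, which corresponds to roughly a fivefold tighter binding of the second ligand at room temperature. The two mutation free energies themselves would come from simulation, which is where the sampling problems discussed in this section enter.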
The differences between calculation and experimental data may be related to rotational isomerism of the side chains that can be explicitly included (331). Despite the successful examples of this approach that appear in the literature, there exists a growing healthy skepticism regarding its general application. In a discussion (332) of the application of simulations to prediction of the changes in protein stability attributed to amino acid mutation, problems in adequate sampling, particularly of the unfolded state, as well as difficulties with electrostatics were cited. A review of applications by Kollman (134) cites numerous other examples.

Figure 3.24. Calculated (136) difference in affinity ($\Delta\Delta G$) compared with experimental value for two inhibitors of thermolysin. $\Delta\Delta G$ (theoretical) = -4.21 ± 0.54; $\Delta\Delta G$ (experimental) = -4.07 ± 0.33.

Figure 3.25. Structures of JG-365 and Ro 31-8959 in which chirality at crucial transition-state hydroxyl is reversed for optimal binding in the two analogs. An alteration in binding mode was predicted (333) to explain this observation that was subsequently confirmed by crystallography.

3.4.5 Multiple Binding Modes. Realistically, congeneric series that can be a useful construct exist only in the mind of the medicinal chemist. The orientation of the drug in the active site depends on a multitude of interactions and a minor perturbation in structure can destabilize the predominant binding mode in favor of another. As examples, detailed
analyses of the multiple binding modes shown with thyroxine analogs (334) by transthyretin, a transport protein, and enkephalin analogs (335) by an Fab fragment have been made through crystallography. For this reason, the probability of correct answers with thermodynamic integration studies is directly related to the similarity in structure between the ligand of interest and the reference compound. All three-dimensional methods for predicting affinity require a fundamental assumption about the binding mode (in other words, an orientation rule for aligning compounds in the model). Examination of series of ligands binding to the same site usually includes examples of similar compounds that have different binding modes [e.g., the change in orientation (Fig. 3.25) of the C-terminal portion of the Roche HIV protease inhibitor compared with JG-365] (333). Molecular modeling is currently capable of distinguishing correctly in many cases between alternate binding modes of the same ligand. Many components (desolvation, entropy of binding, etc. of the ligand), which cloud the issue of direct calculation of affinities, are constant when comparing binding modes of the same compound and, therefore, do not have to be evaluated. The computational cost of exploring possible binding modes within the active site is nontrivial, however, especially when the protein is capable of reorganizing to expose alternative sites, as was the case for a series of ligands for hemoglobin (181).

Figure 3.26. Calculated (330) relative affinity ($\Delta\Delta G$ = -0.5 kcal/mol) of a Sterling-Winthrop antiviral that binds to rhinovirus coat protein (HRV-14) and to the V188L mutant. Biological data indicate that V188L mutation drastically diminishes activity of the antiviral.

In a similar fashion, it is generally assumed from the competitive behavior for binding shown by many agonists and antagonists that they bind at the same site on the receptor (certainly, the simplest hypothesis). Recent studies on G-protein-coupled receptors indicate that agonists and antagonists often have different binding sites, given that mutations in the receptor can affect the binding of one and not the other. An example of such a study on the angiotensin II receptor has been published (336). This story is only beginning to unfold, but appears to be a general phenomenon in G-protein receptors (337, 338). Examples of this phenomenon have been reported with antagonists derived from screening where the structure of antagonist and agonist differ dramatically, but also where the antagonists were obtained by minor structural modification of the natural agonist.

3.5 Protein Structure Prediction

Prediction methods for generating the 3D structure of a protein based on its sequence alone fall into several categories. There are hierarchical methods that predict secondary structures and then attempt to fold those elements together. There are simulation methods that attempt to fold the protein through the use of models of reduced complexity and then refine the prediction by using them to constrain all-atom models. Additionally, there are hybrids of these approaches that rely heavily on heuristics. These methods have been successful in limited cases in the hands of their authors, but have generally been found lacking when tested by others in a more thorough and objective manner. Nevertheless, partial successes indicate that signal has begun to emerge from the smoke and mirrors.
3.5.1 Homology Modeling. Often, the crystal structure of the therapeutic target is not available, but the three-dimensional structure of a homologous protein will have been determined. Depending on the degree of homology between the two proteins, it may be useful to model-build the structure of the unknown protein based on the known structure. Many models (339-341) of the various G-protein-coupled receptors have been built based on homology with bacterial rhodopsin. Models of the three-dimensional structures of human renin (342) and HIV protease (343, 344) were built from crystal structures of homologous aspartyl proteinases as aids to drug design. The known structures of serine proteases have served as templates for models of phospholipase A2 (345) and convertases or subtilases (346). The crystal structure of the MHC class I receptor served to generate a hypothetical model of the foreign antigen-binding site of Class II histocompatibility molecules (347). Models of human cytochrome P450s have been built by homology as well (348).

One of the major difficulties facing construction of such models is the alignment problem, which is compounded by multiple insertions and/or deletions. As the number of known homologous sequences increases, the alignment problem is lessened by consensus criteria. Although the interior core of the proteins is often quite similar, significant alterations can occur on surface loops, and much effort has been expended to fold these loops (123, 349). With regard to the utility of such models in drug design, one can expect that they will prove useful conceptually, but that the molecular details required for optimizing specificity, for example, would be deficient. One tries to exploit the often subtle differences that arise from sequence changes, which are reflected in the three-dimensional structure. Models built by homology would be expected to be weakest in those areas in which sequence differences were greatest.

3.5.2 Inverse Folding and Threading (350-353). This is the ultimate in motif recognition. One makes use of the ever-increasing database of known three-dimensional structures to generate a set of 3D folding motifs for proteins. The sequence of an unknown structure is systematically forced to adopt the coordinates of overlapping segments of the 3D motif and its energy evaluated. In essence, the local multibodied interactions induced by the 3D constraints are evaluated with an empirical pseudopotential that has been calibrated on the PDB database (354, 355) and that is capable of returning a low energy for native sequences compared with scrambled sequences or proteins with other 3D structures. If one cannot discriminate native structures from other folding motifs, then there is little chance that an unknown sequence, which folds in a similar 3D pattern, would be discriminated. The basic assumption is that 3D homology exists between the test sequence and some sequence represented in the motif database. This is not necessarily true, inasmuch as as many as 40% of the new structures determined by crystallography have no known 3D homologs. In fact, in an analysis of the genomes of several sequenced microorganisms (356), no more than 12% of the deduced proteins had detectable homology with proteins of known structure. In the CASP competition, however, the most predictive success has been with this approach when a 3D homology existed.

One interesting question that arises is an estimate of the number of protein motifs that exist. One way to approximate this is to assume random sampling of protein motif space and then analyze the frequency of new motifs in new crystal structures, which leads to an estimate of approximately 1500 folds (357). Of course, such an estimate is always biased by size of protein, ease of crystallization, abundance, and so forth. Lattice approaches give a maximal estimate of 4000 folds (358). Over 1000 protein structures are known with approximately 120 folds (351).

At a more local level, proteins are generated from a set of architectural building blocks: helices, sheets, turns, and so forth. If one can accurately determine the location of these structural elements within a sequence, then the assembly of these components is significantly easier because the degrees of freedom have been drastically reduced. Unfortunately, our ability to accurately determine these elements of secondary structure seems to have peaked at the 75% accuracy level (359, 360).
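The threading test described in Section 3.5.2 (a usable pseudopotential should return a lower energy for the native sequence than for scrambled sequences on the same motif) can be sketched as follows. This is only an illustration under stated assumptions: the two-class hydrophobic/polar pair energies, the 12-residue contact list, and the sequence are all hypothetical, and the toy potential stands in for the PDB-calibrated pseudopotentials cited in the text.

```python
import random
from statistics import mean, pstdev

# Assumption: a toy two-class (hydrophobic/polar) pair potential stands in for
# the PDB-calibrated pseudopotentials cited in the text (354, 355).
HYDROPHOBIC = set("AVLIMFWC")


def pair_energy(a, b):
    """Toy contact energy in arbitrary units (values are illustrative only)."""
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return -1.0  # favorable hydrophobic-hydrophobic contact
    if a in HYDROPHOBIC or b in HYDROPHOBIC:
        return 0.3   # mild penalty for a hydrophobic-polar contact
    return 0.0


def thread_energy(sequence, contacts):
    """Score a sequence forced onto a fold given as residue-index contact pairs."""
    return sum(pair_energy(sequence[i], sequence[j]) for i, j in contacts)


def z_score(sequence, contacts, n_shuffles=200, seed=0):
    """Native energy relative to scrambled sequences of the same composition."""
    rng = random.Random(seed)
    residues = list(sequence)
    scrambled = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        scrambled.append(thread_energy(residues, contacts))
    spread = pstdev(scrambled)
    if spread == 0:
        return 0.0
    return (thread_energy(sequence, contacts) - mean(scrambled)) / spread


# Hypothetical 12-residue motif whose core pairs positions 0, 3, 4, 7, 8, and 11.
contacts = [(0, 3), (0, 4), (3, 7), (4, 8), (7, 11), (8, 11), (0, 7), (4, 11)]
native_seq = "LSKVIDEFWGNA"  # hydrophobic residues sit exactly on the core positions

print(thread_energy(native_seq, contacts))
print(z_score(native_seq, contacts))
```

With these invented inputs the native sequence scores -8.0, because all eight contacts are hydrophobic pairs, and its z-score against the scrambled sequences is negative. A sequence with no 3D homolog in the motif library would show no such separation for any motif, which is the limitation noted above.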
LINUS. LINUS (Local Independent Nucleating Units of Structure) (361) is an implementation of a hierarchical folding model in which protein sequences are subdivided into overlapping 50-residue fragments to assess the algorithm's effectiveness in predicting short- and medium-range interactions as well as to limit computational complexity. The algorithms accumulate favorable structures within a sequence window, and repeat the process as the window is allowed to grow over the sequence. Obviously, this is an embodiment of the principle of hierarchical condensation of local initiation of folding. At the beginning, the segment length is six and the starting conformation is set to all extended backbone. Starting at the N-terminus of the segment, three-residue subfragments are perturbed with backbone torsional values from a library to give a trial conformation. If two atoms overlap, the trial conformation is rejected. Otherwise, the energy is evaluated and selection depends on the Metropolis criterion. For each interaction cycle, 6000 iterations of this procedure are performed, 1000 iterations for equilibrium and 5000 samples. Conformations of chain segments that give a high frequency in the sample are frozen and the segment size increased. Backbone atoms and highly simplified side chains are used in the simulations. The simplified energy function has a vdW term, a hydrogen-bonding term, and a backbone torsional term.

Given the arbitrary fragmentation of the protein for computational efficiency, the predicted secondary structures were surprisingly accurate for the five cases examined, with helical and sheet boundaries within two residues of their corresponding native structures. Nevertheless, the rms differences were rather large, from 3 to 9 Å. Certainly, these results are quite encouraging and confirm the ideas from studies on lattices by Dill (362) and others that much of the secondary structure is encoded into local patterns of hydrophobic and polar residues.

GEOCORE (363). Amino acids are represented at the united atom level with explicit polar hydrogens with slightly reduced vdW radii. The approach uses a discrete set of φ, ψ values for each residue type: Gly has six, Pro has three, and most others have four or five values. A contact between nonpolar atoms (carbon or sulfur) is worth -0.7 kcal/mol at closest contact and scaled down from there. Buried non-hydrogen-bonding groups get a penalty of 1.5 kcal/mol. Polar conflicts in which two donors or two acceptors are in contact are given a similar penalty. Constraint-based exhaustive search is used (systematic search with limits such that no steric overlap is allowed and that a compact structure is generated), a branch-and-bound method that guarantees that all globally or near-globally optimal conformations will be found, while neglecting less important conformations. The compact structure is guaranteed by a volume constraint about 60% higher than the volume of a native protein of the same size. Side chains are introduced in their most populated rotameric state from the PDB and only changed to an alternate rotamer to avoid a vdW contact. Four proteins were used to test the approach: avian pancreatic polypeptide (1PPT), crambin (1CRN), melittin (2MLT), and apamin (18 residues). Some 190 million conformations were generated for 1PPT, with 8217 having an energy not more than 16 kcal/mol above the optimum found. The conformation with the lowest rms to the native structure was within the 100 lowest energy conformations found, but the true native structure had a lower energy by use of the same energy function than that of any conformer found by 3-10%. This implies that the major problem was conformational sampling, not just an oversimplified potential function.

Genetic Algorithm. Le Grand and Merz (364) applied the genetic algorithm to a model of proteins using a rotamer library and the AMBER potential function. In a second study, they used a fragment library and a knowledge-based potential function. Sun (365) used a fragment library consisting of di- to pentapeptides and the Sippl potential. He predicted the structures of melittin, avian pancreatic polypeptide, and apamin (both fragments from apamin and APP were included in the library, so it is not so surprising that the rms agreement for these two was around 1.5 Å). Bowie and Eisenberg (366) used the genetic algorithm with a fragment library of from 9 to 25 residues and their own knowledge-based potential. The fragment most similar to that
of the sequence based on 3D profiles (367) was chosen. They were able to fold 50-residue fragments to within 4.0 Å based on the error in the distance matrix. This avoids the problem of embedding and generating the wrong chirality, which reduces the error estimate.

3.5.3 Contact Matrix. Instead of searching the three-dimensional coordinate space, one can reduce dimensionality by focusing on generating an optimal contact map in 2D (368). The 3D coordinates of a correct contact map can be generated within 1 Å rms for the carbon alphas by distance geometry (369) or other methods (370). By use of the powers of the contact matrix as constraints that limit the contact matrices to compact structures, exploration of various potential interactions between secondary structural elements can be done efficiently. Because of the limited predictive ability of current secondary structure prediction paradigms, a set of plausible inputs to this procedure needs to be generated, and the best structures that are derived evaluated further. This may be an efficient low resolution model builder and have some of the computational advantages of the hydrophobic core constraints used by Dill and coworkers. This approach based on geometrical constraints was originally proposed by Kuntz et al. in 1976 (371). The matrices of residue-residue contacts provide, at the very least, a significant partial solution to the prediction of long-range intersegmental contacts through a formalism explicitly describing the structure and some structure-related properties of a protein globule in terms of matrices of residue-residue contacts without explicit knowledge of secondary structure predictions, although they can be a useful source of constraints. In many ways, the success of this approach verifies the conclusions based on lattice models that secondary structures are implicit in the pattern of hydrophobic and hydrophilic residues and the requirements of compactness. The residue-residue contact matrices have some special properties as mathematical objects that can encode the geometrical requirements of compactness; the knowledge of these allows their treatment, starting with the sequence, to generate a contact matrix that is consistent with a compact structure. This is done within the framework of a simple and readily formalized geometric model.

The system of intraglobular residue-residue contacts of a protein of N residues may be represented as an N × N matrix of the carbon-alphas, whose elements are ones (contact) or zeros (lack of contact). Any reasonable definition of contact provides ones in the positions (i, i + 1) that correspond to a peptide bond between two adjacent residues in the sequence. The same is true for the residues corresponding to the pair of cysteines forming a disulfide bond (these data may not be available as input and may be used as a test of correct prediction). This set of contacts describes the sequential covalent topology and is a constant part of the contact matrix that does not depend on the spatial structure of the polypeptide chain; however, any additional information on existing intraglobular contacts (e.g., from NMR data or disulfide linkage) can easily be introduced in the constant part $A^0$ of the contact matrix A:

$$A^0 = \text{const.} \qquad (3.1)$$

The numbers of contacts involving a given residue, $n_i$ (the coordination number of the ith residue), where $a_{ij}$ denotes the element of A for residues i and j,

$$n_i = \sum_{j} a_{ij} \qquad (3.2)$$

are assumed to be approximate constants (coordination number) and are determined by a separate algorithm based on residue type and position in the sequence as well as predicted secondary structure.

A very important condition of spatial consistency of any given contact system is defined by the relation

$$(A^2)_{ij} \ge c \quad \text{for all } (i, j) \text{ with } a_{ij} \ne 0 \qquad (3.3)$$

In other words, the squared matrix of A should have its elements not less than c at any position where there is a nonzero element in matrix A. More generally, there exists a set of specific constraints regulating the relationships of A with its powers $A^2$, $A^3$, and so forth. These relations are entirely analogous with those known from graph theory for connectivity (adjacency) matrices. The elements of the squared matrix represent the number of paths of length two, the cubed matrix, the number of paths of length three, and so forth. Finally, an obvious property of matrix A is its symmetry (for all contact definitions considered so far, if the ith residue is in contact with the jth, the jth residue is in contact with the ith, also):

$$a_{ij} = a_{ji} \qquad (3.4)$$

Thus, conditions 3.1-3.4 define the set of matrices A that correspond to spatially consistent, compact structures of protein chains. Besides these general conditions, mainly of geometrical origin, any matrix A describing the structure of a real protein molecule should also possess several more specific properties that may be derived from studies of the general properties of protein structures as exemplified in the Brookhaven Protein Databank. The central idea of the approach is to use both the general and specific properties of the contact matrix and its powers for the design of a gain (energy, penalty) function, $\Phi(A)$, so that the task of determining an appropriate intraglobular contact matrix might be formulated as a problem of maximization of $\Phi(A)$,

$$\Phi(A) \to \max_{A} \qquad (3.5)$$

with respect to A under conditions 3.1-3.4. In the simplest and clearest form, $\Phi(A)$ may be expressed in terms of the probabilities of contact between the residues of different types (or groups), $q_{ij}$. The solution of the problem provides the most probable residue-residue contact matrix A in

$$\Phi(A) = \prod_{\text{all contacts}} q_{ij} \to \max_{A} \qquad (3.6)$$

which is the sense of the maximum likelihood principle. This condition may be rewritten in the form

$$\Phi'(A) = \sum_{\text{all contacts}} \ln q_{ij} \qquad (3.7)$$

It is clear that proper formulation and parameterization of this problem require analysis of the voluminous experimental data on protein structure to derive the specific properties to be emulated.

This methodology has been used to predict the structure of loops of helical-bundle proteins, given the positions of the connection to the helices (372). Because of the uncertainties in secondary structure predictions that are used as inputs to constrain the search, any single prediction of the method must be viewed with skepticism. Development of scoring functions that discriminate between alternative models at the Cα level of resolution would complement this approach.

Distance Geometry. Aszodi et al. (373-375) explored the use of distance geometry as the metric for comparative modeling of structures. In the CASP2 target set, the methods generated an overall Cα rmsd of 1.85 Å for glutathione transferase based on close homologs with known structure. It had more difficulty with PNS1 and built models based on two different proteins. The correct fold was not obvious based on the CHARMM energy values for the two models.

Neural Networks. PROBE (376) is an integrated suite of neural network modules that predicts folding motif, secondary structure per residue, location of disulfide bonds, and surface accessibility of each residue. No critical assessment of the accuracy of the results from this package was given in the description, but the package is available for evaluation.

Discrimination Between Folds. Because of the inherent error in potential functions, secondary structure prediction methods, limited sampling, and so forth, one can anticipate that prediction of a variety of alternative structures (perhaps, by several methods) would be more likely to generate a correctly folded structure than any single prediction. The problem then becomes one of discriminating between the correct structure and alternatives that may be very similar in overall quality of fold. Park et al. (377) evaluated the ability of 18 low and medium resolution energy functions to discriminate correct from incorrect folds. Functions that were effective in protein threading were not competitive in discriminating the X-ray structure from ensembles of plausible structures, and vice versa. Obviously, these empirical functions have been derived to optimize their discriminatory abilities for a given problem class, and the training (selection) sets were different. In other words, the true physics has not been captured by any of the methods. Crippen (378) also raised serious doubts concerning the ability of "empirical" energy functions to identify correctly folded structures based on studies with simple lattice models. Thomas and Dill (379) described an iterative approach, ENERGI, to generate pairwise residue "energy" scores from the PDB. This is one alternative to the Boltzmann-based pairing frequency analysis used by others (380). The assumption that pairing frequencies are independent is not true based on lattice simulation and, therefore, the underlying assumption of the Boltzmann approach is flawed. The study that used two different sets of proteins to thread was able to classify 88% of 121 proteins having less than 25% homology and no homologs in the training set. The method appears to separate interactive free energies from chain configurational entropies and thus gives a more realistic estimate.

4 UNKNOWN RECEPTORS

Until recently, receptors were hypothetical macromolecules whose existence was postulated on the basis of pharmacological experiments. Although recent advances in molecular biology have led to cloning and expression of many of those receptors whose existence was postulated as well as a plethora of subtypes, progress in most cases in defining their three-dimensional structure has yet to provide the medicinal chemist with the necessary atomic detail to design novel compounds. Without detailed information about the three-dimensional nature of the receptor, conventional computationally based approaches, such as molecular dynamics and the Monte Carlo method, are not possible. One can only attempt to deduce an operational model of the receptor that gives a consistent explanation of the known data and, ideally, provides predictive value when considering new compounds for synthesis and biological testing. The utility of such an approach has been demonstrated by Bures et al. (235), who used the pharmacophoric pattern derived for the plant hormone auxin to find four novel classes of active compounds by searching a corporate three-dimensional database of structures. In many ways, the approach that has evolved is analogous to the American parlor game of 20 questions, in which the medicinal chemist poses the questions in terms of novel three-dimensional chemical structures and attempts to interpret the response of the receptor in a consistent manner. The underlying hypothesis is a structural complementarity between the receptor and compounds that bind. In the same way that the receptor's existence could be deduced based on pharmacological data, some low resolution three-dimensional schematic of the receptor, at least with regard to the active site or binding pocket, can be deduced by analysis of structure-activity data. It is the purpose of this section to summarize the current approaches in use for receptors of unknown three-dimensional structure and evaluate their utility. For purposes of this section, receptor is often used in a completely generic sense, including enzymes and DNA, for example, as the macromolecular component (i.e., binding site) of recognition of biologically active small molecules.

4.1 Pharmacophore versus Binding-Site Models

4.1.1 Pharmacophore Models. It is often useful to assume that the receptor site is rigid and that structurally different drugs bind in conformations that present a similar steric and electronic pattern, the pharmacophore. Most drugs, because of inherent conformational freedom, are capable of presenting a multitude of three-dimensional patterns to a receptor. The pharmacophoric assumption led to a problem statement that logically is composed of two processes. First is the determination, by chemical modification and biological testing, of the relative importance of different functional groups in the drug to receptor recognition. This can give some indication of the nature of the functional groups in the receptor that are responsible for binding of the set of drugs. Second, a hypothesis is proposed (Fig. 3.27) concerning correspondence, either between functional groups (pharmacophore) in different congeneric series of the drug or between recognition site points postulated to exist within the receptor (binding-site model).

Figure 3.27. (a) Pharmacophore hypothesis with correspondence of functional groups in drugs, A = A', B = B', C = C'. (b) Binding-site hypothesis by use of drugs with hypothetical binding sites attached (X, Y, and Z overlap).

The intellectual framework for use of structure-activity data to extrapolate information regarding the ligand's partner, the receptor, is the concept of the pharmacophore. The pharmacophore, a concept introduced by Ehrlich at the turn of the 20th century, is the critical three-dimensional arrangement of molecular fragments (or distribution of electron density) that is recognized by the receptor and, in the case of agonists, that causes subsequent activation of the receptor upon binding. In other words, some parts of the molecule are essential for interaction, and they must be capable of assuming a particular three-dimensional pattern that is complementary to the receptor to interact favorably. One corollary of the pharmacophoric concept is the ability to replace the chemical scaffold holding the pharmacophoric groups with retention of activity. This is the basis of the current activity (381, 382) in peptidomimetics, in which the amide backbone of peptides has been replaced by sugar rings, steroids (383, 384), benzodiazepines (385), or carbocycles (386, 387) (Fig. 3.28). In the pharmacophoric hypothesis, physical overlap of similar functional groups is assumed; that is, the carboxyl group from compound A physically overlaps with the corresponding carboxyl group from compound B and with the bioisosteric tetrazole ring of compound C.

One caveat that must be remembered is the probability of alternate, or multiple, binding modes. The interaction of a ligand with a binding site depends on the free energy of binding, a complex interaction with both entropic and enthalpic components. Simple modifications in structure may favor one of several nearly energetically equivalent modes of interaction with the receptor, and change the correspondence between functional groups that has previously been assumed and supported by experimental data. Changes in binding mode of an antibody FAB fragment to progesterone and its analogs have been shown by crystallography (390, 391) of the complexes. For this reason, analysis of agonists as a class is usually preferred, given that the necessity to both
bind and trigger a subsequent transduction event is more restrictive than the simple requirement for binding shared by antagonists (336). Compounds that clearly are inconsistent with models derived from large amounts of structure-activity data may be indicative of such changes in binding mode, and may require a separate structure-activity study to characterize their interaction. Despite its limitations, the pharmacophore approach is often the most appropriate because of lack of detailed information regarding the receptor and can yield useful insights, as seen in the case of clinical success with tyrosine kinase inhibitors (392, 393) and other recent examples (394).

Figure 3.28. Peptidomimetics that have been designed based on iterative introduction of constraints into parent peptide and hypotheses concerning receptor-bound conformation. Enkephalin mimetic (388), RGD platelet GPIIb/IIIa receptor antagonists (384, 385), thyroliberin [TRH (387)], and somatostatin (383, 389). Parent peptides labeled in the figure: enkephalin, Tyr-Gly-Gly-Phe-Leu-OH; somatostatin, H2N-Ala-Gly-Cys-Lys-Asn-Phe-Phe-Trp-Lys-Thr-Phe-Thr-Ser-Cys-OH. For an overview of recent approaches to peptidomimetic design, see the review by Bursavich and Rich (382).

4.1.2 Binding-Site Models. One major deficiency in the approach described above is the requirement for overlap of functional groups in accord with the pharmacophoric hypothesis. Although it is true that molecules having functional groups that show three-dimensional correspondence can interact with the same site, it is also true that a particular geometry associated with one site is capable of interacting with equal affinity with a variety of orientations of the same functional groups. One has only to consider the cone of nearly equal energetic arrangements of a hydrogen-bond donor and acceptor to realize the problem. Sufficient examples from crystal structures of drug-enzyme complexes and from theoretical simulation of binding compel the realization that the pharmacophore is a limiting assumption. Clearly, the observed binding mode in a complex represents the optimal position of the ligand in an asymmetric force field created by the receptor that is subject to perturbation from solvation and entropic considerations. Less restrictive is the assumption that the receptor-binding site remains relatively fixed in geometry when binding the series of compounds under study. Experimental support for such a hypothesis can be found in crystal structures of enzyme-inhibitor complexes, where the enzyme presents essentially the same conformation, despite large variations in inhibitor structures; studies of HIV-1 protease complexed with diverse inhibitors support this view (171).

In recent years, therefore, there has been an increasing effort to focus on the groups of the receptor that interact with ligands as being the common features for recognition of a set of analogs. When pharmacophore and binding-site hypotheses are compared, the binding-site model is physicochemically more plausible, in that overlap of functional groups in binding to a receptor is more restrictive than assuming the site remains relatively fixed when binding different ligands. However, the number of degrees of freedom in binding-site hypotheses, represented by the necessary addition of virtual bonds between groups A and X, B and Y, and C and Z in Fig. 3.27, is greater. Additional degrees of freedom complicate subsequent conformational analyses and may preclude any conclusions unless a sufficiently diverse set of compounds is available.

Other approaches to this problem have emphasized comparison of molecular properties rather than atom correspondences. Kato et al. (395) developed a program that allows construction of a receptor cavity around a molecule emphasizing the electrostatic and hydrogen-bonding capabilities. Other molecules can then be fit within the cavity to align them. This is similar in concept to the field-fit techniques available in the CoMFA module of SYBYL, in which the molecular field (electrostatic and steric) surrounding a selected molecule becomes the objective criterion for alignment of subsequent molecules for analysis. An example emphasizing molecular properties in pharmacophoric analysis was given by Moos et al. (396) on inhibitors of cAMP phosphodiesterase II.

4.1.3 Molecular Extensions. If we assume the binding-site points remain fixed and can augment our drug with appropriate molecular extensions that include the binding site (i.e., a hydrogen-bond donor correctly positioned next to an acceptor), we can then examine the set of possible geometrical orientations of site points to see whether one is capable of binding all the ligands. Here, the basic assumption of rigid site points is more reasonable, at least for enzymes that have evolved to catalyze reactions and must, therefore, position critical groups in a specific three-dimensional arrangement to create the correct electronic environment for catalysis. The program checks
the possible positioning of the zinc relative to

ACE inhibitors such as captopril. Analyses of
nearly 30 different chemical classes (Fig. 3.31)
of ACE inhibitors led to a unique arrangement
of the components of the active site postulated
to be responsible for binding of the inhibitors.
The displacement of the zinc atom in ACE to a
location more distant from the carboxyl-bind-
Figure 3.29. The use of active-site models in the ing Arg seen in carboxypeptidase A is compat-
Active Analog Approach. The structure shown is one ible with the fact that ACE cleaves dipeptides
of a series of ACE inhibitors analyzed. The thick from the C-terminus of peptides, whereas car-
gray lines are noncovalent interactions between the
boxypeptidase A cleaves single amino acid
inhibitor and active-site points in the enzyme. The
dashed lines correspond to the six interatomic dis- residues.
tances monitored for each of the inhibitors ana- Visualization of the OMAP is useful to
lyzed. judge the additional information introduced
as each new compound is added (Fig. 3.32).
this hypothesis by determining whether one Computationally, it is much more efficient to
or more geometrical arrangements of the pos- treat the set of noncongeneric compounds si-
tulated groups of site points is common to the multaneously (111, 399), as we shall see, but
set of active compounds. Such a geometrical reassuring when identical results are obtained
arrangement of receptor groups becomes a if one uses the sequential procedure introduc-
candidate binding-site model, which can be ing each molecule in turn, where intermediate
evaluated for predictive merit. results may be visually verified. The use of
In the study of the active site of angiotensin computer graphics to confirm intermediate
converting enzyme (ACE) by Mayer et al. processing of data in convenient display
(397),a binding site model (Fig. 3.29) was used modes becomes increasingly more important
by incorporating the active-site components
as the individual computations and numbers
as parts of each compound undergoing analy-
of molecules under consideration increase.
sis. As an example, the sulfhydryl portion of
captopril was extended to include a zinc bound
.
4.1.4 Activity versus Affinity. Given a con-
at the experimentally optimal bond length and
bond angle for zinc-sulfur complexes (Fig. sistent model of either type, a limitation is
3.29). The orientation map (OMAP) (398), that one can only ask whether the compound
which is a multidimensional re~resentationof under consideration can present the three-di-
the interatomic distances between pharma- mensional electronic pattern (pharmaco-
cophoric groups (Fig. 3.301, was based on the phore) that is the current candidate. In other
distances between binding-site points such as words, one is limited to predicting the pres-
the zinc atom with the introduction of more ence or absence of activity, a binary choice.
degrees of torsional freedom to accommodate Even the presence of the appropriate pattern
is insufficient to ensure biological activity. For
example, competition with the receptor for oc-
cupied space by other parts of the molecule
can inhibit binding and preclude activity. We
can thus postulate the following conditions for
activity:
Figure 3.30. Distances used in five-dimensional OMAP used in analysis of ACE inhibitors.

1. The compound must be metabolically stable and capable of transport to the site for receptor interaction (interpretation of inactive compounds may be flawed by problems with bioavailability).
Figure 3.31. Compounds from different chemical classes of ACE inhibitors used in active-site
analysis. Used with permission (397).
2. The compound must be capable of assuming a conformation that will present the pharmacophoric or binding-site pattern complementary to that of the receptor.
3. The compound must not compete with the receptor for space while presenting the pharmacophoric or binding-site pattern.
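
Conditions 2 and 3 lend themselves to a simple geometric test. The Python sketch below is only an illustration under stated assumptions: every conformer is assumed to be already placed in the frame of the candidate pharmacophore, and the function names, the distance tolerance, and the minimum separation from receptor-occupied points are hypothetical choices rather than values from the text. Condition 1 (stability and transport) lies outside any such geometric check.

```python
import math

def presents_pattern(groups, pattern, tol=0.3):
    """Condition 2: do the labelled groups reproduce every pattern distance?

    groups  -- {label: (x, y, z)} for one conformer
    pattern -- {(label_a, label_b): target distance in angstroms}
    tol     -- hypothetical tolerance in angstroms
    """
    return all(
        abs(math.dist(groups[a], groups[b]) - target) <= tol
        for (a, b), target in pattern.items()
    )

def competes_for_space(atoms, receptor_points, min_sep=2.0):
    """Condition 3: does any atom come closer than min_sep to receptor-occupied space?"""
    return any(
        math.dist(atom, point) < min_sep
        for atom in atoms
        for point in receptor_points
    )

def predicted_active(conformers, pattern, receptor_points):
    """Binary prediction: True if at least one conformer meets conditions 2 and 3."""
    return any(
        presents_pattern(c["groups"], pattern)
        and not competes_for_space(c["atoms"], receptor_points)
        for c in conformers
    )
```

The result is deliberately a yes/no answer, mirroring the binary choice described above; it says nothing about potency.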
Figure 3.31. (Continued.)
Once these conditions are met, we can attempt to deal with the potency, or binding affinity. This belongs to the domain of three-dimensional quantitative structure-activity relationships (3D-QSARs) (400) and we illustrate the use of a particular variant, CoMFA (187,401), on ACE inhibitors at the end of this chapter. Condition 3 allows us to utilize compounds capable of presenting the pharmacophoric pattern, but incapable of binding, to help determine the location of receptor-occupied space in relation to the pharmacophore (receptor-mapping) (402). This allows a crude, low resolution map of the position of the receptor relative to the pharmacophoric elements and indicates in which directions chemical modifications may be productive.

The number and diversity of compounds
available for analysis determine the methodology to be used. If there is a limited data set, then the pharmacophoric approach should be assessed first because of its fewer degrees of freedom. If no pharmacophoric patterns are consistent with the set of analogs, then introduction of logical molecular extensions to enable the active-site approach is warranted. Operationally, one first determines the set of potential pharmacophoric patterns consistent with the set of active analogs [leading to its name of Active Analog Approach (398)]. If there are sufficient data, then a unique pharmacophore, or active-site model, may be identifiable. The basic assumption behind efforts to infer properties of the receptor from a study of structure-activity relations of drugs that bind is the idea of complementarity. It follows that the stronger the binding affinity, the more likely that the drug fits the receptor cavity and aligns those functional groups that have specific interactions in a way complementary to those of the receptor itself. Certainly, our understanding of intermolecular interactions from studies of known complexes does not dissuade us of this notion, but may make us somewhat skeptical of the naive models that often result from such efforts. Andrews et al. (403) reviewed efforts of this type with regard to CNS drugs.

Figure 3.32. Change in OMAP (projection of three of the five dimensions) as new compounds were introduced to analysis of ACE inhibitors (397). Left is original OMAP of compound 1 (Fig. 3.30). Right is OMAP after completion of analysis.

Clearly, the key to insight relies on chemical modification to determine the relative importance of functional groups for molecular recognition. Often more subtle effects than the simple presence or absence of a group are important and then comparison of molecular properties becomes of interest. A major impediment to analysis is the definition of a common frame of reference by which to align molecules for comparison. This is equivalent to solving the three-dimensional pharmacophoric pattern, and implies that one has distinguished those properties of the molecules under consideration in a manner similar to the receptor. Initial efforts to rationalize structure-activity relationships (SARs) among noncongeneric systems were hampered by an "RMS mentality," that is, a point of view that required atomic centers to align rather than overlap of sterically and electronically similar groupings of atoms. An example would be requiring the six atoms of aromatic benzene rings to overlap at each of the six atoms of the ring vertices rather than simple requirements for coincidence and coplanarity that would recognize the torus of electron density that the rings share in common (Fig. 3.33). In congeneric series, the difficulty in assignment of correspondence is less (nonexistent by definition). This allows a variety of approaches, including those based on molecular graph theory (404-407), to detect similarities between molecules that can form the basis of a correlation analysis. Extrapolation outside of the group of congenerically related compounds on which the analysis was based would appear difficult, if not impossible.

Figure 3.33. Torus of electron density representing benzene ring. Atom-to-atom correspondences of ring atoms used in normal fitting routines lead to overconstrained fits.

Although it is simpler to start an analysis with a congeneric series to identify the recognition elements, diversity in chemical structures implies more information regarding the conformational requirements of the system. A congeneric series requires that the basic chemical framework of the molecule remains constant and that groups on the periphery are either modified (e.g., aromatic substitution) or substituted (e.g., tetrazole for carboxyl functional group). Implicit in this concept is the notion that the compounds bind to the receptor in a similar fashion and, therefore, the changes are localized and comparable for each position of modification. Introduction of degrees of freedom in the substituents as well as consideration of differences in properties that are conformationally dependent, such as the electric field, require conformational analysis in an effort to determine the relevant conformation for comparison.

The problem can be divided into two: what are the aspects of the molecules that are in common and that may provide the basis for molecular recognition, and which conformation for each molecule is appropriate to consider. For the first problem, studies on a congeneric series can often yield valuable insight. For determination of the three-dimensional arrangement of the crucial recognition elements, diversity in the chemical scaffolds imposes different constraints on possible three-dimensional patterns and generates an opportunity for determining a unique solution.

4.2 Searching for Similarity

4.2.1 Simple Comparisons. To gain insight into molecular recognition, subtle differences in molecules must be perceived. Comparisons can be divided into two categories: those that are independent of the orientation and position of the molecule and those that depend on a known frame of reference. Simple comparisons deal with properties independent of a reference frame. For example, the magnitude of the dipole moment is frame independent, but the dipole itself is a vectorial quantity dependent on the orientation and conformation of the molecule. Similarly, the bond lengths, valence angles and torsion angles, and interatomic distances are independent of orientation. The distance matrix, composed of the set of interatomic distances (Fig. 3.34), is a convenient representation of molecular structure that is invariant to rotation and translation of the molecule, but which reflects changes in internal degrees of freedom. The distance range matrix is an extension (Fig. 3.34) that has two values for each interatomic distance
representing the upper and lower limits, or range, allowed for a given interatomic distance arising from the conformational flexibility of the molecule. Crippen (408) developed a procedure that will generate conformations that conform to the constraints represented by such a distance range matrix. This approach is used to generate structures from experimental measurements such as nuclear Overhauser effects in NMR experiments. The use of distance range matrices in the identification of pharmacophoric patterns was initially illustrated by Marshall et al. (398) (Fig. 3.35), and has recently been used by Clark et al. (409) in three-dimensional databases for representing the conformational flexibility of molecules. Pepperrell and Willett (410) examined several techniques for comparing molecules by use of distance matrices. Other descriptors for comparison of pharmacophoric patterns and retrieval of similar substructures are under active investigation (411).

Figure 3.34. Distance matrix (a) in which unique interatomic distances for a particular conformation of a molecule are stored. Distance range matrix (b) in which ranges of interatomic distances representing conformational flexibility of molecule are stored. U = upper bound, L = lower bound.

Figure 3.35. Distance range matrices used for illustration of analysis of muscarinic receptors (398). Used with permission.
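
The distance matrix and distance range matrix described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any cited program: the function names and the two three-atom conformers are hypothetical, and the bounds are returned as two separate matrices rather than packed into the upper and lower triangles of one.

```python
import math

def distance_matrix(coords):
    """Full symmetric matrix of interatomic distances for one conformation."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def distance_range_matrix(conformers):
    """Lower and upper bound of each interatomic distance over a set of conformers."""
    matrices = [distance_matrix(c) for c in conformers]
    n = len(matrices[0])
    lower = [[min(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]
    upper = [[max(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]
    return lower, upper

def within_bounds(coords, lower, upper, tol=1e-9):
    """True if a conformation satisfies every distance-range constraint."""
    d = distance_matrix(coords)
    n = len(d)
    return all(lower[i][j] - tol <= d[i][j] <= upper[i][j] + tol
               for i in range(n) for j in range(n))

# Two hypothetical conformers of a three-atom chain (coordinates in angstroms).
bent = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
extended = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
lower, upper = distance_range_matrix([bent, extended])
```

Because only interatomic distances are stored, translating or rotating a conformer leaves its matrix unchanged, whereas changing a torsion alters the entries that span the rotated bond.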
4.2.2 Visualization of Molecular Properties (412). Although straightforward displays of molecular structure have proved to be extremely useful tools that enable medicinal chemists to visualize molecules and to compare their structural properties in three dimensions, of even greater potential utility is the display of the various chemical and physical properties of molecules in addition to their structures. Such displays allow the comparison not only of molecular shapes and three-dimensional structures, but also of molecular properties such as internal energy, electronic charge distribution, and hydrophobic character. A number of different properties have been displayed (412) in this manner in an effort to gain insight into molecular recognition in a series of compounds.

Among the more useful properties is the electrostatic potential. Any distribution of electrostatic charge, such as the electrons and nuclei of a molecule, creates an electrostatic potential in the surrounding space that at any given point represents the potential of the molecule for interacting with an electrostatic charge at that point. This potential is a very useful property for analyzing and predicting molecular reactive behavior. In particular, it has been shown to be an indicator of the sites or regions of a molecule to which an approaching electrophile or nucleophile is initially attracted or from which it is repelled (Fig. 3.36).

Figure 3.36. Molecular electrostatic potential for water. Positive potential superimposed on right surrounding hydrogens. Negative potential on left surrounding oxygen.

The major obstacle to use of electrostatic potentials in the comparison of different molecules has been the sheer volume of information produced. The traditional means of displaying such large amounts of data has been to display the electrostatic potential around a molecule as a two-dimensional contour map. The advent of computer graphics techniques has improved the situation by allowing three-dimensional contour maps to be displayed in color on the graphics screen and manipulated in real time along with a display of the molecule itself. An alternative mode for displaying molecular electrostatic potentials is to employ a dotted surface representation, with the dots taking on an appropriate color according to the electrostatic potential value at the relevant location. Such techniques were initially derived to display empirically determined potentials on the surface of proteins, but have since been used widely to display the electrostatic potentials on sets of small molecules for comparative purposes.

Other graphical uses of the electrostatic potential have been developed by Davis et al. (413), who were able to graphically align cyclic AMP and cyclic GMP, based on the superimposition of their respective electrostatic potential minima, and by Weinstein et al. (414), who oriented 5-hydroxytryptamine and 6-hydroxytryptamine based on the alignment of an electrostatically derived "orientation vector."

In a similar procedure to that described for the display of electrostatic potential, Cohen and colleagues developed a technique whereby the steric field surrounding a molecule can be displayed on a graphics screen as a three-dimensional isopotential contour map (415). The map is generated by calculating the VDW interaction energy between the molecule and a probe atom or molecule placed at varying points around the molecule of interest. This interaction energy is then contoured at specific levels to give the most stable VDW contour lines around the molecule, that is, the contour that represents the most favorable steric position for the probe as it is moved around the target.
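
Both kinds of map described above start from the same step: evaluating an interaction energy at points around the molecule. The Python sketch below shows that step only, under loudly stated assumptions: the water model uses TIP3P-like point charges and geometry, the electrostatic potential is a plain Coulomb sum, and the steric term is a 12-6 Lennard-Jones function with a single hypothetical r_min and well depth for every atom. None of these parameters come from the cited work, and contouring or coloring the values is left to a graphics package.

```python
import math

COULOMB = 332.06  # kcal*angstrom/(mol*e^2); gives kcal/mol for a unit probe charge

# Hypothetical point-charge water (TIP3P-like), coordinates in angstroms.
WATER = [
    ("O", (0.0, 0.0, 0.0), -0.834),
    ("H", (0.757, 0.586, 0.0), 0.417),
    ("H", (-0.757, 0.586, 0.0), 0.417),
]

def electrostatic_potential(point, atoms):
    """Coulomb potential at `point` from the atomic point charges."""
    return COULOMB * sum(q / math.dist(point, xyz) for _, xyz, q in atoms)

def vdw_probe_energy(point, atoms, r_min=3.4, eps=0.1):
    """12-6 Lennard-Jones energy of a probe at `point`.

    One r_min/eps pair is used for all atoms, which is an assumption made
    here for brevity; real force fields use atom-type-specific values.
    """
    total = 0.0
    for _, xyz, _ in atoms:
        ratio = r_min / math.dist(point, xyz)
        total += eps * (ratio ** 12 - 2.0 * ratio ** 6)
    return total

def sample_on_grid(func, atoms, half_width=4.0, step=1.0):
    """Evaluate func on a cubic lattice, skipping points within 0.5 A of a nucleus."""
    n = int(round(2 * half_width / step)) + 1
    axis = [-half_width + i * step for i in range(n)]
    return {
        (x, y, z): func((x, y, z), atoms)
        for x in axis for y in axis for z in axis
        if all(math.dist((x, y, z), xyz) > 0.5 for _, xyz, _ in atoms)
    }
```

With this geometry the potential is positive beyond the hydrogens and negative on the far side of the oxygen, which is the qualitative picture of Fig. 3.36.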
Figure 3.37. Calculation of electrostatic and VDW fields surrounding a series of molecules in defined orientations are used as a basis for QSAR correlations in CoMFA (187,401). Used with permission. Panel labels: Lattice; PLS; Equation: Bio = y + a × S001 + b × S002 + ... + m × S998 + n × E001
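
The equation shown with Fig. 3.37 is a linear model in which each lattice-point energy (S for steric, E for electrostatic) receives its own coefficient, fitted by partial least squares (PLS). The Python sketch below fits such a model with a single PLS latent variable. It is a simplified illustration, not the CoMFA implementation: practical analyses use thousands of lattice points, several components, and cross-validation, and the function names and the four-molecule data set are hypothetical.

```python
import math

def pls1_one_component(X, y):
    """Fit Bio = intercept + sum(coef[j] * field[j]) with one PLS latent variable."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred field values
    yc = [v - y_mean for v in y]                                  # centred activities
    # Weight vector: covariance of each lattice-point column with activity, normalised.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: projection of each molecule onto the latent variable.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Least-squares regression of activity on the scores.
    c = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    coef = [c * v for v in w]
    intercept = y_mean - sum(cj * mj for cj, mj in zip(coef, x_mean))
    return intercept, coef

def predict(intercept, coef, fields):
    """Evaluate the fitted linear equation for one molecule's field values."""
    return intercept + sum(cj * fj for cj, fj in zip(coef, fields))

# Hypothetical data: four molecules, three lattice-point energies each.
FIELDS = [[1.0, 0.2, -0.5], [2.0, 0.1, -0.4], [3.0, 0.4, -0.1], [4.0, 0.3, 0.0]]
ACTIVITY = [5.1, 6.0, 7.2, 7.9]
INTERCEPT, COEF = pls1_one_component(FIELDS, ACTIVITY)
```

PLS is used rather than ordinary least squares because the number of lattice points far exceeds the number of molecules; the latent variables compress the field columns into a few directions that covary with activity.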
A similar three-dimensional contour representation of a molecule can be obtained for both the electrostatic and steric fields of a molecule within the comparative molecular field analysis (CoMFA) methodology that has been developed by Cramer (187) to investigate 3D-QSARs (400). In this procedure, the molecule is surrounded by a regular lattice of points, at each point of which a van der Waals and an electrostatic interaction energy between the molecule and a probe atom is computed (Fig. 3.37). Isocontours can then be generated around individual molecules, displayed graphically, and they can be statistically compared throughout a series of molecules in an attempt to generate 3D-QSARs and hence to rationalize activity data. This is very similar to the GRID program (186), which uses various probe groups (416) to map potential interactions around a molecule. Inductive logic programming has been combined with CoMFA to develop a new approach (417) to pharmacophore mapping that does not require explicit superimposition of compounds.

In situations where, either from previous QSAR work or from experimental evidence, it is known or suspected that differences in the reactivity of a set of molecules are attributed primarily to their hydrophobic rather than their electrostatic properties, it is probably of more use to compare molecular surfaces that display hydrophobicity or polarity information. Indeed, dotted molecular surfaces color-coded by hydrophobic character have been used very successfully by Hansch and coworkers to rationalize QSARs from several different systems (418,419). This concept has been extended to calculate the hydrophobic field surrounding a molecule by Kellogg and Abraham (420,421) and utilized in CoMFA studies.

4.3 Molecular Comparisons

To compare molecules in a general way, a means of superposition, or correctly orienting the molecules in the same reference frame, must be available. A procedure for positioning an atom in the molecule at the center of the coordinate frame with other atoms positioned
along coordinate axes can be used, or the molecules can be successively fit to one that is used as the standard orientation. Danziger and Dean (422) described an approach that will find geometric similarities in positions of hydrogen-bonded atoms between two molecules. Least-squares-fitting procedures for designated atoms allow selectivity in orienting the molecules with predetermined conformations in the most appropriate manner. Kearsley (423) described an efficient method for fitting a series of molecules when atom-atom associations have been previously defined between members of the series. In some cases, the use of dummy atoms allows geometric superposition of groups such as aromatic rings without requiring superposition of the atoms composing the ring. By defining the centroid of the ring and erecting a normal to the plane of the ring, the dummy atom at the end of the normal and the centroid dummy atom can be used to superimpose the ring on another ring with similar dummy atoms (Fig. 3.38). This method leads to coincidence and coplanarity of the two ring systems without requiring the atoms composing the rings to be coincident. In other words, the rings can be viewed as two toruses of electron density without overemphasizing the positions of the atomic nuclei. In numerous studies [see review by Andrews et al. (403)] of biogenic amine ligands, this method of comparison of the aromatic ring components is essential to allow alignment of the nitrogens.

Figure 3.38. Construction of dummy vector perpendicular to plane of aromatic ring at centroid that allows superposition and coincidence of aromatic rings by fitting endpoints (Du) of dummy vector without requiring superposition of ring atoms.

4.3.1 Volume Mapping. One method of displaying molecular surfaces that retains the ability to transform the display interactively has been developed by Marshall and Barry (424). The procedure involves computing a molecular pseudo-electron density map on a three-dimensional grid that surrounds the molecule whose atoms are replaced by dummy Gaussian atoms. Atom types are characterized by a half-width and an integrated density, chosen so that the Gaussians have a fixed value at a distance equal to the VDW radius (Fig. 3.39). Such density maps may be contoured in three dimensions to provide a chicken wire-like envelope around the molecule that corresponds to the van der Waals surface.

Figure 3.39. Set of parameters to generate pseudo-electron density maps of molecules that can be contoured to approximately represent VDW surface (Ho and Marshall, unpublished).

Atom         Temperature factor   Atomic number
Carbon       60                   25
Nitrogen     55                   25
Oxygen       50                   25
Sulfur       67                   35
Phosphorus   70                   35
Hydrogen     40                   15
Bromine      65                   50
Chlorine     60                   35
Fluorine
Iodine
Sodium
Potassium
Calcium
Lithium
Aluminum
Silicon

A concomitant benefit of this technique is that estimates of the molecular surface area and volume are generated as by-products of the contouring routines, whether the surface is being drawn around one or several molecules. Additionally, the generated surfaces and volumes are readily susceptible to logical operations, such as union, intersection, or subtraction, enabling the rapid determination of, for example, union or difference volumes among a series of molecules.

Once one has fixed the molecules in a common frame of reference, then comparison by a variety of techniques becomes feasible. As an example, difference in volume may be important in understanding the lack of activity in compounds that appear to possess all the
prerequisites for activity seen in others in the series. In a congeneric series, a significant portion of the molecular structure is common to the molecules under comparison. This common volume that is shared logically should not contribute to differences in activity. By subtraction of the volume shared by two molecules, one obtains a difference map in which the volume occupied by one molecule and not the other remains (398). Correlations between the shared volume and the biological activity of a congeneric series of inhibitors of DHFR have been shown by Hopfinger (425). Simon and his colleagues (426) emphasized the use of both overlapping volume and nonoverlapping volume in QSAR studies in a quantitative methodology, the minimal steric difference, or MTD method. This approach has been enhanced to allow comparison of low energy conformers of each molecule and use of those that are sterically most similar. An application to substrates of acetylcholinesterase illustrates this facility (427).

4.3.2 Field Effects. Once the frame of reference has been established, other properties of molecules, such as the electrostatic field, can be compared as well. Because the electrostatic properties can be sampled on a grid, differences between the values of two molecules can be calculated and a difference map contoured. Such difference maps (428) highlight more clearly the similarities and differences between molecules. Hopfinger (429) integrated the difference between potential fields and showed this parameter to be useful in QSAR studies.

An approach to statistically quantifying the similarity between two molecular electrostatic potential surfaces was developed by Dean and coworkers (430,431) and by Richards and coworkers (215). Here, the previously determined molecular electrostatic potential surfaces are projected outward onto surrounding spheres that provide a common surface of reference, and then statistical analyses are performed over the points on this common surface in an attempt to quantify the similarities or differences between the two molecules under consideration. Burt and Richards (432) introduced flexibility in the comparison of molecules based on their electrostatic potential fields.

4.3.3 Directionality. If one is comparing molecules that share interaction at a common site on a biological macromolecule, it is logical to assume that they may do so by interacting with similar sites in the receptor with optimal interaction shown by molecules with correctly oriented functional groups. If one does not have a three-dimensional model of the receptor from which to deduce potential interactive sites, then one can only attempt to deduce the potential interactive receptor-subsites by examination of the molecules that interact with them. Systematically, one can vary the conformation of a molecule and record the relative orientation of groups postulated, or shown experimentally, to play a dominant role in intermolecular interactions. In this way, one can map out the directionality of interactions of each functional group of the ligand in a common frame of reference. Comparison of these maps can often lead to hypotheses regarding pharmacophoric groups and their correspondence between molecules.

4.3.4 Locus Maps. One can generate a locus plot in coordinate space showing all the potential locations of one group relative to another by fixing one group in a particular orientation as a frame of reference and recording all possible coordinates of the other. An example would be the relative positions of the basic nitrogen to the aromatic ring in compounds such as dopamine interacting with biogenic amine receptors. One must choose the common fragment (in the example, the aromatic ring) of each molecule and its orientation to generate a similar frame of reference, so that the locus of positions of the atom (the basic nitrogen) leads to a meaningful comparison across a series of molecules (Fig. 3.40).

4.3.5 Vector Maps and Conformational Mimicry. Often, one is more interested in assessing the directionality of potential interaction rather than simply looking for overlap of atoms such as the basic nitrogen. In this case, for example, one is interested in determining both the locus of the lone pair of the nitrogen
and the nitrogen as the ordered pair of coordinates determines a vector in the chosen frame of reference. The resulting plot of the locus of all possible vectors of the nitrogen lone pair constitutes a vector map. The combination of positional information with relative orientation offers considerable insight into potential interactions with a hypothetical receptor. The work of Lloyd and Andrews (233) postulating a common theme in CNS receptors based on an underlying biogenic amine pattern can be rationalized using the vector-map approach.

Figure 3.40. Locus of sterically allowed positions of nitrogen atom in dopamine relative to aromatic ring.

The use of vector maps is essential to the assessment of conformational mimicry, in that one attempts to determine the statistical probability that the conformation essential for activity will be preserved with a given chemical modification. An example will serve to illustrate this concept and its application. Modification of amide bonds (introduction of amide isosteres) in peptide drugs to increase metabolic stability may alter the potential accessible conformations. This may preclude the compound containing the isostere from adopting the correct orientation for receptor recognition and activation. In the general case, one has no specific information regarding which particular conformation is biologically relevant and can only assess whether the chemical modification mimics the amide bond in its conformational effects. This can be quantitatively assessed by the comparison of the percentage of vectors of the vector map of the parent amide bond that can be found in a comparable vector map of the analog.

Figure 3.41. Vector map of the orientations of the Cα-Cβ bond of Ala1, with the methylamide fixed as a frame of reference of the dipeptide Ac-Ala-Ala-NH-CH3 in which the central amide bond was cis (433). Used with permission.

Work by Zabrocki et al. (433) on the use of 1,5-disubstituted tetrazole rings as surrogates for the cis-amide bond illustrates this application. The linear dipeptide, acetyl-Ala-Ala-methylamide, with the amide bond between the two alanine residues in the cis-conformation, and the tetrazole analog, acetyl-AlaΨ[CN4]Ala-methylamide, were modeled using the coordinates derived from diketopiperazines for the cis-amide bond or from the crystal structure of the cyclic tetrazole dipeptide. A systematic, or grid, search, which determines the sterically allowed conformations by systematically varying the torsional degrees of freedom, was used to generate a Ramachandran plot for each of the pairs of backbone torsional angles (Φ, Ψ) associated with each amino acid residue. The rigid geometry approximation was used with the set of scaled VDW radii, shown by Iijima et al. (109) to reproduce the experimental crystal data for proteins and peptides. When the cis-amide dipeptide model was calculated, the orientations of the Cα-Cβ bond of Ala-1 with the methylamide fixed as a frame of reference were recorded for each sterically allowed conformation (Fig. 3.41). Use of the same orientation of the methylamide in the tetrazole allowed the
program to determine which vectors, or orientations of the Ala-1 side chain relative to the methylamide, were common to both dipeptides. Alternatively, the acetyl group was used as the fixed frame of reference and the side-chain orientation of Ala-2 was used to monitor conformational mimicry. Because the quantitative results were essentially the same, the measurement of mimicry was shown to be independent of the chosen frame of reference. A torsional increment of 10 degrees was used, and a side-chain vector was assumed to correspond if both the carbon-α and carbon-β were within 0.2 Å of the coordinates of another vector. The percentage of orientations available to the parent that are also available to the analog is referred to as the conformational mimicry index. For the tetrazole surrogate of the cis-amide bond, the conformational mimicry index is 88% [the number of vectors (747) common to both the tetrazole and cis-amide divided by the total number of vectors (849) allowed for the cis-amide]. The tetrazole analog has more conformational freedom than the cis-amide model with 33,359 conformers allowed compared to 14,912 allowed for the cis-amide of the 36⁴ (or 1,679,616) possible conformations. This difference was easily visualized in plots of the vector maps for the two dipeptides.

A more recent example of the use of vector maps to evaluate conformational similarity is an application to β-turn mimetics by Ball et al. (434,435). This led to a recognition that many of the various turn types described in peptides based on their backbone dihedral angles lead to quite similar topographical arrangements of the side chains. A new parameter, β [the dihedral angle formed by the backbone atoms C(1)-αC(2)-αC(3)-N(4)], was described (Fig. 3.42) that more readily facilitated comparison of the topography of the system.

Figure 3.42. Definition of new parameter β, the dihedral angle between the backbone atoms C(1)-αC(2)-αC(3)-N(4) of peptides, used to describe the topography of reverse turns (434,435).

4.4 Finding the Common Pattern

If one assumes that a common binding mode exists for two or more compounds, then one can use the computer to verify the geometric feasibility of the assumption. One needs to determine whether it is possible for the two molecules to present a common geometric arrangement of the designated "important" functional groups for recognition. There are two distinct approaches to this problem. The first, which is associated with minimization methodology, focuses on the existence issue. Is there a conformation that is energetically accessible to each of the molecules under consideration that will place the designated functional groups in a similar orientation? The second approach attempts to systematically enumerate all possible conformations and thereby derive all possible orientations or patterns to determine the set of patterns shared by the compounds under study. The latter approach, when it can be applied, can directly address the question of uniqueness of the common pattern.

The search for the global minimum, or complete set of low energy minima, on a potential surface is a common problem in science and engineering that does not have a general solution. Numerous approaches in chemistry have been used: most commonly stochastic methods such as distance geometry (408), molecular dynamics, and Monte Carlo sampling. Although distance geometry and molecular dynamics are widely used in the elucidation of solution conformations from NMR data, they have problems in conformational sampling and homogeneous treatment of data from
rigid and mobile domains. In general, the difficulties with most methods are similar to those seen with minimization procedures. If one is in the area of the global minimum, then one is likely to converge to that solution. Otherwise, one will be trapped in some local minimum. In contrast, systematic search methods are algorithmic, so that all sterically allowed conformations are generated at the selected torsional grid parameters. Systematic search methods, therefore, do not have problems in sampling and are path independent, but are combinatorial in complexity, which may limit the fineness of the sample grid and thus compromise the results. Only in small systems such as cycloalkane rings (121) and small peptides (90, 436) have the potential energy hypersurfaces been mapped.

4.4.1 Constrained Minimization. In cases where one has internal degrees of freedom, besides the six associated with position and orientation, the use of constrained minimization procedures becomes a useful technique. Often the standard molecule for comparison has a fixed conformation and the molecule to be fitted has internal degrees of freedom. Several groups have published methods for dealing with this problem. In case one has simultaneous degrees of freedom in both the molecule to be fitted and the target, a different approach with simultaneous minimization of all variables is recommended (Fig. 3.43).

Figure 3.43. Simultaneous minimization of molecules to force overlap of pharmacophoric groups A, B, and C. Springs represent constraints between groups, and only interatomic forces are evaluated.

The combination of molecular mechanics with flexible minimization routines allows penalty functions to be assigned to force geometrical correspondence of groups, whereas individual molecules have their internal energy evaluated, but are invisible to the other molecules under consideration. A program has been described (437) with this capability and its use illustrated on histamine antagonists by Naruto et al. (438). Template forcing allows one molecule to be set up as a template and another molecule to be constrained to overlap in a specified manner. The strain energy involved in forcing correspondence gives an upper-bound estimate of the distortion energy required, given that the results depend on the initial problem definition.

An alternative approach uses the distance geometry paradigm, in which all the constraints are combined to form the distance matrix from which energetically feasible conformations of the set of molecules are sought mathematically. Sheridan et al. (439) demonstrated this approach on acetylcholine analogs that are muscarinic agonists. Both of these approaches ask the same question and suffer from the same limitations, and differ only in computational technique. Each suffers from the local minima problem, in that each uses a
minimization technique, and the results will be dependent on the starting geometries of the initial set of molecules. Both have the advantage that the unique constraints imposed by particular molecules enter consideration at an early stage and minimize comparison of conformations.

Another variant recently reported by Hodgkin et al. (440) uses a Monte Carlo search procedure to generate candidate pharmacophoric patterns. A reduced force-field parameter set is used initially to lower energy barriers between conformations to ensure greater configurational sampling. Candidate pharmacophores are then refined to produce low energy conformations of molecules overlaid in a common binding mode. Application to antagonists of the human platelet-activating factor led to a consistent binding model for a set of five diverse structures when active-site hydrogen-bonding groups were postulated. Barakat and Dean (441, 442) utilized simulated annealing to optimize structure matching by minimizing the difference matrix between the two molecules. A somewhat similar approach is that of Perkins and Dean (443), who used simulated annealing to search conformational space followed by cluster analysis for each molecule, with subsequent comparison of a small number of diverse conformers between different molecules.

4.4.2 Systematic Search and the Active Analog Approach. Once the existence of a common pattern has been determined, then the issue of uniqueness needs to be addressed. The Active Analog Approach (398) uses a systematic search to generate the set of sterically allowed conformations based on a grid search of the torsional variables at a given angular increment. For each sterically allowed conformation, a set of distances between the postulated pharmacophoric groups is measured. The collection of these distance sets, each of which represents a unique pharmacophoric pattern, constitutes an orientation map (OMAP). Each point of the OMAP is simply a submatrix of the distance matrix and, as such, is invariant to global translation and rotation of the molecule. If the initial assumption is valid, that the same binding mode of interaction, or pharmacophoric pattern, is common to the set of molecules under consideration, then the OMAP for each active molecule must contain the pattern encrypted in the set of distances. By logically intersecting the set of OMAPs, one can determine which patterns are common to all molecules (444). In other words, all potential pharmacophoric patterns consistent with the activity of the set of molecules can be found by this simple manipulation of OMAPs, and the question of uniqueness addressed directly (Fig. 3.44).

Figure 3.44. OMAPs generated for two molecules can be logically intersected to determine which three-dimensional patterns are common. [Panel labels: Molecule 1; Molecule 2; 3 potential pharmacophoric areas.]

A good example is the work of Nelson et al. (445) on the receptor-bound conformation of morphiceptin. Based on structure-activity data, the tyramine portion and phenyl ring of residue three of morphiceptin, Tyr-Pro-Phe-Pro-NH2, were postulated to be the pharmacophoric groups responsible for recognition and activation of the opioid μ-receptor. It was assumed further that the aromatic rings bound to the receptor in the different analogs were coincident and coplanar. A series of active analogs with a variety of conformationally constrained amino acid analogs in positions two and three was analyzed. A unique conformation was found for the two most constrained analogs that allowed overlap of the Phe and Tyr portions of the molecules (Fig. 3.45). In this case, a five-dimensional orientation map with distances between the nitrogen and normals to the two aromatic rings was used in the analysis.

Figure 3.45. Conformations of two constrained analogs of morphiceptin in which aromatic rings of Tyr1 and Phe3 are overlapped (445).

The Active Analog Approach (Fig. 3.46) is appropriate for the unknown receptor problem, given that no objective criterion function, such as potential energy, can be used a priori in the absence of information regarding the receptor. Adequate sampling of the potential surface to ensure that the complete set of local minima is found is still problematic because of the phenomenon known as "grid tyranny." This relates to the fact that the combinatorial explosion that results from decreasing the increment of the torsion angles scanned limits one to a finite increment for a given problem, say, 10° for a seven-rotatable-bond problem. Because the energetics of the system is very sensitive to interatomic distances, a conformation generated at the 10° increment may be sterically disallowed, but very close to a minimum. Relaxation of the structure might find the relevant conformation, for example, by allowing a torsional angle to vary by 1°. Improvements in algorithms described in the following section have helped to overcome this problem.

4.4.3 Strategic Reductions of Computational Complexity. Logically, the Active Analog Approach can be conceived as sequentially determining all the sterically allowed conformations for each molecule under consideration, generation of an OMAP from those conformations, and logical intersection of the OMAPs to determine the common pharmacophoric patterns. A simple analysis will easily convince one that this is not feasible because of the computational complexity of the problem. For example, the set of 28 ACE inhibitors (Fig. 3.31), analyzed by Mayer et al. (397), has a total of 163 torsional degrees of freedom that have to be explored to find a common pattern, as seen in Table 3.1. If we were to determine all possible conformations for each molecule at a 10° torsional scan, the scan parameter s = 10° and the number of torsional increments r = 360°/s, or 36. For each molecule, there are rⁿ possibilities to be examined, where n is the number of torsional degrees of freedom of that molecule. For the set of molecules there are (6 × 36³) + (7 × 36⁵) + (3 × 36⁶) + (5 × 36⁷) + (6 × 36⁸) + (1 × 36⁹) possible conformations to be generated and examined. If one compares each conformation of each molecule with all the conformations of the other molecules to find possible correspondences, the combinatorics of the problem explode and one reaches the same level of complexity as a complete conformational search of a peptide of 30 residues at a 10° scan (not currently feasible).

One is not interested in the conformational
hyperspace of the set of the inhibitors, but rather the three-dimensional patterns common to the total set of inhibitors. Many conformations of a molecule often map into one three-dimensional pattern. Transformation of the multidimensional conformational hyperspace into a lower-dimensional OMAP space reduces the number of objects for comparison. If one starts with the most constrained inhibitor (fewest torsional degrees of freedom) and determines an OMAP for it, then one can use the upper and lower distance bounds as constraints for searches for the next molecule. In other words, one looks only where there are possible solutions to the problem. A more advanced approach simply examines each candidate solution from the initial OMAP to see whether all the other molecules are capable of presenting the same pattern. By changing the focus to the hypothesis of a common three-dimensional pattern, a more efficient approach has been devised (Fig. 3.46) (399). Clearly, the algorithms that one chooses to do the problem are important.

Figure 3.46. The flow of information in the Active Analog Approach (111, 399). Sterically allowed conformations (represented by filled circles on the two-dimensional torsional grid) of a molecule are determined and the distances (d1, d2, etc.) between pharmacophore elements are recorded for each. The resulting OMAP is used to constrain the next molecule in the series. Ideally, once all of the molecules have been evaluated, only a single point or cluster of points remains in the OMAP.

Table 3.1 Degrees of Torsional Freedom to Specify ACE Active Site Geometry

Degrees of Freedom (n)    Number of Molecules    Total
3                         6                      18
5                         7                      35
6                         3                      18
7                         5                      35
8                         6                      48
9                         1                      9
Totals                    28                     163

4.4.4 Alternative Approaches. A conceptually similar approach to receptor mapping has been taken by Ghose and Crippen (446-449), who used the distance geometry method to analyze site points and drug interactions. A site model was postulated with some initial estimates of force constants between the appropriate portion of the ligand and the site point. The binding energy for a particular binding mode can be calculated:

where E_c is the conformational energy, c is a coefficient to be fit, x is the interaction of a site point i with the bound ligand point m, which depends on their types. The novel aspect of
this approach was the use of distance geometry to generate a variety of conformers binding within the postulated site and then finding a set of force constants between the postulated site points and ligand points that will predict the affinities of the compounds in the data set when bound in their optimal manner. With a site model of 11 attractive site points and 5 repulsive ones for DHFR, Ghose and Crippen (447) were able to derive force constants that fit 62 molecules, with an R² = 0.90, and predict the activity of 33 molecules, with an R² = 0.71. The compounds, however, are essentially an extended congeneric series because the core recognition portion of the inhibitor, the pyrimidine ring, is common to all the compounds.

Linschoten et al. (450) extended Crippen's method by use of lipophilicity to describe the binding of parts of the ligand to lipophilic areas of the receptor. Through the use of only a nine-point model of the turkey erythrocyte β-receptor and six energy parameters, they successfully modeled 58 compounds. Distance geometry approaches to receptor-site modeling have been reviewed (449, 451).

Simon and his coworkers have developed (426) a quantitative 3D-QSAR approach, the minimal steric (topologic) difference (MTD) approach. Oprea et al. (452) compared MTD and CoMFA on affinity of steroids for their binding proteins and found similar results. Snyder and colleagues (453) developed an automated method for pharmacophore extraction that can provide a clear-cut distinction between agonist and antagonist pharmacophores. Klopman (404, 454) developed a procedure for the automatic detection of common molecular structural features present in a training set of compounds. This has been used to produce candidate pharmacophores for a set of antiulcer compounds (404). Extensions (454) of this approach allow differentiation between substructures responsible for activity and those that modulate the activity.

Bersuker and Dimoglo (455) described a matrix-based approach that combines geometric and electronic features of a molecule, the electron-topological approach. For each molecule, an electron-topological matrix of congruity (ETMC) is constructed based on a conformer selected by conformational analysis. The ETMC is essentially an interatomic distance matrix (Fig. 3.47), with the diagonal elements containing an electronic structural parameter (atomic charge, polarizability, HOMO energy, etc.). Off-diagonal elements for two atoms that are chemically bonded are used to store information regarding the bond (bond order, polarizability, etc.). Matrices for active compounds in a series are then searched for common features that are not shared by inactive compounds. The successful examples cited are predominantly for small, relatively rigid structures where the conformational parameter does not confuse the analysis.

Figure 3.47. The electron-topological matrix of congruity (ETMC) for a 17-atom fragment proposed by Bersuker and Dimoglo (455) to encode geometrical and electronic features of molecules.

Martin et al. (456) developed a strategy for determining both the bioactive conformation and a superposition rule for each active molecule in a data set. In DISCO, a set of low energy conformers for each molecule is processed to locate atoms within the molecule and extensions for binding-site points for superposition. A clique-finding algorithm then finds superpositions containing at least one conformation of each molecule and a user-specified minimum number of site points.

Unlike methods that are limited to a precomputed set of rigid conformers, GASP (Genetic Algorithm Superposition Program) (457) allows full conformational flexibility of ligands. GASP employs a genetic algorithm for determining the correspondence between functional groups in different molecules and the alignment of these groups in a common geometry for receptor binding. For a set of ligands, GASP automatically identifies rotatable bonds and pharmacophore elements such as rings and potential hydrogen-bonding sites. A population of chromosomes is randomly constructed, where each chromosome represents a possible alignment of all the molecules. Chromosomes encode the torsion settings for rotatable bonds as well as the intermolecular mapping of elements. The fitness score of a particular alignment is the weighted sum of three terms: the number and similarity of overlaid elements, the common volume of all the molecules, and the internal van der Waals energy of each molecule. Using a mutation or crossover operator, child chromosomes are produced. Those with improved fitness scores replace the least-fit members of the existing population. The calculation terminates when
the fitness of the population fails to improve by a specified amount, or when the preset number of genetic operations is completed. GASP produces several sets of alignments and their associated pharmacophore elements.

4.4.5 Receptor Mapping. One can attempt to decipher physical properties of the receptor by use of data from both active and inactive analogs. Interpretation of results requires some understanding of the interactions between ligand and receptor that underlie molecular recognition. Oprea and Kurunczi (458) reviewed these interactions in the context of receptor mapping. A basic assumption is that a compound that contains the correct pharmacophoric elements and has the capability of positioning them correctly should be active. Compounds with these attributes that are inactive must be incapable of binding to the receptor in the correct orientation; that is, steric overlap with the receptor must occur. By calculating the combined volume of the active analogs superimposed in the correct orientation, one has mapped space that cannot be occupied by the receptor and that must be available for binding. Inactive compounds mentioned above should possess novel volume requirements, some portion of which is likely to overlap with that occupied by the receptor. As an example of receptor mapping, Sufrin et al. (402) showed with amino acid analogs of methionine, which inhibited the enzyme methionine:adenosyl transferase, that the data for a set of rigid amino acid inhibitors required the postulation of competition between the inactive analogs and the enzyme for a particular volume of space (Fig. 3.48). Summation of the volume requirements for the set of compounds, when oriented on the amino acid framework, yielded a minimum space from which the receptor could be excluded. Each amino acid had the necessary binding elements, but several were inactive. Each of the inactive analogs required extra volume not required by the active analogs and shared a small common unique volume whose occupancy by the enzyme would be sufficient to rationalize their inactivity.

Figure 3.48. Example of receptor mapping of a set of enzyme inhibitors that can be aligned on a common amino acid framework. The inactive compounds all require a common novel volume when compared with the active compounds (402). Used with permission. [Structure panels: Active analogs; Inactive analogs.]

Klunk et al. (459) used separate receptor
mapping of two different chemical classes of ligands to support the hypothesis that they bound to the same site. Calder et al. (460) argued that a successful correlative CoMFA model for 36 compounds of six chemical classes of GABA inhibitors indicated that the alignments used were significant. In some cases, comparison of volume maps for two receptors has allowed optimization of activity at one receptor with respect to the other. The work of Hibert et al. (461, 462), through the use of receptor mapping to increase the selectivity of a lead compound for the 5-HT1A receptor over the α1-adrenoreceptor, has resulted in clinical trials for a novel chemical class. This steric-mapping approach has become relatively popular, and numerous examples appear in current journals (463) on a regular basis.

Although there are several feasible algorithms to deal with unions of molecular volumes, the use of pseudoelectron density functions calibrated to reproduce VDW radii (424) with three-dimensional contouring to represent the surface has allowed mathematical manipulation of the density associated with each lattice point to allow for union, intersection, and subtraction of volumes. Analytical representation of molecular volumes by Connolly (464, 465) and solvent-accessible surfaces by Kundrot et al. (466) may be an alternative that would allow optimization of volume overlap, for example, by minimizing the difference in volume between two structures. The solvent-accessible surface area can be used to approximate the free energy of hydration and a rapid, numerical procedure for its calculation has been reported (467).

4.4.6 Model Receptor Sites. One of the first visualizations of a receptor model is that of Beckett and Casey (468) for the opiate receptor published in 1954. Because morphine and many other compounds active at this receptor are essentially rigid, the model did not have to address the interaction of the myriad flexible, naturally occurring opioid ligands, such as endorphins and enkephalin, which were only subsequently discovered. The model receptor had an anionic site to bind the charged nitrogen, a hydrophobic flat surface with a cleft to bind the phenyl ring, and a hydrophobic hydrocarbon bridge seen in morphine. Kier (469) published a number of papers attempting to define the pharmacophore based on semiempirical molecular orbital calculations of in vacuo minimum-energy conformations. Although his basic concepts were valid, his emphasis on the global minima in vacuo limited his scope of applicability.

Humber et al. (470) used semirigid antipsychotic drugs, the so-called neuroleptics, which antagonize CNS dopamine transmission and displace dopamine from its receptor, to formulate a geometrical arrangement of receptor groups to rationalize their activity. Olson et al. (471) used this model to design a novel stereospecific dopamine antagonist and successfully predicted its stereochemistry.

Because we are reasonably convinced the receptor is a protein, affinities calculated for hypothetical sites constructed from amino acid fragments should correlate with observed affinity, assuming that the type of interactions and their geometry are represented by the site in some reasonable manner. An individual fragment such as an indole ring from tryptophan does a good job of simulating a flat hydrophobic surface. Höltje and Tintelnot (472) constructed a site for chloramphenicol from arginine and histidine by varying the distances of the amino acid from its postulated binding position and finding the optimal distance for correlation with observed affinity for the ribosome. Peptidic pseudoreceptors have been constructed (453) that correctly rank-order glutamate NMDA agonists and antagonists (Fig. 3.49).

Figure 3.49. Peptidic pseudoreceptor used to calculate affinity of NMDA agonists and antagonists (453). Used with permission.

An intermediate between unknown receptors and ones where the three-dimensional structure is known are models based on homology. For the medicinal chemist, the G-protein receptors have been of intense interest and numerous models (339, 340, 461, 473) of the various receptor types have been developed based on their presumed three-dimensional homology with bacteriorhodopsin (474). Mechanisms of signal transduction (475) and differences between agonists and antagonists (476) have even been rationalized based on such models. Nordvall and Hacksell (341) recently combined the construction of such a model for the muscarinic m1 receptor with constraints derived from steric mapping of muscarinic agonists. By adding the experimental constraints from ligand binding, a qualitative model was derived that was able to reproduce experimentally derived stereoselectivities.
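Before turning to how such models are assessed, the grid arithmetic of Section 4.4.3 and the OMAP intersection of Section 4.4.2 can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the published Active Analog implementation: the function names (`grid_conformations`, `omap`, `common_patterns`), the 0.5 Å distance binning, and the coordinates of the two toy molecules are all hypothetical. Only the (n, number of molecules) pairs are taken from Table 3.1.

```python
from math import dist

# Table 3.1 as (torsional degrees of freedom n, number of molecules).
ACE_TABLE = [(3, 6), (5, 7), (6, 3), (7, 5), (8, 6), (9, 1)]


def grid_conformations(table, step_deg=10):
    """Conformations in a full grid scan: sum of m * r**n, with r = 360/step."""
    r = 360 // step_deg
    return sum(m * r**n for n, m in table)


def omap(conformers, bin_width=0.5):
    """Toy OMAP: one binned distance tuple per conformer.

    Each conformer is a list of (x, y, z) positions for the postulated
    pharmacophoric groups, always given in the same order (A, B, C, ...).
    Binning the distances is an illustrative assumption.
    """
    patterns = set()
    for coords in conformers:
        pair_dists = [dist(a, b)
                      for i, a in enumerate(coords)
                      for b in coords[i + 1:]]
        patterns.add(tuple(round(d / bin_width) for d in pair_dists))
    return patterns


def common_patterns(omaps):
    """Logical intersection of OMAPs: patterns every molecule can present."""
    return set.intersection(*omaps)


# Hypothetical coordinates (angstroms): two molecules, two conformers each.
mol1 = [[(0, 0, 0), (3, 0, 0), (0, 4, 0)],
        [(0, 0, 0), (2, 0, 0), (0, 2, 0)]]
mol2 = [[(1, 1, 1), (4, 1, 1), (1, 5, 1)],
        [(0, 0, 0), (6, 0, 0), (0, 6, 0)]]

shared = common_patterns([omap(mol1), omap(mol2)])
total = grid_conformations(ACE_TABLE)
print(f"{total:.3e} grid conformations; shared patterns: {shared}")
```

Because each pattern is built only from interatomic distances, the translated conformer of the second toy molecule matches the first without any superposition step, which is the invariance property noted for OMAP points. Fixed-width binning can split nearly identical distances across a bin edge, so a practical implementation would more plausibly compare distances within a tolerance.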
4.4.7 Assessment of Model Predictability. Because it is unlikely that there will be sufficient structure-activity data to uniquely define a model at atomic resolution in competition with crystallography, justification for model building must come from its potential predictive power and possible insight into the receptor-drug interaction before detailed three-dimensional information is available from either crystal structure or NMR studies. Certainly, the questions regarding the ability of a proposed drug to bind to the active site without steric conflict with the receptor can be addressed by the methods outlined above in a qualitative manner. The resolution of our receptor models is too crude, however, to subject them to molecular mechanics estimates of affinities. There are alternative paradigms, however, based on pattern recognition techniques in which a set of analogs and their activities are used, along with their physicochemical parameters, to generate a mathematical model that relates the values of the physicochemical parameters for a given analog with its activity. One such paradigm is comparative molecular field analysis (CoMFA), which combines the three-dimensional electrostatic and steric fields surrounding the analogs with powerful statistical techniques, partial least squares (PLS) (477) and cross-validation, to generate predictive models if a set of orientation rules is available for aligning the molecules for comparison and prediction. Alternative methods for assessing similarity and their use in QSAR schemes have been compared (215) with CoMFA. Another approach is the use of neural nets that learn to "see" patterns in much the same way as our own nervous system processes information. Examples of the use of this pattern-recognition approach include classification of mechanism of action for cancer chemotherapy (478) and QSAR studies of DHFR inhibitors (479, 480) and carboquinones (481). Machine learning has also been applied (482) to the QSAR problem. Trimethoprim analogs were successfully analyzed for their inhibition of DHFR and similar results to the original Hansch results were obtained. It is not clear that this paradigm could be applied to noncongeneric series, at least as outlined.

What appears crucial to such studies is the choice of training set, which encompasses as much of parameter space as one is likely to use in the predictive mode, as well as tests of the predictive ability of resulting models. Given that one is dealing with a situation in which the number of variables is larger (often several times) than the number of observations, linear regression models are not applicable because chance correlations are highly probable. The use of cross-validation allows selection of correlations that are predictive in a self-consistent manner within the training set. This does not mean to imply that such internally self-consistent models have predictive power outside of the training set, or extremely close congeners.

DePriest et al. (483, 484) applied the CoMFA methodology to a series of 68 ACE (angiotensin-converting enzyme) inhibitors representing 28 different chemical classes. Through use of the binding-site geometry determined by Mayer et al. (397), a CoMFA model with a statistically significant cross-validated R² and considerable predictive ability for inhibitors outside of the training set was derived. Because the geometry of the ACE inhibitors was determined computationally by an active-site analysis rather than experimentally, a comparison of the results of the ACE series against thermolysin inhibitors, for which there were crystallographic data to explicitly define the binding-site geometry and the resulting alignment rules, was made, given that thermolysin is also a zinc-containing metallopeptidase with numerous similarities between ACE and thermolysin. Their results give strong support to both the Active Analog Approach (398) used to define the alignment rule for the ACE series and the CoMFA methodology itself. In the absence of an experimentally known active-site geometry, correlations were derived that explain as much as 84% of the variance in activities among a set of 68 diverse ACE inhibitors by use of CoMFA steric and electrostatic potentials plus a zinc indicator variable (Fig. 3.50). When the set of 68 ACE inhibitors is divided into three classes and correlations are derived for each class, CoMFA parameters alone explain 79-99% of the variance in activities. It was notable that statistically significant correlations were found, in spite of the fact that CoMFA does not explicitly consider hydrophobicity or solvation. In further support of the active-site paradigm, the cross-validated results of the ACE series were equivalent to those of the thermolysin series (cross-validated R² = 0.65 to 0.70), for which the alignment rule was defined by crystallographic data.

Figure 3.50. Plot of experimental versus predicted inhibition constants for 68 ACE inhibitors used in derivation of CoMFA model for the ACE active site (484). This plot shows the self-consistency of the model. Used with permission. [x-axis: Actual (pIC50).]

The predictions for molecules outside the training sets are a valid test of the predictive ability of the model, rather than just a confirmation of self-consistency of the derived model. In other words, statistical analysis alone does not answer the question of a chance correlation (485) for the training set. One must investigate lateral correlations such as predictability. The predictive correlations presented by DePriest et al. (483, 484) represent a total of 66 diverse inhibitors that were not chosen as analogs of compounds present in the training set, but by selecting published papers on three different chemical classes and testing all compounds in the papers [predictive r² = 0.46 for the set of 66 compounds predicted, which had not been included in the training set for the ACE model with a zinc indicator of 10 (Fig. 3.51)]. The "predictive" r² was based only on molecules not included in the training set and was defined as

predictive r² = (SD - "press")/SD

where SD is the sum of the squared deviations between the affinities of molecules in the test set and the mean affinity of the training set molecules, and "press" is the sum of the squared deviations between predicted and actual affinity values for every molecule in the test set. It should be obvious from the equation that prediction of the mean value of the training set for each member of the test set would yield a predictive r² = 0. Thirty-five of the 66 predicted molecules had residuals less than one log value, with a predictive r² value for the collective set of these 35 test molecules of 0.90. Of the 31 inhibitors with residuals greater than 1.0, 8 were carboxylates, 12 were phosphates, and 11 were thiols. Clearly, no single class of inhibitors dominated the distribution of residuals. Considering both the composition and the method of selection of the test data sets (range of activities over 7 log units), the fact that more than 50% of the molecules were predicted with correlations greater than r² = 0.90 lends strong support to the use of CoMFA as a tool for QSAR development.

Figure 3.51. Plot of experimental versus predicted inhibition constants for 35 ACE inhibitors not used in derivation of CoMFA model (484). This plot indicates the predictability of the model. Used with permission. [x-axis: Actual (pIC50); legend: Thiols, Carboxylates, Phosphates.]

Use of CoMFA as a predictive tool for receptors of known three-dimensional structure has also been explored. Klebe and Abraham
Klebe and Abraham (486) used two enzymes (thermolysin and renin) as well as antiviral activity against human rhinovirus, where the coat-protein receptor is known, to calibrate CoMFA methodology. They concluded that only enthalpies of binding and not binding affinities were predicted by CoMFA. Waller et al. (264) developed a predictive CoMFA model for the binding affinities of HIV-protease inhibitors based on crystal structures of complexes. Initial analysis of the 59 molecules in the training set representing five structurally diverse classes (hydroxyethylamine, statine, norstatine, ketoamide, and dihydroxyethylene) of transition-state protease inhibitors yielded a correlation with a cross-validated r² value of 0.786. To evaluate the predictive ability of this model, a test set of 18 additional inhibitors (487) was used that represented another class of transition-state isostere, hydroxyethylurea. The model expressed good predictive ability for the test set of hydroxyethylurea compounds (predictive r² = 0.624) with all compounds predicted within 1.06 log units (1.4 kcal/mol in binding affinity) of their actual activities, with an average absolute error of 0.58 log units (0.8 kcal/mol) from a range of 3.03 log units (Fig. 3.52). Predictions from this CoMFA model of HIV protease are being used to prioritize synthesis of de novo-designed HIV-protease inhibitors not included in development of the model.

Crippen developed a method (488) to objectively model the binding of small ligands to receptors, given the experimentally determined affinities of a set of ligands. The procedure, Vorom, used Voronoi polyhedra to generate the simplest geometrical model of the binding site. In a recent application to DHFR inhibitors (489), only eight analogs were used in the training set to derive the model and the affinities of 23 of the 39 test set molecules were correctly predicted, with an average relative error of 0.83 kcal/mol for the remaining compounds.

Figure 3.52. Plot of experimental versus predicted inhibition constants for 18 HIV-1 protease inhibitors not used in derivation of CoMFA model (264). Horizontal axis: actual. This plot indicates the predictability of the model.

5 CONCLUSIONS

Rapid advances in molecular and structural biology have provided ample therapeutic targets characterized in three dimensions. Tools to exploit this information are being rapidly developed and several strategies for de novo design of ligands, given an active site, are under investigation. It is already clear, however, that iterative approaches are necessary because of the lack of precision in predicting affinities for bound ligands. Molecular mechanics and computer graphics are essential components for design of novel ligands, and rapid progress in evolving a useful set of tools is apparent.

The ultimate goal in comparison of molecules with respect to their biological activity is insight into the receptor and its requirements for recognition and activation. Conjecture regarding the receptor is often a necessary part of rationalizing a set of structure-activity data. Although the problem of characterizing the active site of an unknown macromolecule indirectly is certainly challenging, the analysis of structure-activity data of a set of ligands, especially if their structural variety is wide, allows useful models of active sites to be developed. There are numerous caveats that must be acknowledged, however, such as flexibility of the receptor, multiple binding modes for ligands, and lack of uniqueness of most models because of limited experimental observations. Success in using these methods would appear to be increasing. This reflects technological advances as well as insight into the problem and algorithmic improvements in our analytical approaches.

The game of 20 questions with receptors has progressed with experience. Ambiguity in interpretation of results and multiple models clearly reflect the uncertainties inherent in this indirect approach. Nevertheless, the absence of direct experimental data in many biological systems of intense therapeutic interest makes this the only game available for many. It is hoped that the next decade will see further progress in our ability to extract three-dimensional information from structure-activity studies on unknown receptors.

This perspective has examined the approaches to molecular modeling and drug design and emphasized their limitations. The reader should be aware, however, that these tools are used daily on many problems of therapeutic interest with increasing success. This is clearly witnessed by publications of such studies in almost every issue of current major journals. For specific application areas, such as RNA (490, 491), DNA (492-496), membrane (497-507), or peptidomimetic modeling (382, 508-513), the reader is referred to the literature. The prediction of molecular properties, such as log P and correlation between substructures and metabolism, has led to a dramatic increase in efforts to correlate absorption, distribution (514), metabolism (515-517), and elimination (ADME) with chemical
structure (518-522). In addition, the advent of combinatorial chemistry has focused modeling efforts on prioritizing compounds (523-528) for high throughput screening based on chemical diversity (529-531), druglike properties (532, 533), predicted oral bioavailability (534, 535), and so forth.

6 ACKNOWLEDGMENTS

The work and influence of many talented collaborators as well as the National Institutes of Health for grant support are gratefully acknowledged. Although my former colleagues' names are prominent in the references cited, their contributions are numerous and individual citations are avoided because of probable omissions. The author apologizes to many contributors to the field whose efforts have not been adequately recognized in this overview, the result of a somewhat arbitrary citation of references. Space and time limitations preclude a more thorough discussion of many important aspects.

REFERENCES

1. M. Salzmann, K. Pervushin, G. Wider, H. Senn, and K. Wuthrich, J. Biomol. NMR, 14, 85-88 (1999).
2. P. J. Hajduk, R. P. Meadows, and S. W. Fesik, Q. Rev. Biophys., 32, 211-240 (1999).
3. P. J. Hajduk, G. Sheppard, D. G. Nettesheim, E. T. Olejniczak, S. B. Shuker, R. P. Meadows, D. H. Steinman, G. M. Carrera, P. A. Marcotte, J. Severin, K. Walter, H. Smith, E. Gubbins, R. Simmer, T. F. Holzman, D. W. Morgan, S. K. Davidsen, J. B. Summers, and S. W. Fesik, J. Am. Chem. Soc., 119, 5818-5827 (1997).
4. L. M. McDowell and J. Schaefer, Curr. Opin. Struct. Biol., 6, 624-629 (1996).
5. R. Ishima and D. A. Torchia, Nat. Struct. Biol., 7, 740-743 (2000).
6. L. M. McDowell, M. A. McCarrick, D. R. Studelska, W. J. Guilford, D. Arnaiz, J. L. Dallas, D. R. Light, M. Whitlow, and J. Schaefer, J. Med. Chem., 42, 3910-3918 (1999).
7. J. M. Moore, Biopolymers, 51, 221-243 (1999).
8. J. Fejzo, C. A. Lepre, J. W. Peng, G. W. Bemis, Ajay, M. A. Murcko, and J. M. Moore, Chem. Biol., 6, 755-769 (1999).
9. C. J. Dinsmore, M. J. Bogusky, J. C. Culberson, J. M. Bergman, C. F. Homnick, C. B. Zartman, S. D. Mosser, M. D. Schaber, R. G. Robinson, K. S. Koblan, H. E. Huber, S. L. Graham, G. D. Hartman, J. R. Huff, and T. M. Williams, J. Am. Chem. Soc., 123, 2107-2108 (2001).
10. J. Hajdu, R. Neutze, T. Sjogren, K. Edman, A. Szoke, R. Wilmouth, and C. M. Wilmot, Nat. Struct. Biol., 7, 1006-1012 (2000).
11. A. Perrakis, R. Morris, and V. S. Lamzin, Nat. Struct. Biol., 6, 458-463 (1999).
12. C. R. Beddell, Ed., The Design of Drugs to Macromolecular Targets, John Wiley & Sons, New York, 1992.
13. I. D. Kuntz, Science, 257, 1078-1082 (1992).
14. P. W. Finn and L. E. Kavraki, Algorithmica, 25, 347-371 (1999).
15. D. Joseph-McCarthy, Pharmacol. Ther., 84, 179-191 (1999).
16. P. G. Mezey, J. Mol. Model., 6, 150-157 (2000).
17. E. F. Meyer, S. M. Swanson, and J. A. Williams, Pharmacol. Ther., 85, 113-121 (2000).
18. V. Schnecke and L. A. Kuhn, Perspect. Drug Discov. Des., 20, 171-190 (2000).
19. U. Burkert and N. L. Allinger in M. C. Caserio, Ed., Molecular Mechanics, ACS Monograph 339, Vol. 177, American Chemical Society, Washington, DC, 1982.
20. J. P. Bowen and N. L. Allinger in K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, VCH, New York, 1991, pp. 81-98.
21. A. R. Leach, Molecular Modelling: Principles and Applications, 2nd ed., Prentice Hall, New York, 2001, 744 pp.
22. M. Clark, R. Cramer, and N. Van Opdenbosch, J. Comput. Chem., 10, 982-1012 (1989).
23. J. G. Vinter, A. Davis, and M. R. Saunders, J. Comput.-Aided Mol. Des., 1, 31-51 (1987).
24. D. N. J. White and M. J. Bovill, J. Chem. Soc. Perkin Trans. 2, 12, 1610-1623 (1977).
25. K. Gundertofte, J. Palm, I. Pettersson, and A. Stamvik, J. Comput. Chem., 12, 200-208 (1991).
26. W. L. Jorgensen and J. Gao, J. Am. Chem. Soc., 110, 4212-4216 (1988).
27. J.-H. Lii and N. L. Allinger, J. Comput. Chem., 12, 186-199 (1991).
28. J. Aqvist and A. Warshel, J. Am. Chem. Soc., 112, 2860 (1990).
29. R. D. Hancock, Acc. Chem. Res., 23, 253-257 (1990).
30. A. Vedani and D. W. Huhta, J. Am. Chem. Soc., 112, 4759-4767 (1990).
31. V. S. Allured, C. M. Kelly, and C. R. Landis, J. Am. Chem. Soc., 113, 1-12 (1991).
32. A. E. Carlsson, Phys. Rev. Lett., 81, 477-480 (1998).
33. A. E. Carlsson and S. Zapata, Biophys. J., 81, 1-10 (2001).
34. A. T. Hagler, E. Hugler, and S. Lifson, J. Am. Chem. Soc., 96, 5319 (1974).
35. S. C. Harvey, Proteins, 5, 78-92 (1989).
36. M. E. Davis and J. A. McCammon, Chem. Rev., 90, 509-521 (1990).
37. W. F. van Gunsteren and H. J. C. Berendsen, Angew. Chem. Int. Ed. Engl., 29, 992-1023 (1990).
38. C. E. Dykstra, Chem. Rev., 93, 2339-2353 (1993).
39. A. J. Stone and M. Alderton, Mol. Phys., 56, 1047-1064 (1985).
40. M. J. Dudek and J. W. Ponder, J. Comput. Chem., 16, 791-816 (1995).
41. C. I. Bayly, P. Cieplak, W. D. Cornell, and P. A. Kollman, J. Phys. Chem., 97, 10269-10280 (1993).
42. D. E. Williams in K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, VCH, New York, 1991, pp. 219-271.
43. R. J. Loncharich and B. R. Brooks, Proteins, 6, 32-45 (1989).
44. J. Guenot and P. A. Kollman, J. Comput. Chem., 14, 295-311 (1993).
45. K. Tasaki, S. McDonald, and J. W. Brady, J. Comput. Chem., 14, 278-284 (1993).
46. J. Shimada, H. Kaneko, and T. Takada, J. Comput. Chem., 14, 867-878 (1993).
47. H. Schreiber and O. Steinhauser, Chem. Phys., 168, 75-89 (1992).
48. H. Schreiber and O. Steinhauser, Biochemistry, 31, 5856-5860 (1992).
49. P. E. Smith and B. M. Pettit, J. Chem. Phys., 95, 8430-8441 (1991).
50. G. E. Marlow, J. S. Perkyns, and B. M. Pettit, Chem. Rev., 93, 2503-2521 (1993).
51. M. Whitlow and M. M. Teeter, J. Am. Chem. Soc., 108, 7163-7172 (1986).
52. M. K. Gilson, K. A. Sharp, and B. H. Honig, J. Comput. Chem., 9, 327-335 (1987).
53. A. Nicholls and B. Honig, J. Comput. Chem., 12, 435-445 (1991).
54. M. Schaefer and C. Froemmel, J. Mol. Biol., 216, 1045-1066 (1990).
55. W. G. Richards, P. M. King, and C. A. Reynolds, Protein Eng., 2, 319-327 (1987).
56. R. A. Pierotti, Chem. Rev., 76, 717-726 (1976).
57. G. L. Pollack, Science, 251, 1323-1330 (1991).
58. R. J. Zauhar and R. S. Morgan, J. Comput. Chem., 9, 171-187 (1988).
59. J. Tomasi, R. Bonaccorsi, R. Cammi, et al., Theochem. J. Mol. Struct., 80, 401-424 (1991).
60. D. A. Liotard, G. D. Hawkins, G. C. Lynch, C. J. Cramer, and D. G. Truhlar, J. Comput. Chem., 16, 422-440 (1995).
61. K. Sharp, J. Comput. Chem., 12, 454-468 (1991).
62. W. C. Still, A. Tempczyk, R. C. Hawley, and T. Hendrickson, J. Am. Chem. Soc., 112, 6127-6129 (1990).
63. C. A. Schiffer, J. W. Caldwell, P. A. Kollman, and R. M. Stroud, Mol. Simul., 10, 121-149 (1993).
64. P. F. W. Stouten, C. Frommel, H. Nakamura, and C. Sander, Mol. Simul., 10, 97-120 (1993).
65. R. J. Zauhar, J. Comput. Chem., 12, 575-583 (1991).
66. A. J. Stone, Mol. Phys., 56, 1065-1082 (1985).
67. S. Kuwajima and A. Warshel, J. Phys. Chem., 94, 460-466 (1990).
68. R. A. Sorensen, W. B. Liau, L. Kesner, and R. H. Boyd, Macromolecules, 21, 200-208 (1988).
69. J. Caldwell, L. X. Dang, and P. A. Kollman, J. Am. Chem. Soc., 112, 9144-9147 (1990).
70. C. J. Cramer and D. G. Truhlar, J. Am. Chem. Soc., 113, 8305-8311 (1991).
71. C. J. Cramer, J. Am. Chem. Soc., 113, 8552-8554 (1991).
72. C. J. Cramer and D. G. Truhlar, Science, 256, 213-217 (1992).
73. C. J. Cramer and D. G. Truhlar, J. Comput. Chem., 13, 1089-1097 (1992).
74. G. Rauhut, T. Clark, and T. Steinke, J. Am. Chem. Soc., 115, 9174-9181 (1993).
75. F. Franks in F. Franks, Ed., Water, A Comprehensive Treatise, Vol. 1, Plenum Press, New York, 1975.
76. F. H. Stillinger, Science, 209, 451-457 (1980).
77. L. R. Pratt, Ann. Rev. Phys. Chem., 36, 433-449 (1985).
78. J. P. M. Postma, H. J. C. Berendsen, and J. R. Haak, Faraday Symp. Chem. Soc., 17, 55-67 (1982).
79. B. G. Rao and U. C. Singh, J. Am. Chem. Soc., 111, 3125-3133 (1989).
80. I. Ohmine and H. Tanaka, Chem. Rev., 93, 104. C. M. W. Ho and G. R. Marshall, J. Cornput.-
2545-2566 (1993). Aided Mol. Des., 7,623-647 (1993).
81. W. L. Jorgensen, J. Gao, and C. Ravimohan, J. 105. A. W. R. Payne and R. C. Glen, J. Mol. Graph-
Phys. Chem., 89,34703473 (1985). ics, 11, 74-91 (1993).
82. N. Muller, Trends Biochem. Sci., 17,459-463 106. H. A. Scheraga in K. B. Lipkowitz and D. B.
(1992). Boyd, Eds., Revisions in Computational Chem-
83. L. X. Dang, J. E. Rice, J. Caldwell, and P. A. istry, VCH, New York, 1992, pp. 73-142.
Kollman, J. Am. Chem. Soc., 113, 2481-2486 107. T. Schlick in K. B. Lipkowitz and D. B. Boyd,
(1991). Eds., Revisions in Computational Chemistry,
84. A. K. Rappe and W. A. Goddard 111, J. Phys. VCH, New York, 1992, pp. 1-71.
Chem., 95,3358-3363 (1991). 108. D. D. Beusen, E. F. B. Shands, S. F. Karasek,
85. R. Czerminski and R. Elber, Int. J. Quantum G. R. Marshall, and R. A. Dammkoehler,
Chem. Quantum Chem. Symp., 24, 167-186 THEOCHEM, 370, 157-171 (1996).
(1990). 109. H. Iijima, J. B. Dunbar, Jr., and G. R. Marshall,
86. C. Choi and R. Elber, J. Chem. Phys., 94,751- Proteins, 2 , 3 3 0 3 3 9 (1987).
760 (1991). 110. I. Motoc, R. A. Dammkoehler, and G. R. Mar-
87. S. E. Huston and G. R. Marshall, Biopolymers, shall in N. Trinajstic, Ed., Mathematic and
34, 74-90 (1994). Computational Concepts in Chemistry, Ellis
88. R. V. Pappu, R. K. Hart, and J. W. Ponder, J. Honvood, Chichester, UK, 1986, pp. 222-251.
Phys. Chem. B, 102,9725-9742 (1998). 111. R. A. Dammkoehler, S. F. Karasek, E. F. B.
89. L. Piela, Collect. Czech. Chem. Commun., 63, Shands, and G. R. Marshall, J. Cornput.-Aided
1368-1380 (1998). Mol. Des., 3, 3-21 (1989).
90. R. K. Hart, R. V. Pappu, and J. W. Ponder, 112. N. Go and H. A. Scheraga, Macromolecules, 3,
J. Comput. Chem., 21,531-552 (2000). 178-187 (1970).
91. M. Saunders, K. N. Houk, Y.-D. Wu, W. C. Still, 113. A. R. Leach in K. B. Lipkowitz and D. B. Boyd,
M. Lipton, G. Chang, and W. C. Guida, J. Am. Eds., Revisions in Computational Chemistry,
Chem. Soc., 112,1419-1427 (1990). VCH, New York, 1991, pp. 1-55.
92. M. Saunders, J. Am. Chem. Soc., 109, 3150- 114. S. K. Burt and J. Greer, Ann. Rep. Med. Chem.,
3152 (1987). 23,285-294 (1988).
93. J. T. Ngo and M. Karplus, J. Am. Chem. Soc., 115. D. M. Ferguson and D. J. Raber, J. Am. Chem.
119,56575667 (1997). SOC., 111,4371-4378 (1989).
94. R. V. Pappu, G. R. Marshall, and J. W. Ponder, 116. M. Saunders, J. Comput. Chem., 10, 203-208
Nut. Struct. Biol., 6 , 5 0 6 5 (1999). (1989).
95. K. R. Mackenzie, J. H. Prestegard, and D. M. 117. M. Saunders, J. Comput. Chem., 12, 645-663
Engelman, Science, 276, 131-133 (1997). (1991).
96. J. H. Holland, Sci. Am., July,66-72 (1992). 118. M. Saunders and H. A. Jimenez-Vazquez,
97. S. Forrest, Science, 261,872-878 (1993).
J. Comput. Chem., 14,330-348 (1993).
98. P. Willett, Trends Biotechnol., 13, 516-521 119. M. Saunders and N. Krause, J. Am. Chem.
(1995). SOC.,112,1791-1795 (1990).
99. J. E. Devillers, Genetic Algorithms in Molecu- 120. A. V. Shah and D. P. Dolata, J. Cornput.-Aided
lar Modeling, Academic Press, New York, Mol. Des., 7, 103-124 (1993).
1996. 121. I. Kolossvary and W. C. Guida, J. Am. Chem.
100. D. B. McGarrah and R. S. Judson, J. Comput. SOC.,115,2107-2119 (1993).
Chem., 14,1385-1395 (1993). 122. H A . Boehm, G. Klebe, T. Lorenz, T. Mietzner,
101. R. S. Judson, Y. T. Tan, E. Mori, C. Melius, and L. Siggel, J. Comput. Chem., 11, 1021-
E. P. Jaeger, A. M. Treasurywala, and A. Ma- 1028 (1990).
thiowetz, J. Comput. Chem., 16, 1405-1419 123. A. E. Howard and P. A. Kollman, J. Med.
(1995). Chem., 31,1669-1675 (1988).
102. R. P. Meadows and P. J. Hajduk, J. Biomol. 124. M. Lipton and W. C. Still, J . Comput. Chem., 9,
NMR, 5,41-47 (1995). 343-355 (1988).
103. B. Waszkowycz, D. E. Clark, D. Frenkel, J. Li, 125. D. D. Beusen, R. D. Head, J . D. Clark, W. C.
C. W. Murray, B. Robson, and D. R. Westhead, Hutton, U. Slomczynska, J. Zabrocki, M. T.
J. Med. Chem., 37,3994-4002 (1994). Leplawy, and G. R. Marshall in C. H. Schnei-
der and A. N. Eberle, Eds., The Solution NMR Eds., Advances in Biomolecular Simulations,
Structures of Emerimicins III and N Deter- American Institute of Physics Conference Pro-
mined Using the New Program, MACROSE- ceedings No. 239, Obernai, France, 1991, pp.
ARCH, ESCOM Scientific, Leiden, Nether- 174-199.
lands, 1993, pp. 79-80. 147. M. L. Smythe, S. E. Huston, and G. R. Mar-
126. M. P. Allen and D. J. Tildesley, Computer Sim- shall, J. Am. Chem. Soc., 115, 11594-11595
ulation of Liquids, Oxford Science Publica- (1993).
tions, Oxford, UK, 1989, p. 385. 148. M. L. Smythe, S. E. Huston, and G. R. Mar-
127. N. Metropolis, A. W. Rosenbluth, M. N. Rosen- shall, J. Am. Chem. Soc., 117, 5445-5452
bluth, A. H. Teller, and E. Teller, J. Chem. (1995).
Phys., 21, 1087 (1953). 149. G. H. Loew and S. K. Burt in C. A. Ramsden,
128. J. A. McCammon and S. C. Harvey, Dynamics Ed., Quantitative Drug Design, Pergamon
of Protein and Nucleic Acids, Cambridge Uni- Press, Oxford, UK, 1990, pp. 105-123.
versity Press, Cambridge, UK, 1987, p. 234. 150. S. L. Price and N. G. J. Richards, J. Cornput.-
129. G. Zhang and T. Schlick, J. Comput. Chem., Aided Drug Des., 5,41-54 (1991).
14,1212-1233 (1993). 151. U. C. Singh and P. A. Kollman, J. Comput.
130. T. Schlick and W. K. Olson, Science, 257, Chem., 5, 129 (1984).
1110-1115 (1992).
152. B. H. Besler, K. M. Merz, Jr., and P. A. Koll-
131. D. S. Goodsell and A. J. Olson, Proteins, 8,195- man, J. Comput. Chem., 11,431-439 (1990).
202 (1990).
153. G. Rauhut and T. Clark, J. Comput. Chem., 14,
132. W. L. Jorgensen, Acc. Chem. Res., 22,184-189 503-509 (1993).
(1989).
154. J. G. Vinter and M. R. Saunders in D. J. Chad-
133. D. L. Beveridge and F. M. DiCapua in W. van
wick and K. Widdows, Eds., Host-Guest Molec-
Gunsteren and P. K. Weiner, Eds., Computer
ular Interactions: From Chemistry to Biology,
Simulation of Biomolecular Systems, ESCOM
John Wiley & Sons, Chichester, UK, 1991, pp.
Science, Leiden, Netherlands, 1989, pp. 1-26.
249-265.
134. P. Kollman, Chem. Rev., 93,2395-2417 (1993).
155. C. A. Hunter and J. K. M. Sanders, J. Am.
135. W. L. Jorgensen, J. Phys. Chem., 87, 5304- Chem. Soc., 112,5525-5534 (1990).
5314 (1983).
156. U. Dinur and A. T. Hagler in K. B. Lipkowitz
136. P. A. Bash, U. C. Singh, F. K. Brown, R. Lan-
and D. B. Boyd, Eds., Revisions in Computa-
gridge, and P. A. Kollman, Science, 235,574-
tional Chemistry, VCH, New York, 1991, fip.
576 (1987).
99-164.
137. P. A. Kollman and K. M. Merz, Acc. Chem.
Res., 23, 246-252 (1990). 157. J. Pranata, S. G. Wierschke, and W. I. Jor-
gensen, J. Am. Chem. Soc., 113, 2810-2819
138. T. P. Lybrand, J. A. McCammon, and G. Wipff, (1991).
Proc. Natl. Acad. Sci. USA, 83, 833-835
(1986). 158. J. Tirado-Rives and W. L. Jorgensen, J. Am.
Chem. Soc., 112,2773-2781 (1990).
139. J. Hermans, R. H. Yun, and A. G. Anderson,
J. Comput. Chem., 13,429-442 (1992). 159. A. Alex and T. Clark, J. Comput. Chem., 13,
140. J. Hermans, Curr. Opin. Struct. Biol., 3, 270- 704-717 (1992).
276 (1993). 160. J. Aqvist and A. Warshel, Chem. Rev., 93,
141. R. Elber and M. Karplus, J. Am. Chem. Soc., 2523-2544 (1993).
112,9161-9175 (1990). 161. M. J. Field, P. A. Bash, and M. Karplus,
142. D. J. Tobias, J. E. Mertz, and C. L. Brooks 111, J. Comput. Chem., 11,700-783 (1990).
Biochemistry, 30,6054-6058 (1991). 162. A. Warshel, Computer Modeling of Chemical
143. D. J. Tobias and C. L. Brooks 111,Biochemistry, Reactions in Enzymes and Solutions, John
30,6059-6070 (1991). Wiley & Sons, New York, 1991, p. 236.
144. D. J. Tobias, S. F. Sneddon, and C. L. Brooks 163. V. Daggett, S. Schroder, and P. Kollman,
111, J. Mol. Biol., 216, 783-796 (1990). J. Am. Chem. Soc., 113,8926-8935 (1991).
145. S. F. Sneddon, D. J. Tobias, and C. L. Brooks 164. P. R. Andrews and D. A. Winkler in G. Jolles
111, J. Mol. Biol., 209, 817-820 (1989). and K. R. H. Wooldridge, Eds., Drug Design:
146. D. J. Tobias, S. F. Sneddon, and C. L. Brooks Fact or Fantasy?, Academic Press, New York,
I11 in R. Lavery, J.-L. Rivail, and J. Smith, 1984, pp. 145-174.
165. J. E. Eksterowicz and K. N. Houk, Chem. Rev., 179. G.Otting, Cum. Opin. Struct. Biol., 3,760-768
93,2439-2461(1993). (1993).
166. K.Appelt, R. J. Bacquet, C. A. Bartlett, C. L. J. 180. S. 0. Smith, Curr. Opin. Struct. Biol., 3, 755-
Booth, S. T. Freer, M. A. M. Fuhry, M. R. Geh- 759(1993).
ring, S. H. Herrmann, E. F. Howland, C. A. 181. M. F. Perutz, G. Fermi, D. J:Abraham, C. Po-
Janson, T. R. Jones, C.-C. Kan, V. Kathard- yart, and E. Bursa-, J. Am. Chem. Soc., 108,
ekar, K. K. Lewis, G. P. Marzoni, D. A. 1064-1078 (1986).
Mathews, C. Mohr, E. W. Moomaw, C. A. 182. A.S. Mehanna and D. J. Abraham, Biochemis-
Morse, S. J. Oatley, R. C. Ogden, M. R. Reddy, try, 29,3944-3954(1990).
S. H. Reich, W. S. Schoettin, W. W. Smith,
M. D. Varney, J. E. Villafranca, R. W. Ward, S.
183. I. D. Kuntz, J. M. Blaney, S. J. Oatley, R. Lan-
gridge, and T. E. Ferrin, J. Mol. Biol., 161,269
Webber, S. E. Webber, K. M. Welsh, and J.
(1982).
White, J. Med. Chem., 34, 1925-1934 (1991).
184. R. Voorintholt, M. T. Kosters, G. Vegter, G.
167. P. J. Goodford, J. Med. Chem., 27, 557-564 Vriend, and W. G. J. Hol, J. Mol. Graphics, 7,
(1984). 243-245(1989).
168. C. R. Beddell, Chem. Soc. Rev., 13, 279-319 185. C. M. W. Ho and G. R. Marshall, J. Cornput.-
(1984). Aided Mol. Des., 4,337454(1990).
169. R. Wootton in C. R. Beddell, Ed., The Design of 186. P. J. Goodford, J. Am. Chem. Soc., 28, 849-
Drugs to Macromolecular Targets, John Wiley 856(1985).
& Sons, New York, 1992,pp. 49-83.
187. R. D. Cramer 111, D. E. Patterson, and J. D.
170. L.F. Kuyper, B. Roth, D. P. Baccanari, R. Fer- Bunce, J. Am. Chem. Soc., 110, 5959-5967
one, C. R. Beddell, J. N. Champness, D. K. (1988).
Stammers, J. G. Dann, F. E. Norrington, D. J.
188. R. D. Cramer I11 and M. Milne, The Lattice
Baker, and P. J. Goodford, J. Med. Chem., 28,
Model: A General Paradigm for Shape-Related
303-311 (1985).
Structure/Activity Correlation, in Proceedings
171. K. Appelt, J. Cornput.-Aided Mol. Des., 1, of the 19th National Meeting of the American
23-48(1993). Chemical Society, American Chemical Society,
172. M.von Itzstein, W.-Y. Wu, G. B. Kok, M. S. Washington, DC, 1979.
Pegg, J. C. Dyason, B. Jin, T. V. Phan, M. L. 189. A. Miranker and M. Karplus, Proteins, 11,
Smythe, H. E. White, S. W. Oliver, P. M. Col- 29-34(1991).
man, J. N. Varghese, D. M. Ryan, J. M. Woods,
R. C. Bethell, V. J. Hotham, J. M. Cameron,
190. A. CafIisch, A. Miranker, and M. Karplus, .
J. Med. Chem., 36,2142-2167 (1993).
and C. R. Penn, Nature, 363,418-423 (1993).
191. P. K.Weiner, C. Landridge, J. M. Blaney, R.
173. J. W. Liebeschuetz, S. D. Jones, P. J. Morgan, Schaefer, and P. A. Kollman, Proc. Natl. Acad.
C. W. Murray, A. D. Rimmer, J. M. Roscoe, B. Sci. USA, 79,3754-3758(1982).
Waszkowycz, P. M. Welsh, W. A. Wylie, S. C. 192. S. J. Weiner, P. A. Kollman, D. A. Case, U.C.
Young, H. Martin, J. Mahler, L. Brady, and K. Singh, C. Ghio, G. Alagona, J. S. Profeta, and
Wilkinson, J. Med. Chem., 45, 1221-1232 P. Weiner, J. Am. Chem. Soc., 106, 765-784
(2002). (1984).
174. K.E. Lind, Z. Du, K. Fujinaga, B. M. Peterlin, 193. S. J. Weiner, P. A. Kollman, D. T. Nguyen, and
and T. L. James, Chem. Biol., 9, 185-193 D. A. Case, J. Comput. Chem., 7, 230-252
(2002). (1986).
175. M. Miller, M. Jaskolski, J. K. M. Rao, J. Leis, 194. F. H.Allen, J. E. Davies, J. J. Galloy, 0.John-
and A. Wlodawer, Nature, 337, 576-579 son, 0. Kennard, C. F. Macrea, E. M. Mitchell,
(1989). G. F. Mitchell, J. M. Smith, and D. G. Watson,
176. M. Miller, B.K. Sathyanarayana, A. Wlodawer, J. Chem. Znf. Comput. Sci., 31,187-204(1991).
M. V. Toth, G. R. Marshall, L. Clawson, L. 195. E. E. Abola, F. C. Bernstein, and T. F. Koetzle
Selk, J. Schneider, and S. B. H. Kent, Science, in P. S. Glaeser, Ed., The Role of Data in Sci-
246,1149-1152(1989). entific Progress, Elsevier, New York, 1985.
177. R. L. Stanfield, M. Takimoto-Kamimura, J. M. 196. P. R. Andrews, E. J. Lloyd, J. L. Martin, and
Rini, A. T. Profy, and I. A. Wilson, Structure, 1, S. L. A. Munro, J. Mol. Graphics, 4, 41-45
83-93(1993). (1986).
178. S. W. Fesik, J. Med. Chem., 34, 2938-2945 197. R. S. Pearlman, Chem. Des. Auto. News, 2,1
(1991). (1987).
198. R. S. Pearlman, CONCORD User's Manual, 218. R. L. Dedarlais, R. P. Sheridan, G. L. Seibel,

Tripos Associates, St. Louis, MO, 1992. J. S. Dixon, I. D. Kuntz, and R. Venkataragha-
199. R. S. Pearlman, Chem. Des. Auto. News, 8, van, J. Med. Chem., 31,722-729 (1988).
3-15 (1993). 219. R. L. Dedarlais, G. L. Seibel, I. D.Kuntz, P. S.
200. R. P. Sheridan, A. Rusinko 111, R. Nilakantan, F'urth, J. C. Alvarez, P. R. Ortiz de Montellano,
and R. Venkataraghavan, Proc. Natl. Acad. D. L. Decamp, L. M. Babe, and C. S. Craik,
Sci. USA, 86,8165-8169 (1989). Proc. Natl. Acad. Sci. USA, 87, 6644-6648
(1990).
201. P. Gund, W. T. Wipke, and R. Langridge, Com-
put. Chem. Res. Ed. Technol., 3,5-21(1974). 220. R. L. DesJarlais, R. P. Sheridan, J. S. Dixon,
I. D. Kuntz, and R. Venkataraghavan, J. Med.
202. P. Gund,Prog. Mol. Subcell. Biol., 11,117-143
Chem., 29,2149-2153 (1986).
(1977).
221. H . J . Bohm, J. Cornput.-Aided Mol. Des., 6,
203. A. M. Lesk, Commun. ACM, 22, 221-224
61-78 (1992).
(1979).
222. H . J . Bohm, J. Cornput.-Aided Mol. Des., 6,
204. S. E. Jakes and P. Willett, J. Mol. Graphics, 4,
593-606 (1992).
12-20 (1986).
223. P. L. Chau and P. M. Dean, J. Comput-Aided.
205. S. E. Jakes, N. Watts, P. Willett, D. Bawden,
Mol. Des., 6,385-396 (1992).
and J. D. Fisher, J. Mol. Graphics, 5, 41-48
(1997). 224. P. L. Chau and P. M. Dean, J. Comput-Aided.
Mol. Des., 6,397-406 (1992).
206. P. A. Bartlett, G. T. Shea, S. J. Telfer, and S.
Waterman in S. M. Roberts, Ed., Molecular 225. P. L. Chau and P. M. Dean, J. Comput-Aided.
Recognition: Chemical and Biological Prob- Mol. Des., 6,407-426 (1992).
lems, Royal Society of Chemistry, London, 226. C. M. W. Ho and G. R. Marshall, J. Cornput.-
1989, pp. 182-196. Aided Mol. Des., 7,3-22 (1993).
207. J. H. Van Drie, D. Weininger, and Y. C. Martin, 227. G. Klebe, J. Mol. Med., 78,269-281 (2000).
J.Cornput.-Aided Mol. Des., 3,225-251 (1989). 228. H. J. Bohm and M. Stahl, Med. Chem. Res., 9,
208. R. P. Sheridan, R. Nilakantan, A. I. Rusinko, 445-462 (1999).
N. Bauman, K. S. Haraki, and R. Venkat- 229. H. J. Bohm and M. Stahl, Curr. Opin. Chem.
araghavan, J. Chem. Inf. Comput. Sci., 29, Biol., 4, 283-286 (2000).
255-260 (1989). 230. R. A. Lewis and P. M. Dean, Proc. R. Soc. Lond.
209. Molecular Design, MACCS-3D, Molecular De- B, 236,141-162 (1989).
sign Ltd., San Leandro, CA, 1993. 231. R. A. Lewis and P. M. Dean, Proc. R. Soc. Lond.'
210. Chemical Design, CHEM-X, Chemical Design B, 236,125-140 (1989).
Ltd., Oxford OX2 OJB, UK, 1993. 232. Y. Nishibata and A. Itai, Tetrahedron, 47,
211. UNITY User's Manual, Tripos, Inc., St. Louis, 8985-8990 (1991).
MO, 2002. 233. Y. Nishibata and A. Itai, J. Med. Chem., 36,
212. Y. C. Martin, M. G. Bures, and P. Willett in K. 2921-2928 (1993).
Lipkowitz and D. Boyd, Eds., Revisions in 234. D. A. Pearlman and M. A. Murko, J. Comput.
Computational Chemistry, VCH, New York, Chem., 14,1184-1193 (1993).
1990, pp. 213-263. 235. M. G. Bures, C. Black-Schaefer, and G. Gard-
213. Y. C. Martin, J. Med. Chem., 35, 2145-2154 ner, J. Cornput.-Aided Mol. Des., 5, 323-334
(1992). (1991).
214. A. I. Rusinko, R. P. Sheridan, R. Nilakantan, 236. J. M. Blaney and J. S. Dixon, Perspect. Drug.
K. S. Haraki, N. Bauman, and R. Venkat- Discov. Des., 1, 301419 (1993).
araghavan, J. Chem. Inf. Comput. Sci., 29, 237. D. L. Bodian, R. B. Yamasaki, R. L. Buswell,
251-255 (1989). J. F. Stearns, J. M. White, and I. D. Kuntz,
215. A. C. Good, S. J. Peterson, and W. G. Richards, Biochemistry, 32,2967-2978 (1993).
J. Med. Chem., 36,2929-2937 (1993). 238. C. S. Ring, E. Sun, J. H. McKerrow, G. K. Lee,
216. R. S. Pearlman in H. Kubinyi, Ed., 3 0 QSAR P. J. Rosenthal, I. D. Kuntz, and F.E. Cohen,
in Drug Design: Theory, Methods and Applica- Proc. Natl. Acad. Sci. USA, 90, 3583-3587
tions, ESCOM Scientific, Leiden, Netherlands, (1993).
1993, pp. 41-79. 239. B. K. Shoichet, R. M. Stroud, D. V. Santi, I. D.
217. J. Sadowski and J. Gasteiger, Chem. Rev., 93, Kuntz, and K. M. Perry, Science, 259, 1445-
2567-2581 (1993). 1450 (1993).
I. Halperin, B. Ma, H. Wolfson,and R. Nussi- R. D. Head, M. L. Smythe, T . I. Oprea, C. L.

nov, Proteins, 47,409-443(2002). Waller, S. M. Green, and G. R. Marshall, J. Am.
E. Katchalski-Katzir, I. Shariv, M. Eisenstein, Chem. Soc., 118,39594969(1996).
A. A. Friesem, C. Afialo, and I. A.Vakser, Proc. C. L. Waller, T . I. Oprea, A. Giolitti, and G. R.
Natl. Acad. Sci. USA, 89,2195-2199(1992). Marshall, J. Med. Chem., 36, 4152-4160
I. A. Vakser and C. Malo, Proteins, 20, 320- (1993).
329(1994). N. Prattibiraman, M. Levitt, T . E. Ferrin, and
R. Langridge, J. Comput. Chem., 6,432(1985).
I. A. Vakser, Protein Eng., 9,37-41 (1996).
E. C. Meng, B. K. Shoichet, and I. D. Kuntz,
I. A.Vakser, Biopolymers, 39,455-464(1996). J. Comput. Chem., 13,505-524 (1992).
H . A. Gabb, R. M. Jackson, and M. J . E. Stern- S. Makino and I. D. Kuntz, J. Comput. Chem.,
berg, J. Mol. Biol., 272,106-120(1997). 18,1812-1825(1997).
CHAPTER FOUR
Drug-Target Binding Forces: Advances in Force Field Approaches
PETER A. KOLLMAN
University of California
School of Pharmacy
Department of Pharmaceutical Chemistry
San Francisco, California
DAVID A. CASE
The Scripps Research Institute
Department of Molecular Biology
La Jolla, California
Contents
1 Introduction
2 Energy Components for Intermolecular Noncovalent Interactions
2.1 Electrostatic Energy
2.2 Exchange Repulsion Energy
2.3 Polarization Energy
2.4 Charge Transfer Energy
2.5 Dispersion Attraction
2.6 Summary
3 Molecular Mechanics Force Fields
3.1 Biochemical Force Fields
3.2 Force Field Models for Simple Liquids
3.3 Nonadditive and More Complex Models
3.4 Long Range Electrostatic Effects
4 Thermodynamics of Association
4.1 Gas Phase Association
4.2 Solvation Effects
4.3 An Illustrative Example: Protonation of Amines
5 Calculating Free Energies
6 Examples of Drug-Receptor Interactions
6.1 Biotin-Avidin
6.2 Dihydrofolate Reductase-Trimethoprim
6.3 Nucleotide Intercalator
7 Summary

Burger's Medicinal Chemistry and Drug Discovery, Sixth Edition, Volume 1: Drug Discovery
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
1 INTRODUCTION

This chapter describes the forces that hold together complexes between large and small molecules, particularly where the large molecule is a protein or nucleic acid and the small molecule is an inhibitor or substrate. Forces between atoms are conventionally divided into the two categories of covalent and noncovalent "bonds." A covalent bond is an attractive interaction between two atoms in which each contributes a valence electron. For example, such a bond is formed between two hydrogen atoms to make the H2 molecule: H + H → H-H. It also includes what most chemists might consider "ionic" bonds such as Na + Cl → Na-Cl, even though the valence electron pair in this case is much closer to the chlorine atom than to the sodium atom. The conventional study of chemical reactions is devoted to describing the strengths of covalent bonds and to understanding the ways in which they are formed and broken (1).

Drug-receptor interactions, on the other hand, are generally influenced most by weaker, noncovalent "bonds," where electron pairs are "conserved" in reactants and products. Examples of such interactions are "dative bonds," e.g., H3N: + BH3 → H3N:BH3, and hydrogen bonds, e.g., H2O + H2O → H2O···HOH. It is these noncovalent bonds that provide the "force" to make drugs interact strongly with their targets.

Some sample potential energy curves for covalent and noncovalent interactions between two atoms are given in Fig. 4.1. The left side shows an interaction curve for the two oxygen atoms in the O2 molecule. This has a large dissociation energy $D_0$ (about 117 kcal/mol in this case), so that at room temperature, where $RT$ approximates 0.6 kcal/mol ($R$ is the universal gas constant and $T$ is the absolute temperature), the fraction of "broken" bonds at equilibrium, $e^{-D_0/RT}$, is very small. By contrast, noncovalent bonds are much weaker, typically 1-10 kcal/mol, and thus much easier to break. The right side of Fig. 4.1 shows interaction curves for the two sodium atoms in the Na2 dimer; this interaction is somewhat stronger than a typical hydrogen bond but has about the same shape. Also shown is the purely nonbonded interaction between two oxygen atoms in different water molecules. Here the $D_0$ value is so small (about 0.15 kcal/mol) that it really cannot be seen on the scale of this figure. Hence, a significant fraction of nonbonded interactions can be broken at equilibrium at room temperature. It is this weakness of noncovalent bonds that makes them so useful in biological processes, because a small change in the chemical environment (such as temperature, concentrations, or ionic strength) can form or break such a bond. Probably the best known important noncovalent bonds are those between the strands of DNA, where hydrogen bonds hold the double helix together. When the cells begin to replicate, chemical signals (e.g., proteins binding to the DNA) shift the equilibrium to the single-stranded DNA, breaking these hydrogen bonds. Other important examples of noncovalent complexes include those between enzyme and substrate, "receptor" protein and hormone, antibody and antigen, and intercalator and DNA.

[Figure 4.1: two panels plotting potential energy against atom-atom distance (Å); the left panel shows O2.]
Figure 4.1. Potential energy curves for atom-atom interactions in O2, Na2, and the O···O interaction in a water dimer. Note the different energy scales on the left and right.

Much of our concern in this chapter is with the interaction:

$$\text{drug} + \text{receptor} \underset{k_r}{\overset{k_f}{\rightleftharpoons}} \text{complex}$$

The rate constant for association of the complex is $k_f$; the rate constant for dissociation of the complex is $k_r$; and the affinity, or association constant, is $K_a = k_f/k_r$. It is usually assumed that the biological activity of a drug is related to its affinity $K_a$ for the receptor, although there are processes such as actinomycin D-DNA interactions in which the rate of dissociation $k_r$ is more relevant to the biological activity (2, 3).

The thermodynamic parameters of interest for the reactions above are the standard free energy ($\Delta G^\circ$), enthalpy ($\Delta H^\circ$), and entropy ($\Delta S^\circ$) of association.
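As a numerical illustration of the bond-strength comparison made earlier in this Introduction, the following Python sketch evaluates the estimate $e^{-D_0/RT}$ for the dissociation energies quoted in the text, and the association constant $K_a = k_f/k_r$. It is a minimal sketch under stated assumptions: the function names, the value of $R$ in kcal/(mol K), the choice of 298.15 K as "room temperature," and the rate constants in the usage lines are illustrative and are not taken from the chapter.

```python
import math

# Gas constant in kcal/(mol K); assumed value for this sketch.
R_KCAL = 1.987e-3


def broken_fraction(d0_kcal_per_mol, temperature_k=298.15):
    """Boltzmann factor exp(-D0/RT), used in the text as a rough
    estimate of the fraction of 'broken' bonds at equilibrium."""
    return math.exp(-d0_kcal_per_mol / (R_KCAL * temperature_k))


def association_constant(k_f, k_r):
    """Affinity K_a = k_f / k_r from the association (k_f) and
    dissociation (k_r) rate constants."""
    return k_f / k_r


# Dissociation energies quoted in the text (kcal/mol).
examples = {
    "O2 covalent bond": 117.0,
    "noncovalent bond, upper end": 10.0,
    "noncovalent bond, lower end": 1.0,
    "water O...O nonbonded contact": 0.15,
}

print(f"RT at 298.15 K = {R_KCAL * 298.15:.3f} kcal/mol")
for label, d0 in examples.items():
    print(f"{label}: exp(-D0/RT) = {broken_fraction(d0):.3g}")

# Hypothetical rate constants, chosen only to show the arithmetic:
# k_f in 1/(M s), k_r in 1/s, so K_a is in 1/M.
print(f"K_a = {association_constant(1.0e7, 1.0e-2):.3g} per molar")
```

With these assumptions the O2 bond gives a fraction of order 10^-86, whereas the 0.15 kcal/mol water O···O contact gives roughly 0.78, in line with the statement that a significant fraction of nonbonded interactions are broken at room temperature.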
These three quantities are related by the equations

$$\Delta G^\circ = -RT \ln K_a$$

$$\Delta G^\circ = \Delta H^\circ - T\,\Delta S^\circ$$

Thus, a measurement of $K_a$ allows one to calculate $\Delta G^\circ$, the free energy of association of the complex. To find $\Delta H^\circ$ and $\Delta S^\circ$ separately requires a determination of $K_a$ as a function of temperature (if $\Delta H^\circ$ and $\Delta S^\circ$ are relatively temperature independent, a plot of $\ln K_a$ vs. $1/T$ can yield $\Delta H^\circ$ and $\Delta S^\circ$) or a calorimetric measurement of $\Delta H^\circ$ directly. Because $\Delta H^\circ$ and $\Delta S^\circ$ themselves are often quite temperature dependent, the latter experiment is more [...]

This chapter provides some background on the forces that hold molecules together, with emphasis on the noncovalent interactions of interest in biology, and to relate experimental determinations of the thermodynamics of association to the forces [...] the association. The discussion in the remainder of the chapter is divided into two parts. First, we discuss the forces that [...] in the gas phase and [...] describe how these forces can be mathematically modeled by fairly simple functions. Second, we discuss biological examples of noncovalent interactions and analyze the [...] forces in particular cases.

2 ENERGY COMPONENTS FOR INTERMOLECULAR NONCOVALENT INTERACTIONS

Quantum mechanical calculations on small molecule association suggest that there are five major contributions to the energy of intermolecular interactions in the gas phase (3, 4). The sum of these is the dissociation energy of the intermolecular complex represented in Fig. 4.1. Table 4.1 contains some examples of magnitudes of the different energy components for different interactions. This section provides a qualitative introduction to these forces. Section 3 gives an overview of mathematical models suitable for computer calculations.

2.1 Electrostatic Energy

Given information on the charge distribution of two molecules A and B, we can evaluate the electrostatic interaction energy between them. Although nuclei can be treated as point positive charges, the negative charge of electrons is smeared out over space. Thus, a rigorous evaluation of the electrostatic energy involves an integration over the electron clouds of the two molecules. In most practical calculations, however, the electrons as well as the nuclei are represented by point charges, whose position and magnitude are usually chosen to reproduce known molecular properties. The strength and the directionality of A···B electrostatic interactions are usually
dominated by the first nonvanishing multipole moment $M_n$ of the charge distribution,

$$M_n = \sum_{i=1}^{\text{no. charges}} q_i \mathbf{r}_i^{\,n}$$

where $q_i$ are the individual charges and $\mathbf{r}_i$ is the vector from the origin of the coordinate system to the ith charge (5, 6). Molecules that are charged have a nonzero zeroth moment $M_0$. Ionic crystals such as Na⁺Cl⁻ are held together predominantly by electrostatic attraction between oppositely charged ions. Crystals of ice I are mainly held together by dipolar electrostatic forces where $M_0 = 0$ and $M_1 \neq 0$, because there are virtually no ions in these crystals. It should be noted here that "hydrogen bonding" is not a separate energy component; typically hydrogen bonds contain important energy contributions from all five energy components, although the electrostatic component is usually the largest contributor to this interaction (7).

Table 4.1 Some Examples of Interaction Energies of Noncovalent Complexes (kcal/mol)

Interaction        -ΔE    ΔE_es    ΔE_dis    ΔE_ex    ΔE_pol    ΔE_ct
He···He
Xe···Xe
C₆H₆···C₆H₆
H₂O···H₂O
TCNE···OH₂
Li⁺···OH₂
F⁻···OH₂
NH₄⁺···F⁻

-ΔE, calculated (or experimental) total interaction energy, equal to $D_0$ in Fig. 4.1, kcal/mol; ΔE_es, electrostatic energy; ΔE_dis, dispersion energy; ΔE_ex, exchange repulsion energy; ΔE_pol, polarization energy; ΔE_ct, charge transfer energy (values in parentheses are estimated; TCNE, tetracyanoethylene).
a See Karplus and Porter (12).
b See Janda et al. (13).
c See Umeyama and Morokuma (7); this value for ΔE is certainly too large; see better values in Table 4.3.
d See Morokuma et al. (13).
e See Kollman (14).

Of the intermolecular energy components, the electrostatic is the longest range (i.e., it dies off most slowly with distance as the two molecules separate). Ion-ion interactions die off as $1/R$; ion-dipole as $1/R^2$; dipole-dipole as $1/R^3$, etc. In general, if two molecules have as their first nonvanishing multipole moments $M_n$ and $M_m$, the electrostatic interaction energy between them dies off as $1/R^{n+m+1}$. The electrostatic interaction energy between water, a dipolar molecule ($n = 1$), and benzene, whose first nonvanishing moment is a quadrupole ($m = 2$), dies off as $1/R^4$.

2.2 Exchange Repulsion Energy

The Pauli principle keeps electrons with the same spin spatially apart. This principle applies whether one is dealing with electrons on the same molecule or on different molecules and is the predominant repulsive force (6) that keeps electrons of different molecules from interpenetrating when noncovalent complexes are formed. This repulsive term is often represented by an analytical function of the form

$$E_{ex} = \frac{A}{R^{12}}$$

where $R$ is the distance between molecules or nonbonded atoms and $A$ is a constant that depends on the atom types. However, the best available quantum mechanical calculations suggest that this repulsion should diminish with an exponential dependence on the distance between the atoms (6). This difference is only important for very precise calculations: the key point is that the repulsive energy rises very quickly once the electrons from two different atoms overlap significantly. Roughly speaking, this happens when the distance between two atoms is less than the sum of their van der Waals radii. Table 4.2 gives some typical radii for atoms commonly found in organic molecules.

Table 4.2 Selected Atomic van der Waals Radii (in Å)

Element        r_VDW
Hydrogen
Carbon
Nitrogen
Oxygen
Fluorine
Phosphorus
Sulfur
Chlorine
Bromine

Values from A. Bondi, J. Phys. Chem. 68, 441 (1964).

2.3 Polarization Energy

When two molecules approach each other, there is charge redistribution within each molecule, leading to an additional attraction between the molecules. The energy associated with this charge redistribution is invariably attractive and is called the polarization energy. For example, if a molecule with polarizability $\alpha$ is placed in an electric field, $E$, the polarization energy is

$$E_{pol} = -\tfrac{1}{2}\,\alpha E^2$$

If the electric field is caused by an ion, then $E = q\,\hat{\imath}/R^2$, where $q$ is the ionic charge, $\hat{\imath}$ is the unit vector along the ion-molecule direction, and $R$ is the ion-molecule distance, which gives $E_{pol} = -\tfrac{1}{2}\,\alpha q^2/R^4$ for this ion-induced dipole interaction. The corresponding formula for dipole-induced dipole interaction between two molecules is

$$E_{pol} = -\frac{\alpha_1 \mu_2^2 + \alpha_2 \mu_1^2}{R^6}$$

where the $\mu$'s are the dipole moments of the molecules, the $\alpha$'s are their polarizabilities, and $R$ is the distance between molecules. The polarizability of a molecule can be broken into atomic contributions [atomic polarizabilities are additive to a good approximation (8)]; it is roughly proportional to the number of valence electrons and also depends on how tightly these valence electrons are bound to the nuclei. Umeyama and Morokuma (9) have calculated the ion-induced dipole contribution to the proton affinities of the simple alkyl amines. They attributed the order of gas phase proton affinities in the alkyl amines [NH₃ < CH₃NH₂ < (CH₃)₂NH < (CH₃)₃N] to the greater polarizability of a methyl group than a hydrogen. A simple estimate using the above empirical equation for an ion-induced dipole interaction with $q = +1$, a difference in polarizabilities of a methyl and a hydrogen ($\Delta\alpha$) of 4 cm3, a proton-methyl distance of 2.0 Å, and a proton-proton distance of 1.6 Å leads to an expected increase of about 20 kcal/mol of proton affinity for every methyl group added to NH₃. This very qualitative estimate is of the right magnitude but about two to three times too large (see below).

2.4 Charge Transfer Energy

When two molecules interact, there is often a small amount of electron flow from one to the other. For example, in the equilibrium geometry of the linear water dimer HO-H···OH₂, the water molecule that is the proton acceptor has transferred about 0.05e⁻ to the proton donor water (9, 10). The attractive energy associated with this charge transfer is the charge transfer energy and can be thought of as a mixing of an ionic resonance structure HO(-)···H-OH₂(+) into the overall wave function. Although the charge transfer energy is an important contributor to the interaction energy of most noncovalent complexes, the presence of a "charge transfer" electronic transition in the visible spectrum does not mean that the charge transfer energy is the predominant force holding the complex together in its ground state. For example, the complex between benzene and I₂, earlier thought to be a prototype "charge transfer" complex, seems to be held together predominantly by electrostatic, polarization, and dispersion energies in its ground electronic state (11).
2.5 Dispersion Attraction

There are attractive forces existing between all pairs of atoms, even between rare gas atoms (He, Ar, Ne, Kr, Xe), which cause them to condense at a sufficiently low temperature. None of the other attractive forces (electrostatic, polarization, charge transfer) can explain the attraction between rare gas atoms; it is called the dispersion attraction (12). Even though the rare gas atoms have no permanent dipole moments, they are polarizable, and one has instantaneous dipole-dipole attractions in which the presence of a locally asymmetric charge distribution on one molecule induces an asymmetric charge distribution on the other molecule, e.g., δ-Heδ+ ··· δ-Heδ+. The net attraction is called dispersion attraction (often known as London or van der Waals attraction) and is dependent on the polarizability and the number of valence electrons of the interacting molecules. It dies off as $1/R^6$, where $R$ is the atom-atom separation. The difference between this attraction and the polarization energy is that the latter involves the interaction of a molecule that is already polar with another polar or nonpolar molecule.

2.6 Summary

Having described the components of the interaction energies, let us consider a number of specific examples in detail (Table 4.1). Unlike the total interaction energy, which can be measured experimentally, the individual energy components cannot. The theoretical estimate of these quantities is often dependent on the method of calculation, but their qualitative features are usually independent of methodology.

Rare gas-rare gas interactions (He···He and Xe···Xe) have only dispersion attraction. The difference between the potential well depth of He···He and Xe···Xe (Fig. 4.1; $D_0$) at the equilibrium distance is caused by the greater polarizability of the xenon atoms, and thus by the greater dispersion attraction between them. A simple manifestation of this is the much higher boiling point of xenon than helium, caused by the greater attractive forces in liquid xenon. Although these energies are individually fairly small, they can add up in a molecular environment to significant energies; for example, the single largest attractive free energy contribution to binding in the strongest known small molecule-macromolecule interaction (biotin-avidin) is the dispersion attraction (13).

One might intuitively expect that benzene dimer would pack together like two flat plates, but this is not the case in the gas phase (14); the crystal structure also does not have parallel alignments of benzene molecules (15). Benzene, although having no dipole moment, does have a quadrupole moment ($M_2 \neq 0$). A simple way to think about this quadrupole moment is to realize that a benzene C-H is somewhat electropositive and its electron cloud is rather electronegative. A second benzene molecule would like to approach the first one so that its "electropositive" side approaches the other molecule's "electronegative" side. Hence the main component of binding is expected to be electrostatic in nature. The water dimer (H₂O)₂ and the ether···TCNE interactions are examples of prototypal H bonds and "charge transfer" complexes, but both are also held together mainly by electrostatic forces, although the other attractive energy components contribute significantly to the total ΔE. The electrostatic component is predominant in determining all the structural parameters except the distance between molecules. Similarly, the geometry and net attraction between Li⁺ and OH₂, F⁻ and H₂O, and NH₄⁺ and F⁻ are dominated by the electrostatic energy component.

3 MOLECULAR MECHANICS FORCE FIELDS

We move now from qualitative considerations to a more quantitative approach. It has become clear that a simple molecular mechanical energy expression can represent noncovalent interactions surprisingly well (16). Such energy expressions contain only the first three terms mentioned above: electrostatic, exchange repulsion, and dispersion. By a suitable choice of parameters, charge transfer and polarization effects are implicitly included in such an expression, which is simple and easy to evaluate, along with its derivatives, for
molecules with thousands of atoms. Over the past quarter century, many interesting applications of such molecular mechanical methods to complex molecules have been carried out (17).

The ideas that are outlined in a qualitative way above can also be cast into a useful mathematical form for computer calculation. The basic idea is to write down a (fairly simple and approximate) function that gives the energy of the system as a function of the positions (or coordinates) of its atoms. Because the negative derivative (or gradient) of this function yields the forces for Newton's equations, such a function is often called a "force field"; and because molecules are viewed as being made up of balls and springs (so that quantum effects are ignored), the term "molecular mechanics" is used to represent a concrete, mechanical picture of molecular motions and energies.

3.1 Biochemical Force Fields

Equation 4.1 represents about the simplest functional form of a force field that preserves the essential nature of molecules in condensed phases.

$$
\begin{aligned}
V(R) ={}& \sum_{\text{bonds}} K_r (r - r_{eq})^2 && \text{bond} \\
&+ \sum_{\text{angles}} K_\theta (\theta - \theta_{eq})^2 && \text{angle} \\
&+ \sum_{\text{dihedrals}} \frac{V_n}{2}\left[1 + \cos(n\phi - \gamma)\right] && \text{dihedral} \\
&+ \sum_{i<j}^{\text{atoms}} \left(\frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}}\right) && \text{van der Waals} \\
&+ \sum_{i<j}^{\text{atoms}} \frac{q_i q_j}{\varepsilon R_{ij}} && \text{electrostatic}
\end{aligned}
\tag{4.1}
$$

The earliest force fields, which attempted to describe the structure and strain of small organic molecules, focused considerable attention on more elaborate functions of the first terms, as well as cross terms (18), representing a "top down" philosophy.

On the other hand, biochemists, guided by an interest in proteins and nucleic acids, have more generally followed a "bottom up" approach (16, 19, 20). This approach focuses first on the atomic charges $q_i$. The most general method to derive the atomic charges is to fit them to quantum mechanically calculated electrostatic potentials on appropriately chosen molecules or fragments. In early attempts to do this, computational limitations in quantum mechanical calculations led to the use of a minimal basis set, STO-3G, to derive the $q_i$ (16). More recent efforts have used a 6-31G* or larger basis set (19). The 6-31G* basis set has the fortunate property that it leads to charges (dipole moments) that are enhanced over accurate gas phase experimental values and thus implicitly builds in "polarization" effects characteristic of polar molecules in condensed phases. The fact that this basis set enhances the polarity by about the same amount as the popular water models TIP3P (21) and SPC (22) (where the charges are empirically adjusted to reproduce the water enthalpy of vaporization) is fortunate and is key in leading to balanced solvent-solvent and solvent-solute interactions.

van der Waals parameters are generally dominated by the inner closed shell of electrons and thus are fortunately far more transferable than atomic charges. Therefore, generally only one set of van der Waals parameters (radius and well depth) per atom type need be employed, with the important exception of hydrogen (23). Unfortunately, it is harder to derive van der Waals parameters than charges using ab initio quantum mechanics (6, 24). The alternative that has emerged as a general model is to empirically calibrate results to fit experimental liquid structures and enthalpies (25).

Continuing with the "bottom up" development of a force field, we come to the torsion energy term, where the $V_n$ and $\gamma$ come either from experiment or from quantum mechanical calculations on small molecule models. Whereas "top down" force fields often use many terms in the Fourier series for rotation around a given bond type and attempt to reproduce the conformational energy for a collection of molecules, most "biochemical" force fields take a minimalist approach (16, 19, 20). For example, we would have only a single $V_3$ torsional term around an X-C-C-Y bond except when X or Y are electronegative, where another term can be rationalized from electronic effects and can be derived directly using quantum mechanical calculations. This helps our model to be more easily generalized to new molecules, albeit in some cases probably at the cost of some accuracy. Exceptions to this minimalist approach are the φ, ψ of peptides and χ of nucleic acids, where more terms were added to ensure as accurate as possible a reproduction of the conformational energies around these key bonds.

Finally, to ensure reasonable representation of bond and angle terms, we use empirical data (structures and vibrational frequencies). The use of this simple harmonic model precludes high accuracy, but in our opinion, one would compromise the simplicity and generality of the model with more complex functional forms.

3.2 Force Field Models for Simple Liquids

A key test of this approach is the ability to accurately reproduce liquid structures and energies and free energies of solvation; these have traditionally been considered as key elements in the development of successful force fields for liquids (25). The aqueous solvation free energies of a large number of molecules, including substituted benzenes, methanol, hydrocarbons, N-methyl acetamide, and dimethyl sulfide, as well as the liquid structure and energy of methanol and N-methyl acetamide, show good agreement with experiment, with little or no adjustment of parameters. For example, Fox and Kollman (25) have shown that this approach leads to a density and enthalpy of vaporization of liquid dimethyl sulfoxide (DMSO) within 2% of experiment, using restrained electrostatic potential (RESP) charges and van der Waals parameters taken without modification from the corresponding values in proteins. Similar results have been obtained for other organic liquids.

3.3 Nonadditive and More Complex Models

What are the most important weaknesses in the above-described parameterization approach and the use of Equation 4.1? In our opinion, the main ones are the use of an effective two-body potential and the use of only atom-centered charges.

$$E_{pol} = -\frac{1}{2}\sum_{i}^{\text{atoms}} \mu_i \cdot E_i^{(0)} \qquad \text{polarization} \tag{4.2}$$

where $\mu_i$ is the induced dipole on atom i, which is governed by its atomic polarizability. Substantial progress has been made in laying the foundation for the development of a complete force field including explicit nonadditive effects (adding Equation 4.2 to Equation 4.1). First, we have shown that such models, in contrast to additive models, lead to good agreement with experimental solvation free energies of the representative organic ions CH₃NH₃⁺ and CH₃CO₂⁻ without any adjustment of van der Waals parameters (26). Second, we have shown that such nonadditive terms are essential in accurately describing cation-π interactions (27). Third, we have shown that one can equally well describe liquid CH₃OH and N-methyl acetamide (NMA) with additive models or a nonadditive model in which the charges are uniformly reduced (by 0.88) (28). Finally, the interaction free energy of Li⁺ with hexaanisole spherand is more accurately described by nonadditive than additive molecular mechanical models (29). In addition, considering off-center charges in electrostatic potential fit models of atoms with "lone pairs"
shows that they can often be important in leading to a very accurate description of H bond directionality (30).

3.4 Long Range Electrostatic Effects

To accurately describe the energy and structure of complex systems, not only are the functional form and parameters of molecular models described by Equations 4.1 and 4.2 important, but also the manner in which the long range electrostatic effects are represented. The standard approach is to use a nonbonded cutoff for both electrostatic and van der Waals interactions, which seems to be a reasonable method for proteins but a poor method to describe highly charged molecules such as nucleic acids. For periodic systems, Ewald methods (which are too complex to be described here) have been known for a long time to remove most of the artefacts arising from cutoffs, and impressive efficiency and accuracy of a variant called particle-mesh Ewald (PME) has been demonstrated for protein crystals (31) [0.3 Å rms deviation from the observed crystal structure for bovine pancreatic trypsin inhibitor (BPTI) in a 1-ns simulation with an increase in computer time of only about 50% over standard cutoff methods]; the PME method also leads to accurate simulations of proteins, DNA, and RNA in solution (32).

4 THERMODYNAMICS OF ASSOCIATION

We have focused mainly on the energy of association between molecules; in any drug-receptor interaction, we typically want to know the equilibrium constant for association $K_a$ and the free energy of association $\Delta G^\circ$. The difference between the free energy ($\Delta G^\circ$) and energy ($\Delta E^\circ$) of association is given by $\Delta G^\circ = \Delta H^\circ - T\Delta S^\circ$ and $\Delta H^\circ = \Delta E^\circ + \Delta(PV)$. For gas phase associations, $\Delta(PV)$ is approximately $-RT$, which is -0.6 kcal/mol at room temperature. Thus, this term, when added to $\Delta E$, favors association (the more negative $\Delta G$, the greater the tendency for association). However, $\Delta S$, the entropy of association, is typically large and negative. The reason is that one is reducing the "floppy" degrees of freedom, which have large translational and rotational entropies, by six (six translations and six rotations in the free molecules, three of each in the complex) during complex formation, and replacing these with vibrations, which have lower entropies (33).

4.1 Gas Phase Association

For example, at 300 K, two CH₄ molecules have a translational entropy of 69 eu [entropy unit, or cal/(mol·K)] and a rotational entropy of 31 eu, whereas (CH₄)₂ has a translational entropy of 37 eu and a rotational entropy (assuming a C···C distance of 4 Å) of 22 eu. Thus, one can see that the translational and rotational entropy contribution to the reaction 2CH₄ → (CH₄)₂ is -41 eu. These six degrees of freedom become vibrations in the complex (CH₄)₂, and as such, might contribute a vibrational entropy of about 20-30 eu. Thus, for the dimerization of CH₄ in the gas phase, we expect $T\Delta S^\circ$ of about -3 to -6 kcal/mol at 300 K.

As stressed in the second law of thermodynamics, the tendency for a chemical process to occur is governed both by the energy released (exothermicity) in the process and the entropy gained (the tendency of the reaction to go to a more random, disordered state). In the case of gas phase association, the energy term is invariably exothermic if the reactants approach each other in an appropriate orientation, and the entropy term is always negative, opposing association. Table 4.3 gives an example of the thermodynamics of association of water molecules in the gas phase. As one can see, the entropy ($\Delta S^\circ$) contribution to association of water molecules in the gas phase is substantial and negative; thus, there is little tendency for water molecules to associate in the gas phase at room temperature and 1 atm pressure, even though the hydrogen bond energy is about 5 kcal/mol.

4.2 Solvation Effects

The thermodynamic cycle (Fig. 4.2) illustrates the problems we face in transferring our knowledge of gas phase intermolecular interactions to solution phase phenomena.

Our real interest is in $\Delta G_4$, the solution phase free energy of association. Until now, our discussion has focused on the energy ($\Delta E_1$), enthalpy ($\Delta H_1$), and free energy ($\Delta G_1$)
of association in the gas phase. To be able to calculate $\Delta G_4$, we need to know $\Delta G_3$, the solvation free energy of the drug-receptor complex; $\Delta G_{2a}$, the solvation free energy of the drug; and $\Delta G_{2b}$, the solvation free energy of the receptor. These solvation free energies are the free energies gained (or lost) by taking the molecule from a standard concentration in the gas phase to a corresponding concentration in solution. Using the thermodynamic cycle in Fig. 4.2, it follows that

$$\Delta G_4 = \Delta G_1 + \Delta G_3 - \Delta G_{2a} - \Delta G_{2b} \tag{4.3}$$

Table 4.3 Thermodynamic Functions for Gas Phase Association of Water Molecules: 2H₂O → (H₂O)₂

Thermodynamic Function     Value for H₂O Dimerization (kcal/mol)
ΔE° (0 K)a                 -6.2
ΔE° (300 K)a               -4.2
ΔH° (300 K)a               -5.2
TΔS° (300 K)b              -9.0
ΔG° (300 K)                +3.8

a See Joesten and Schaad (13).
b Estimated using the vibration frequencies employed by Joesten and Schaad (14).

Figure 4.2. A schematic representation of the thermodynamic cycle for molecular association in the gas phase and in solution.

Similar relationships hold for $\Delta H_4$ and $\Delta S_4$. There is no reason to expect $\Delta G_1$ and $\Delta G_4$ to be similar, so we face the problem of estimating $\Delta G_{2a}$, $\Delta G_{2b}$, and $\Delta G_3$. We cannot measure $\Delta G_{2b}$ or $\Delta G_3$, because this would require us to vaporize a measurable amount of a receptor or drug-receptor complex. For most polar and ionic drugs, $\Delta G_{2a}$ is not measurable either. Therefore, one resorts to measuring the free energy of transfer from octanol to water, $\Delta G_{2a}(\text{oct})$, rather than the free energy of transfer from the gas phase to water, $\Delta G_{2a}$. This situation underlies the postulate of the Hansch approach (34), which suggests that the differences in $\Delta G_{2a}(\text{oct})$ [$\Delta\Delta G_{2a}(\text{oct})$] may be related to the biological activity of drugs, and in many cases this desolvation (water → octanol) does indeed seem to be related to drug binding and/or biological activity.

Because the individual free energies in Equation 4.3 are so hard to measure, one is led to smaller model systems to analyze the major driving force for drug-receptor association, a step taken by Kauzmann (35) in his classic paper on the forces that affect protein stability and structure. He examined the thermodynamics of association and solution of small nonpolar molecules in aqueous solution. The associations were characterized by a large positive entropy term and the solution by a large negative entropy, with the enthalpy terms less important. Thus, the well-known lack of solubility of hydrocarbons in water was not caused by a net loss of hydrogen bonds; the hydrocarbons cause the water molecules to become more ordered (thus to lose entropy) so that they can still find a good hydrogen bond partner ($\Delta H$ of solution of these hydrocarbons is often negative, but much smaller in magnitude than the $T\Delta S$ of solution). By coming together in aqueous solution, these hydrocarbons "release" some H₂O molecules, and this favorable $T\Delta S$ of association is the driving force for this association. It is generally agreed that this "hydrophobic" effect of hydrocarbon groups is a key feature in many drug-receptor associations. A lucid description of hydrophobic forces is given by Jencks (36) and Dill (37).

Computer simulation approaches have proven very useful in enabling calculation of the association of molecules. For example, the association of two methane molecules in the gas phase would lead to a $\Delta E^\circ$ (0 K) of about -1 kcal/mol and, by analogy with water dimer (Table 4.3), a very positive $\Delta G^\circ$ (300 K) and thus no tendency for association. In aqueous solution, one can calculate, using modern statistical mechanical simulation methods, the potential of mean force for association of two molecules, which is the free energy as a function of molecular separation in solution. Although there is some controversy about
whether there are both "solvent-separated" and "contact" minima for two methane molecules in aqueous solution, there is no question that methane association is quite attractive in aqueous solution compared with the gas phase (38).

One can also apply such approaches to study association of ionic and polar molecules. For example, the association of Na⁺ and Cl⁻ has a free energy of association that is very small in magnitude, in contrast to the gas phase (39). The association of two amides through a C=O···H-N hydrogen bond is very favorable in vacuo and progressively less favorable in nonpolar and aqueous solution (40). Thus, water has a significant "leveling" effect on association, making nonpolar associations more favorable and ionic and polar associations less favorable than their gas phase counterparts.

Let us now summarize the foregoing discussion. Unlike the gas phase association, where $\Delta H_1$ and $\Delta S_1$ are invariably negative, for the corresponding thermodynamics in solution, $\Delta H_4$ and $\Delta S_4$ can be of either sign. The enthalpy of association $\Delta H_4$ of two molecules in solution will be positive if the interactions of the solvent with the uncomplexed drug and receptor are sufficiently stronger and more exothermic ($\Delta H_3 - \Delta H_{2a} - \Delta H_{2b}$ is more positive than $\Delta H_1$ is negative) than are the interactions of the solvent with the drug-receptor complex. Similarly, the entropy of association in solution $\Delta S_4$ can be positive if $\Delta S_3 - \Delta S_{2a} - \Delta S_{2b}$ is more positive than $\Delta S_1$ is negative. This can come about if the entropy gain from release of solvent from its interaction with the isolated drug and receptor is sufficiently larger than the entropy gain from release of solvent from the drug-receptor complex.

An additional important point to keep in mind is that the solution phase thermodynamics may be dominated (as in the case of the hydrophobic effect, the association of nonpolar solutes in water) by changes in solvent-solvent interactions in the presence of solute.

It is also important to stress that even an analysis of the relative contributions of $\Delta H$ and $\Delta S$ to $\Delta G$ may not give definitive insight into the "nature" of the drug-receptor bond. For example, a large positive $\Delta S$ (and small negative $\Delta H$) for association might come from either a hydrophobic or an ionic association (35). In either case, the driving force for association is likely "release" of H₂O from "tight" binding to the solute.

One final consideration in determining either gas phase or solution phase association constants of drug-receptor complexes is conformational flexibility. Medicinal chemists have often attempted to synthesize rigid drugs of different stereochemistries in the hopes of finding one that fits "perfectly" into the receptor site. If, for example, the drug has three equal energy conformations and only one can fit the receptor site, a price of $\Delta G = +RT \ln 3$ in binding free energy must be paid relative to the drug that is "locked" in the right conformation. If the receptor has to be locked in a conformation to "accept" the drug, one must pay a similar free energy price. A nice example of the latter situation is the difference in binding free energies between "locked" and "unlocked" macrocyclic crown ethers (41) that bind the t-BuNH₃⁺ cation.

4.3 An Illustrative Example: Protonation of Amines

Before we turn to some examples of drug-receptor interactions, let us present a specific example of the difference between gas phase and solution interactions. We choose the protonation of amines, because of the large literature that attempted to explain the irregular order of pKa's of the alkyl amines [NH₃ = 9.25; CH₃NH₂ = 10.66; (CH₃)₂NH = 10.73; and (CH₃)₃N = 9.81]. This reaction can be represented as

R₃N(g) + H⁺(g) → R₃NH⁺(g)

in the gas phase and

R₃N(aq) + H⁺(aq) → R₃NH⁺(aq)

in aqueous solution. As we noted in connection with Fig. 4.2, the difference between the free energies of protonation in solution and the gas phase is given by

$$\Delta G_4 - \Delta G_1 = \Delta G_3 - \Delta G_{2a} - \Delta G_{2b}$$
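The bookkeeping behind the thermodynamic cycle of Fig. 4.2, and the small free energy conversions used throughout this section, can be collected in a few lines of Python. This is a minimal sketch under stated assumptions: the function and argument names are hypothetical, the temperature default of 298.15 K is an assumption, and the cycle inputs in the usage example are made-up placeholder numbers, not data from the text. Only the alkyl amine pKa values are taken from the chapter.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def free_energy_from_constant(k_assoc: float, temp: float = 298.15) -> float:
    """Standard free energy of association, dG = -RT ln K, in kcal/mol."""
    return -R_KCAL * temp * math.log(k_assoc)


def conformational_penalty(n_conformers: int, temp: float = 298.15) -> float:
    """Cost +RT ln n when only one of n equal-energy conformers can bind."""
    return R_KCAL * temp * math.log(n_conformers)


def solution_free_energy(dg_gas: float, dg_solv_complex: float,
                         dg_solv_drug: float, dg_solv_receptor: float) -> float:
    """Close the cycle of Fig. 4.2: gas phase association plus solvation of
    the complex, minus solvation of the separated drug and receptor."""
    return dg_gas + dg_solv_complex - dg_solv_drug - dg_solv_receptor


def pka_difference_to_kcal(delta_pka: float, temp: float = 298.15) -> float:
    """Free energy equivalent of a pKa difference, 2.303 RT * delta_pKa."""
    return math.log(10.0) * R_KCAL * temp * delta_pka


if __name__ == "__main__":
    # Three equal-energy conformers, one of which fits the receptor site.
    print(f"RT ln 3 = {conformational_penalty(3):.2f} kcal/mol")

    # Solution pKa values of (CH3)3N and NH3 quoted in the text.
    print(f"pKa gap = {pka_difference_to_kcal(9.81 - 9.25):.2f} kcal/mol")

    # Placeholder cycle values (kcal/mol), chosen only to show the arithmetic.
    print(solution_free_energy(dg_gas=-10.0, dg_solv_complex=-50.0,
                               dg_solv_drug=-20.0, dg_solv_receptor=-35.0))
```

With these inputs the conformational penalty comes to roughly 0.65 kcal/mol, and the pKa gap between trimethylamine and ammonia corresponds to somewhat under 1 kcal/mol, which illustrates how small the solution phase differences are on the scale of gas phase basicities.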
Table 4.4 Free Energies in Cycle (Fig. 4.2) for Protonation of Alkyl Amines (kcal/mol)^a
^a See Aue et al. (42).

Recall also that the solution pKa = −log Ka = ΔG°/2.3RT. When the gas phase basicities were measured and showed a regular order, it was clear that the irregular order in solution was caused by a solvation effect. In the gas phase, NH3 is a weaker base than (CH3)3N by about 23 kcal/mol; in solution this difference is only about 1 kcal/mol.

Table 4.4 lists the free energies appropriate to the thermodynamic cycle (Fig. 4.2) for the protonation of the amines. Two points deserve strong emphasis.

1. The magnitude of ΔG_s is much smaller than that of ΔG_g for protonation, because in aqueous solution, the amines must compete with H2O for the proton; in the gas phase there is no competition.
2. As clearly analyzed by Aue et al. (42), the smaller the protonated amine, the more effectively solvated it is, and the better base it becomes compared with its relative rank in the gas phase.

5 CALCULATING FREE ENERGIES

Free energy is certainly one of the most important concepts in physical chemistry. The groundwork on calculating free energies was laid by Kirkwood (43) and Zwanzig (43), and the first key "modern" developments and applications came from the work of Postma et al. (44), Jorgensen and Ravimohan (45), Tembe and McCammon (46), and Warshel (47). The fundamentals of computational approaches to calculating free energies are reviewed by Beveridge and Mezei (48), and we attempted to exhaustively review applications up to 1993 (49).

To calculate the relative solvation free energies of molecules A and B in solvent S, we can use a thermodynamic cycle such as in Fig. 4.3. The relative solvation free energy of A and B, determined experimentally, is ΔΔG_solv = ΔG_solv(B) − ΔG_solv(A), and because the free energy is a state function, ΔΔG_solv is also equal to ΔG_mut(S) − ΔG_mut(g), which are the free energies determined by computational means by "mutating" the molecular mechanical model of A into B in solvent S and in the gas phase (g). Of course, if B consists of all "dummy" (non-interacting) atoms, this approach leads to the calculation of the absolute solvation free energy of A.

Figure 4.3. Basic thermodynamic cycle for solvation free energy.

Being able to accurately calculate free energies of solvation suggests a reasonable balance in solute-solvent and solvent-solvent interactions. The next key challenge is to calculate ΔΔG_bind of guests G and G' to a host H, all in aqueous (or other) solution. A typical cycle for free energy calculations (45), where H is a host, G is a guest, and HG is the host-guest complex, is given in Fig. 4.4.

Now one requires a correct balance of solute (host)-solute (guest), solute (host or guest)-solvent, and solvent-solvent interactions to correctly calculate ΔΔG_bind, although there clearly can be compensating errors in the calculation of the individual free energies of the cycle.
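The cycle arguments of this section can be sketched in a few lines of code. The chapter does not print a working formula, so the block below assumes the standard exponential-averaging (free energy perturbation) expression associated with Zwanzig, ΔG = −kT ln⟨exp(−ΔU/kT)⟩, with the average taken over configurations of the starting state. The function names and the notion of a single perturbation step are assumptions for illustration; practical calculations split a mutation into many small windows and sample with molecular dynamics or Monte Carlo.

```python
import math


def fep_free_energy(delta_u, kT):
    """Exponential average: dG = -kT * ln< exp(-dU / kT) >.

    delta_u holds U_B - U_A evaluated on configurations sampled in state A,
    in the same energy units as kT.
    """
    if not delta_u:
        raise ValueError("need at least one energy-difference sample")
    exponents = [-du / kT for du in delta_u]
    shift = max(exponents)  # log-sum-exp shift keeps exp() from overflowing
    mean_exp = sum(math.exp(x - shift) for x in exponents) / len(exponents)
    return -kT * (shift + math.log(mean_exp))


def relative_free_energy(dg_mutation_env, dg_mutation_ref):
    """Close the cycle: ddG = dG(A->B in environment) - dG(A->B in reference).

    For solvation the environment is the solvent and the reference is the gas
    phase; for binding the environment is the host and the reference is the
    solvent.
    """
    return dg_mutation_env - dg_mutation_ref


if __name__ == "__main__":
    # Hypothetical energy differences (kcal/mol); kT is about 0.593 at 298 K.
    samples = [1.2, 0.8, 1.5, 0.9, 1.1]
    print(f"dG(A->B) = {fep_free_energy(samples, 0.593):.3f} kcal/mol")
```

Because only differences between two mutation free energies enter the final result, errors common to both legs of the cycle tend to cancel, which is the compensation noted above.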
Figure 4.4. Thermodynamic cycle for host-guest interactions.

6 EXAMPLES OF DRUG-RECEPTOR INTERACTIONS

We discuss three examples of "drug target" interactions: (1) biotin-avidin, (2) dihydrofolate reductase-trimethoprim, and (3) DNA-intercalator. The first is the strongest characterized protein-ligand association, the second a prototype enzyme-inhibitor interaction, and the third describes drugs interacting with nucleic acids.

6.1 Biotin-Avidin

Biotin (Fig. 4.5) is involved in the strongest known non-covalent macromolecule-ligand interaction. In fact, given the small size of biotin, it is surprising to many that this association is so strong (Ka corresponding to −ΔG of ~20 kcal/mol) (50). The X-ray structure of the complex of biotin with streptavidin (a protein related to avidin, with nearly as large a biotin affinity) has been solved (51). The ureido group of biotin was thought to be the reason for the strong binding of this ligand.

Figure 4.5. Structures of biotin and two analogs (thiobiotin and iminobiotin).

We have carried out free energy calculations on the relative binding of biotin, iminobiotin, and thiobiotin to streptavidin, as well as absolute free energy calculations of biotin binding. The results of these simulations are instructive in the insight they give us into this association. These free energy calculations can best be understood by considering the thermodynamic cycle in Fig. 4.6. The free energy calculations enable one to determine the free energies of the vertical processes by mutating one ligand into another in solution (ΔG_solv) and when bound in the active site (ΔG_bind; listed as ΔG_prot in Table 4.5), using molecular dynamics to create an ensemble average of the system. The difference between these calculated free energies, ΔΔG_bind, is equal to the difference in the observed relative free energies of ligand binding.

Figure 4.6. Thermodynamic cycle for protein-ligand interactions. The experimentally measurable free energies are ΔG_bind1 and ΔG_bind2 (horizontal), and the calculated values (ΔG_solv and ΔG_prot) are the vertical processes.

The biotin-streptavidin system provides a "textbook case" of the relative free energies of the binding of biotin, iminobiotin, and thiobiotin, as illustrated in Table 4.5. First, the calculated relative free energies are in reasonable agreement with experiment; thiobiotin is calculated and observed to bind ~10^3 or ~4 kcal/mol more weakly to streptavidin than biotin, and iminobiotin is calculated and observed to bind ~10^5 or ~7 kcal/mol more weakly than biotin. What is more interesting are the energy components. Thiobiotin is easier to desolvate than biotin by ~9 kcal/mol (ΔG_solv) but interacts more weakly with the protein by ~13 kcal/mol, leading to the observed ~4 kcal/mol preference for biotin binding. On the other hand, iminobiotin is ~5 kcal/mol harder to desolvate than biotin, but interacts only ~2 kcal/mol more weakly with streptavidin, thus leading [ΔΔG_bind = ΔG_bind − ΔG_solv = 2 − (−5)] to its ~7 kcal/mol weaker binding to streptavidin than biotin. The above examples illustrate the interesting tradeoff in binding and solvation effects in analysis of ligand-macromolecule interactions.

The fact that one loses only 4-7 kcal/mol out of the ~20 kcal/mol in free energy of binding when mutating the ureido group to its thio and imino analogs is strongly suggestive that the "ureido resonance," suggested by the crystallographers (50) who solved the structure as the reason for the unusually high K_a, cannot be the main reason. Calculations on the absolute free energy of biotin-streptavidin binding suggest that electrostatic effects, which might include ureido resonance (although perhaps not all of it), contribute ~6 kcal/mol to ΔΔG_bind, whereas van der Waals effects contribute ~14 kcal/mol.

The large contribution of van der Waals interactions (dispersion plus exchange repulsion) is surprising to many, because an individual van der Waals atom-atom dispersion attraction is very small. But there are many of them in the streptavidin active site, which, not coincidentally, contains four tryptophan residues.

But why don't the van der Waals interactions with water lost when one moves biotin from water to the streptavidin active site cancel with those gained in the active site? This can be understood by noting, as Sun et al. (52) and Rao and Singh (53) have, that a unique aspect of water as a solvent is its large exchange repulsion contribution to ΔG_solv. This exchange repulsion contribution represents the "hydrophobic effect," the fact that methane is less stable by 2 kcal/mol at a 1 M standard state in water than in the gas phase. This exchange repulsion cancels (and sometimes outweighs) the dispersion attraction that occurs for any solute when transferred from the gas phase to a condensed phase. On the other hand, in the streptavidin binding site, preorganized during protein synthesis, one gains dispersion attraction when biotin binds without the compensation from exchange repulsion. The magnitude of this effect is heightened by the large "atom density" both in biotin, with its bicyclic structure, and in streptavidin, with its four tryptophan residues. Thus, the key aspects in biotin's tight binding with (strept)avidin are the preorganization and high atom density of the protein active site (54).

Recently, Dixit and Chipot (55) have re-examined this problem, using the improved power of modern computers to expand the sampling of configurations. The results continue to be in good accord with experiments and offer a modern paradigm for how the convergence of these simulations can be monitored.

Table 4.5 Results of Relative Free Energy Calculations^a (kcal/mol)

                                                   ΔΔG_bind
Perturbation            ΔG_solv      ΔG_prot      Calc. (ΔG_prot − ΔG_solv)   Exp^b (ΔG_bind2 − ΔG_bind1)
Biotin → thiobiotin     8.8 ± 0.1    12.0 ± 0.3   3.2 ± 0.3                   3.6
Biotin → iminobiotin    −5.3 ± 0.1   1.2 ± 0.7    6.5 ± 0.8                   6.2

^a Errors, where listed, correspond to half the hysteresis between forward and reverse runs.
^b Experimental data, Ref. 51.
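The bookkeeping behind Table 4.5 is simple enough to check by hand, and the following sketch does so. It uses only the tabulated ΔG_solv and ΔG_prot values; the conversion of a free energy difference into a ratio of binding constants assumes the standard relation K1/K2 = exp(ΔΔG/RT) at 298.15 K, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

# (dG_solv, dG_prot) in kcal/mol for each mutation, taken from Table 4.5.
PERTURBATIONS = {
    "biotin -> thiobiotin": (8.8, 12.0),
    "biotin -> iminobiotin": (-5.3, 1.2),
}


def relative_binding(dg_solv, dg_prot):
    """Cycle closure: ddG_bind = dG_prot - dG_solv (kcal/mol)."""
    return dg_prot - dg_solv


def affinity_ratio(ddg, temperature=298.15):
    """Factor by which the binding constant falls for a positive ddG."""
    return math.exp(ddg / (R_KCAL * temperature))


if __name__ == "__main__":
    for label, (dg_solv, dg_prot) in PERTURBATIONS.items():
        ddg = relative_binding(dg_solv, dg_prot)
        print(f"{label:24s} ddG = {ddg:4.1f} kcal/mol, "
              f"K ratio ~ {affinity_ratio(ddg):.0f}")
```

Writing the cycle this way makes the tradeoff explicit: a ligand that is easier to desolvate (positive ΔG_solv for the mutation) can still bind more weakly if it loses even more in its interactions with the protein.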
6.2 Dihydrofolate Reductase-Trimethoprim

A classic example of a drug that works by species-specific protein inhibition is trimethoprim (TMP). Because this drug binds to bacterial dihydrofolate reductase (DHFR) ~10^4 more tightly than to the mammalian enzyme, there is a therapeutic concentration in which the drug can be used as an antibacterial with few deleterious consequences for a mammalian host.

DHFR was the first example in which the X-ray crystal structures of the enzyme complexes were solved for both bacterial and mammalian enzymes. Matthews et al. (56) have suggested that it is a key hydrogen bond involving the pyrimidine ring of TMP, which is present in the bacterial but not mammalian enzyme complex, that is responsible for the selectivity. This has not been definitively established with carbocyclic analogs, but analogs have clearly shown an important role of the three methoxy groups in TMP in causing species selectivity. For example, the TMP analog without the three OCH3 groups has a binding preference for the bacterial enzyme of only ~10.

Kuyper (57) has analyzed the structure of the bacterial and mammalian complexes and suggested that the oxygens of the -OCH3 groups play a key role in species selectivity. The methoxy oxygens are significantly more solvent exposed in the bacterial complex than in the mammalian. Thus, because these oxygens do not form hydrogen bonds to enzyme groups in either complex, the desolvation penalty for the oxygen is smaller in the bacterial enzyme and does not as extensively cancel the favorable hydrophobic/dispersion effects on binding of the methoxy methyl groups. This interpretation is supported by the fact that replacing the -OCH3 with CH2CH3 makes the molecules less species selective; such analogs bind only a little better to bacterial DHFR but significantly better to mammalian DHFR (58, 59).

Free energy calculations/molecular dynamics have given and will continue to give interesting insight into the DHFR-TMP species selectivity.

6.3 Nucleotide Intercalator

Because our first two examples have emphasized protein-small molecule interactions, we turn to a nucleic acid-small molecule interaction for our last example. There have been many experimental studies of the "intercalation" of flat, planar dyes into double-stranded DNA and other polynucleotides.

The flexibility of the sugar-phosphate backbone allows the intercalator to be sandwiched between the nucleotides with relatively little "strain." The interaction with polynucleotides by a wide variety of intercalators has been studied by physicochemical techniques. The driving force for association can be primarily hydrophobic, as in actinomycin D, where the driving force for association is ΔS° (57), or it can contain a large contribution from electrostatic effects as in ethidium bromide and adriamycin analogs, where the driving force for association is ΔH° (60) (Table 4.6). Both molecules have binding association constants K_a to DNA of about 10^6. The role of dispersion binding is not clear at this point, but it is likely to be very important as well (13). As noted above, the ability of these drugs to interfere with DNA replication is apparently related to their rate of dissociation k_d from DNA rather than to their association constant K_a. Muller and Crothers (2) showed that actinomycin and actinomine had similar values of K_a for binding to DNA, but the former had a much smaller k_d and a much greater effect on the rate of DNA replication.

Table 4.6 Thermodynamics of Binding of Drugs to DNA^a

Drug                ΔH° (kcal/mol)   ΔS° (eu)   ΔG° (kcal/mol)
Proflavin           −6.7             +4.7       −8.1
Ethidium bromide    −6.2             +9.4       −9.0
Actinomycin D       +2.0             +39.0      −9.6
Daunomycin          −6.5             +7.7       −8.8

^a Conditions in all cases as follows: T = 25°, 0.01 M buffer, pH = 7, I = 0.015 [see Quadrifoglio and Crescenzi (60)].

7 SUMMARY

The foregoing examples illustrate the likely nature of drug-receptor binding. It seems that hydrophobic and dispersion binding do contribute a substantial amount to the net binding affinity. However, we have noted some cases (e.g., the ureido group in biotin and the intercalation of positively charged groups into DNA) in which there might be an important polar or electrostatic driving force for binding. Again, it is difficult to ascertain whether these polar contributions come from "freeing up" water or from direct interactions, but they seem to contribute in a significant fashion to the driving force for association as well as being important in determining biological specificity. The lessons for the medicinal chemist attempting to design a drug to maximize the drug-receptor association include the following:

1. Conformational flexibility can decrease the association constants in a straightforwardly predictable way.
2. Hydrophobic effects usually contribute significantly to drug-receptor association, but one must also consider possible specific polar and ionic interactions.
3. Preorganization of the receptor or ligand is a key to obtaining optimal electrostatic or van der Waals interactions.

We have tried to provide examples in this chapter both of the qualitative arguments that are important for understanding ligand-protein or ligand-DNA interactions and of some typical numerical results arising from computer experiments. Understanding these interactions is key to the rational design of inhibitors, and a computer-aided approach is increasingly being used to screen libraries of potential inhibitors and to suggest improvements to lead compounds (61). As force fields and sampling methods improve and as computers become ever more powerful, the practical use of methods like these should improve as well.

AUTHOR'S NOTE:

Peter Kollman died unexpectedly in May, 2001. He had authored an article on "Drug-Target Binding Forces" for the Fifth Edition of this series. This revision and extension for the Sixth Edition is based primarily on Peter's writings, and is dedicated to his memory.

REFERENCES

1. P. Atkins, Physical Chemistry, 4th ed., W. H. Freeman, New York, 1990.
2. W. Muller and D. Crothers, J. Mol. Biol., 35, 251 (1968).
3. K. Kitaura and K. Morokuma, Int. J. Quant. Chem., 10, 325 (1976).
4. J. C. G. M. van Duijnevelt-van der Rijdt and F. B. van Duijneveldt, J. Am. Chem. Soc., 93, 5644 (1971).
5. J. Hirschfelder, C. Curtiss, and R. Bird, Molecular Theory of Gases and Liquids, Wiley, New York, 1954.
6. R. H. Margenau and N. Kestner, Theory of Intermolecular Forces, 2nd ed., Pergamon Press, Oxford, 1971.
7. H. Umeyama and K. Morokuma, J. Am. Chem. Soc., 99, 1316 (1977).
8. R. Lefevre, Adv. Phys. Org. Chem., 3, 1 (1965).
9. H. Umeyama and K. Morokuma, J. Am. Chem. Soc., 98, 4400 (1976).
10. P. Kollman and L. C. Allen, Chem. Rev., 72, 283 (1972).
11. M. Hanna, J. Am. Chem. Soc., 90, 285 (1968); R. Lefevre, D. V. Radford, and P. Stiles, J. Chem. Soc. B, 31, 1297 (1968).
12. M. Karplus and R. Porter, Atoms and Molecules, Benjamin, Menlo Park, CA, 1971.
13. K. C. Janda, J. C. Hemminger, J. W. Winna, S. E. Novick, S. J. Harris, and W. Klemperer, J. Chem. Phys., 63, 1419 (1975); M. Joesten and L. Schaad, Hydrogen Bonding, Dekker, New York, 1974; K. Morokuma, S. Iwata, and W. Lathan in R. Daudel and B. Pullman, Eds., The World of Quantum Chemistry, D. Reidel, Dordrecht, Holland, 1974, p. 277.
14. P. Kollman, J. Am. Chem. Soc., 99, 4875 (1977).
15. G. E. Bacon, N. A. Curry, and S. A. Wilson, Proc. R. Soc. Ser. A, 279, 98 (1964).
16. S. J. Weiner, P. A. Kollman, D. A. Case, U. C. Singh, C. Ghio, G. Alagona, S. Profeta, and P. Weiner, J. Am. Chem. Soc., 106, 765 (1984).
17. A. McCammon and S. Harvey, Molecular Dynamics of Proteins and Nucleic Acids, Cambridge University Press, Cambridge, UK, 1987.
18. U. Burkert and N. L. Allinger, Molecular Mechanics, American Chemical Society, Washington, DC, 1982.
19. W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz Jr., D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell, and P. A. Kollman, J. Am. Chem. Soc., 117, 5179 (1995).
20. A. D. MacKerell Jr., D. Bashford, M. Bellott, R. L. Dunbrack Jr., J. D. Evanseck, M. J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph-McCarthy, L. Kuchnir, K. Kuczera, F. T. K. Lau, C.
Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, W. E. Reiher III, B. Roux, M. Schlenkrich, J. C. Smith, R. Stote, J. Straub, M. Watanabe, J. Wiorkiewicz-Kuczera, D. Yin, and M. Karplus, J. Phys. Chem. B, 102, 3586 (1998).
21. W. L. Jorgensen, J. Chandrasekhar, J. Madura, R. W. Impey, and M. L. Klein, J. Chem. Phys., 79, 926 (1983).
22. H. J. C. Berendsen, J. R. Grigera, and T. P. Straatsma, J. Phys. Chem., 91, 6269 (1987).
23. D. L. Veenstra, D. M. Ferguson, and P. A. Kollman, J. Comput. Chem., 8, 971 (1992).
24. J. Pirssette and E. Kochanski, J. Am. Chem. Soc., 100, 6609 (1978).
25. W. L. Jorgensen and J. Tirado-Rives, J. Am. Chem. Soc., 110, 1657 (1988); W. L. Jorgensen, D. S. Maxwell, and J. Tirado-Rives, J. Am. Chem. Soc., 118, 11225 (1996); G. Kaminski and W. L. Jorgensen, J. Phys. Chem., 100, 18010 (1996); T. Fox and P. A. Kollman, J. Phys. Chem. B, 102, 8070 (1998).
26. E. C. Meng, P. Cieplak, J. W. Caldwell, and P. A. Kollman, J. Am. Chem. Soc., 116, 12061 (1994).
27. J. W. Caldwell and P. A. Kollman, J. Am. Chem. Soc., 117, 4177 (1995).
28. J. W. Caldwell and P. A. Kollman, J. Phys. Chem., 99, 6208 (1995).
29. Y. Sun, J. W. Caldwell, and P. A. Kollman, J. Phys. Chem., 99, 10081 (1995).
30. R. W. Dixon and P. A. Kollman, J. Comput. Chem., 18, 1632 (1997).
31. D. M. York, A. Wlodawer, L. Petersen, and T. A. Darden, Proc. Natl. Acad. Sci. USA, 91, 8715 (1994).
32. T. E. Cheatham III, J. L. Miller, T. Fox, T. A. Darden, and P. A. Kollman, J. Am. Chem. Soc., 117, 4193 (1995).
33. N. Davidson, Statistical Mechanics, McGraw-Hill, New York, 1962; M. I. Page and W. P. Jencks, Proc. Natl. Acad. Sci. USA, 68, 1678 (1971).
34. C. Hansch, Biological Correlations-The Hansch Approach, ACS, Washington, DC, 1973.
35. W. Kauzmann, Adv. Protein Chem., 14, 1 (1975); C. Tanford, The Hydrophobic Effect, Wiley, New York, 1973.
36. W. Jencks, Catalysis in Chemistry and Enzymology, McGraw-Hill, New York, 1969.
37. K. Dill, Biochemistry, 29, 7133 (1990).
38. W. Jorgensen, J. K. Buckner, S. Boudon, and J. Tirado-Rives, J. Chem. Phys., 89, 3742 (1988).
39. L. X. Dang, J. Rice, and P. Kollman, J. Chem. Phys., 93, 7528 (1990).
40. W. Jorgensen, J. Am. Chem. Soc., 111, 3770 (1989).
41. J. Timko, S. Moore, D. Walba, P. Hiberty, and D. Cram, J. Am. Chem. Soc., 99, 4207 (1977).
42. D. Aue, H. Webb, and M. Bowers, J. Am. Chem. Soc., 31, 318 (1976).
43. J. Kirkwood, J. Chem. Phys., 3, 300 (1935); R. Zwanzig, J. Chem. Phys., 22, 1420 (1954).
44. J. P. M. Postma, H. J. C. Berendsen, and J. R. Haak, Faraday Symp. Chem. Soc., 17, 55 (1982).
45. W. Jorgensen and C. Ravimohan, J. Chem. Phys., 83, 3050 (1985).
46. B. L. Tembe and J. A. McCammon, J. Comput. Chem., 8, 281 (1984).
47. A. Warshel, J. Phys. Chem., 86, 2218 (1982).
48. D. L. Beveridge and M. Mezei, Annu. Rev. Biophys. Chem., 18, 431 (1989).
49. P. A. Kollman, Chem. Rev., 93, 2395 (1993).
50. N. Green, Biochem. J., 101, 774 (1966).
51. P. C. Weber, J. J. Ohlendorf, and F. R. Salemme, Science, 243, 85 (1989).
52. Y. Sun, D. Spellmeyer, D. Pearlman, and P. Kollman, J. Am. Chem. Soc., 114, 6798 (1992).
53. B. C. Rao and U. C. Singh, J. Am. Chem. Soc., 111, 3125 (1989); B. C. Rao and U. C. Singh, J. Am. Chem. Soc., 112, 3803 (1990).
54. S. Miyamoto and P. Kollman, Proc. Natl. Acad. Sci. USA, 8402 (1993); S. Miyamoto and P. Kollman, Proteins, 16, 226 (1993).
55. S. B. Dixit and C. Chipot, J. Phys. Chem. A, 105, 9795 (2001); B. Kuhn and P. A. Kollman, J. Am. Chem. Soc., 122, 3909 (2000).
56. D. Matthews, J. Bolin, J. Burridge, D. Filman, K. Volz, B. Kaufman, C. Beddell, J. Champness, D. Stammers, and J. Kraut, J. Biol. Chem., 260, 381 (1985).
57. L. Kuyper in C. Bugg and S. Ealick, Eds., Crystallographic and Molecular Modeling in Drug Design, Springer-Verlag, NY, 1989, pp. 56-79.
58. S. Fleischman and C. L. Brooks, Proteins, 7, 52 (1990); C. L. Brooks and S. Fleischman, J. Am. Chem. Soc., 112, 3307 (1990).
59. J. J. McDonald and C. L. Brooks, J. Am. Chem. Soc., 113, 2295 (1991); J. J. McDonald and C. L. Brooks, J. Am. Chem. Soc., 114, 2062 (1992).
60. F. Quadrifoglio and V. Crescenzi, Biophys. Chem., 1, 319 (1974); F. Quadrifoglio and V. Crescenzi, Biophys. Chem., 2, 64 (1974).
61. T. J. A. Ewing, S. Makino, A. G. Skillman, and I. D. Kuntz, J. Comput. Aided Mol. Des., 15, 411 (2001).
CHAPTER FIVE
Combinatorial Library Design,

Molecular Similarity, and
Diversity Applications
JONATHAN S. MASON
Pfizer Global Research & Development
Sandwich, United Kingdom
STEPHEN D. PICKETT
GlaxoSmithKline Research
Stevenage, United Kingdom
Contents
1 Introduction
1.1 Scope
1.2 Molecular Similarity/Diversity
1.3 Combinatorial Library Design
1.4 Subset Selection and Screening Set Enrichment
2 Molecular Similarity/Diversity
2.1 Descriptors
2.1.1 2D Substructural and Topological Descriptors
2.1.2 Atomic/Molecular Properties and 2D/3D Structural Descriptors
2.1.2.1 Physicochemical
2.1.2.2 2/3D Structural
2.1.3 3D Properties
2.1.3.1 3D Pharmacophores
2.1.3.2 Shape
2.1.3.3 Field-Based
2.1.4 Analysis
2.1.4.1 Descriptor Transformations
2.1.4.2 Similarity and Distance Measures
2.2 Analysis and Selection Methods
2.2.1 Cell-Based Partitioning Methods
2.2.1.1 Diverse Solutions
2.2.1.2 Pharmacophore Fingerprints
2.2.2 Cluster-Based Methods
2.2.3 Dissimilarity-Based Methods
2.2.4 Biasing to Desired/Desirable Properties
2.2.5 Relative Diversity/Similarity
3 Virtual Screening by Molecular Similarity
3.1 Use of Geometric Atom-Pair Descriptors
3.2 Use of 3D Pharmacophore Fingerprints (Three- and Four-Point)
3.3 Validation Studies
4 Combinatorial Library Design
4.1 Combinatorial Libraries
4.2 Combinatorial Library Design
4.3 Optimization Approaches
4.4 Handling Large Virtual Libraries
4.5 Library Comparisons
4.6 Pharmacophore-Based Fingerprints
4.7 Combined Pharmacophore Fingerprints and BCUTs
4.8 Oriented-Substituent Pharmacophores
4.9 Integration
4.10 Structure-Based Library Design
5 Example Approaches
5.1 General Target Class-Focused Approaches
5.1.1 Defining the Chemical/Biological Space
5.1.2 7-Transmembrane G-Protein-Coupled Receptors
5.2 Property-Biased Design
5.3 Site-Based Pharmacophores
6 Conclusions and Future Directions

Burger's Medicinal Chemistry and Drug Discovery
Sixth Edition, Volume 1: Drug Discovery
Edited by Donald J. Abraham
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
1 INTRODUCTION

1.1 Scope

This chapter discusses molecular similarity and diversity methods and their main applications to combinatorial library design, the selection of compound subsets, and ligand-based virtual screening. Protein structure-based virtual screening is discussed in chapters 6 and 7. Medicinal chemistry-relevant applications discussed include the design of "diverse," "representative," and "thematic/focused/biased" libraries and subsets. The last application is of particular relevance, in that there is a recent trend to approach drug discovery by a "target class" or "gene family" approach; for example, 7-transmembrane G-protein-coupled receptors (7-TM GPCRs); nuclear hormone receptors (NHRs); ion channels; proteases; kinases; phosphodiesterases.

1.2 Molecular Similarity/Diversity

Molecular similarity and diversity methods have been developed based on the principle that similar molecules exhibit similar activities/properties (1). Molecular similarity is a key concept in the identification of new molecules that have similar biological activity to one or more molecules of known activity. Molecular diversity concepts are used to explore "chemical space," with the scope of application ranging from a particular structure/reaction to a large database of different molecules. The process of evaluating similarity and diversity involves the calculation of descriptors for each structure and the determination of the proximity of compounds within the descriptor (or chemical) space. Virtual screening is the name given to the process by which these computational methods are used to identify a subset of compounds from a database for a specific purpose. The source database may, for example, be compounds in a corporate registry where the goal may be to identify compounds for a biological assay. Alternatively, the source database may be compounds that the chemist believes are synthesizable and the goal of virtual screening is to prioritize compounds for synthesis. Depending on the amount of information available to guide the computational screening, and the method used, different levels of enrichment (number of actives selected in a set relative to a random selection) are obtained. It should be noted also that virtual screening applies not only to the selection of compounds for biological screening but also to the prioritization of compounds based on general properties of biological relevance, for example, selecting compounds more likely to be well absorbed.

Molecules are typically represented by a vector of real-valued properties (molecular weight, log P, etc.) or binary values (e.g., 0 for absence, 1 for presence of a substructure feature) in a bit-string or binary fingerprint. The term fingerprint or key or signature thus refers to an encoding of features/characteristics a molecule exhibits (e.g., substructures present, all possible combinations of 2-4 pharmacophoric features) as a string of bits (indicating either the presence or absence of a particular characteristic; see section 2.1.1 and Fig. 5.1), optionally including a count of the number of times the characteristic is exhibited. A wide variety of descriptors is available to evaluate the potential similarity or diversity between structures (2). These range from one-dimensional (1D) descriptors based on molecular properties such as molecular weight, which can be derived from the molecular formula; two-dimensional (2D) substructural fingerprints, topological methods, and atomic/molecular properties [e.g., physicochemical properties such as calculated log P (c log P)] that require knowledge of the "flat" or 2D structure, which represents the bonds between the atoms; to three-dimensional (3D) properties (e.g., pharmacophoric fingerprints), requiring knowledge of the full 3D conformational space available to a molecule. A 3D pharmacophoric fingerprint marks the presence or absence of potential pharmacophores [combinations of different features and distances between them, often for three- or four-point pharmacophore fingerprints (i.e., triplets/triangles or quartets/tetrahedra)] within a molecule.

Figure 5.1. A simple illustration of bit-string encoding of chemical structure (7). (a) A fragment dictionary-based approach. (b) Illustration of a hashing scheme using a path-based decomposition of the structure. The asterisk denotes an element in the bit string where a collision has resulted from the hashing procedure.

Three-dimensional properties such as the pharmacophore fingerprints can also be calculated for the target protein binding site, being derived from site points complementary to the functional groups in the protein backbone and side chains, thus bridging the ligand-based and protein structure-based universes. The pharmacophore fingerprints also represent a simplified approach to the goal of providing molecular descriptors with 3D shape and property content, while obviating the need for molecular superposition or refined pharmacophore hypothesis generation.

Partitioning methods are widely used. The compounds are grouped using either a cell-based approach, in which each dimension of the chemical space is subdivided or "binned," or by a clustering approach, in which islands of similar compounds are formed. Alternatively, the distance between pairs of molecules can be calculated, and this distance minimized (for similarity) or maximized (for diversity). For diversity the goal is normally not to identify a diverse compound in isolation, but to explore a range of diversity through selection of a diverse subset of compounds. Cell-based methods provide the advantage of a common frame of reference in terms of the multidimensional cell positions. It is possible with a cell-based method to evaluate both what is there and what is missing (in terms of empty cells); clustering, by contrast, is based on exploring what is there. The same method/descriptor may thus be used to evaluate both similarity and "diversity." In practice, "dissimilarity" approaches often provide a more acceptable approach to diversity, ensuring that compounds are not too similar, but avoiding a potential pitfall of exploring too frequently the extremes of chemical space. Methods and descriptors are discussed for each of these categories.

1.3 Combinatorial Library Design

Combinatorial library design is an important application of molecular similarity and diversity principles and methods. Combinatorial chemistry approaches can exploit automation and robotics to enable the rapid production of large numbers of compounds. Libraries are synthesized for both lead identification and lead optimization purposes. The resultant libraries consist of products formed by combining "reactants" (reagents, monomers) with each other or with a "scaffold" (template, core). The most efficient use of reactants and automation/robotics would use a strictly combinatorial combination of reactants/scaffold, but other constraints, including the issue of generating products that have suitable properties for biological screening and as potential drugs, often lead to sparse arrays. Parallel synthesis, in which multiple analogs are synthesized at a time, is now a standard part of the drug discovery process.

Many molecular diversity and similarity approaches are brought together in the combinatorial library design process. Either the properties of the reactants/scaffolds are used (reactant-based design) or the properties of the resultant enumerated products are used in selecting appropriate reactants (product-based reactant selection). The latter approach requires much greater computational resources, and a preselection of potential reactants may need to be made to control the total size of the "virtual" (potentially synthesizable) library to be analyzed. Regardless of the method, the required deliverable is sets of reactants/scaffolds to be combined. When working with the properties of the products, the constraint that reactants are to be used as efficiently as possible presents a major optimization problem.

1.4 Subset Selection and Screening Set Enrichment

A related task to combinatorial library design that uses molecular diversity/similarity methods is subset selection of compound screening sets. Initial efforts were focused on small "diverse" or "representative" sets of large corporate compound collections. The increased capabilities of high throughput screening have changed the demand for such sets, and there is a renewed demand for "focused" and "representative" screening subsets of varying sizes; this includes target class (gene family) focus and the identification of "interesting" (e.g., novel) compounds in a large set. Newer biophysical screening methods [e.g., NMR-based screening (3)] still have capacity issues and a need for smaller representative and focused sets. Diverse subset selection can be used to generate sets of compounds to probe a biological assay or to select a subset of reactants to probe the scope of a chemical reaction scheme or screen. However, such methods have a tendency to select compounds at the extremes of chemical space; that is, the selected compounds tend to be less suitable as drug candidates, and hence the approach is less favored for general screening sets. Rather, diversity methods are used to ensure that a random subset of a screening set contains compounds that are representative of the whole, or, in conjunction with a focused method, to ensure a representative sampling of biologically relevant chemical space.

Compound subsets focused/biased to particular target classes (gene families) have become of greater importance, with application to both lead identification and de-orphaning of new targets from genomics studies. Properties important for the target class of interest are identified, using descriptors used for molecular similarity/diversity. A focused subset can then be selected using a combination of all the possible hypotheses for activity for that target
Virtual screening, with experimental veri- class, including the use of one or more molec-
fication by biological screening, has provided a ular similarity approaches to select com-
validation of many of the molecular similarity1 pounds similar to any known active com-
diversity methods used for combinatorial li- pounds. For targets that have structural
brary design, and some ligand-based ap- information available, docking methods (one
proaches and examples are discussed in widely used method for virtual screening) can
Section 3. be used to select compounds that are comple-
mentary to the binding site(s). Applications encompass both high throughput screening (HTS) and therapeutic area screening where only smaller numbers of compounds can be screened. For HTS, smaller thematic studies using these enriched focused sets enable the rapid prosecution of a set of related targets, and make the use of duplicate runs for all compounds feasible. This enables selectivity to be addressed up front, and the duplicate runs provide potentially higher quality information, with the potential for the identification of hits that might otherwise be missed.

General enrichment of the available screening compound set for lead identification is a major application for both combinatorial library design/synthesis and compound acquisition. The goal of in silico (i.e., computer-based) studies in compound acquisition is to evaluate the interest of compounds that could be purchased to add to the screening file, and to select a subset that meets the same type of physicochemical/"druglikeness" criteria discussed for combinatorial libraries. The "interest" of a compound or compound set is evaluated as in combinatorial library design: diversity relative to existing compounds, target, target-class focus, and so forth.

2 MOLECULAR SIMILARITY/DIVERSITY

The field of medicinal chemistry is based on the hypothesis that similar compounds will display similar, but probably not identical, activities in some biological screen, and that potency, selectivity, and properties can thus be modulated by analog synthesis. The challenge facing the computational chemist is how to represent compounds in a computer in such a way that "similar" compounds in the in silico world are "similar" in the biological world. It is evident that the biological process that is being modeled will influence the nature of the chosen representation. For example, c log P is a useful descriptor for modeling processes involving cell penetration, whereas a pharmacophoric representation would be more appropriate for selecting compounds for screening against a particular protein active site. In this section we review the wide range of representations that have been developed and describe the various methods for applying these representations to real-world problems. The reader is referred to a number of reviews on various aspects covered by this chapter (2, 4-9). A diverse set of perspectives/reminiscences on computational aspects of molecular diversity has been assembled by Martin (10).

2.1 Descriptors

The problem lies in finding a representation of chemical structure that allows a mapping between the chemical structure and its response in a biological or physical process. The representation must be general enough to be applicable to a range of chemical structures but specific enough to capture the differences between structures that account for differences in response. Once found, this representation or set of descriptors can be said to define a chemistry space (11) for the population of compounds of interest. The similarity between two compounds is then measured by their distance within this space. Unfortunately, this simple statement hides a number of difficulties. Many descriptors of choice are correlated and it can be difficult to combine categorical (e.g., acid, base, neutral) and real-valued (charge, dipole, c log P) variables. The issue of how to analyze compounds within the chemistry space is covered in Section 2.2.

Methods for describing chemical structures fall into two broad classes. Two-dimensional (2D) methods can be calculated from the 2D graph in which atoms are nodes in the graph and the bonds are the connections between the nodes. Three-dimensional (3D) methods require the generation of a 3D structure (x, y, z coordinates) for a molecule. Because a molecule does not exist in a single low energy conformer, the issue of conformer generation also requires addressing with this latter method. Combining the various descriptors, particularly 2D and 3D, is an area of active research. The potential advantage of 3D descriptors (ligand-protein binding is a 3D spatial/electronic property that can be described only in part using 2D descriptors) (5c) has led many groups to identify 3D descriptors that can handle large numbers of compounds and multiple potential models, and do not require a superimposition in 3D coordinate space (e.g., for re-
view, see Ref. 12). The pharmacophore fingerprints described in Section 2.1.3 are an example of this.

2.1.1 2D Substructural and Topological Descriptors. The principle behind substructural keys or fingerprints is shown in Figure 5.1. A molecule is encoded by the presence or absence of a set of predefined atoms, atom types, and fragments (e.g., S, aromatic nitrogen, CO2H). The most widely used set of keys is the publicly available ISIS (MACCS) key set provided by MDL (13a). An alternative to the use of predefined fragments is provided by software packages such as Daylight (13b) and UNITY (13c). In this approach, all possible bond paths in a molecule from zero (the atoms) to a specified number of bonds (usually 7) are identified. A hashing procedure is used to store the paths in a bit string of fixed length. Each path will set several bits in the bit string (giving them the value of 1) and there is the possibility of different paths setting some of the same bits. As a result, individual bits lose any meaning.

The origin of the 2D substructural representation lies in the first chemical registration systems where some means was required to enhance the speed of compound retrieval. Thus, if the query molecule contains a particular combination of features, the whole database can be screened very rapidly using the keys to identify compounds that are likely to contain those features before a more exhaustive graph matching is performed to ensure an exact match with the query. The features represented in the keys (ISIS) or the fingerprint length and density (Daylight) were selected to optimize the process of compound retrieval. Nevertheless, they have proved very useful for a variety of similarity-based tasks (14). Despite these successes, issues surrounding their use in diversity-based approaches have been highlighted (15).

Molecular connectivity indices were first proposed by Randić in 1975 (16) as a means of estimating physical properties of alkanes. This formalism was quickly extended to other types of molecules (17) and, since then, a wide range of indices has been proposed, as reviewed by Hall and Kier (18) and Randić (19). The indices are derived from a graph theoretical representation of the structure where bonds are represented by the edges between nodes (atoms). They provide a direct representation of the topological structure of a molecule encoding information such as the degree of branching (¹χ) and the adjacency of the branch points (³χ), flexibility, and shape (20a). The superscript describes the number of bonds in the path between atoms used to calculate the index. The software package MOLCONN-Z (20b) was developed specifically for generating these descriptors. A number of authors have included topological indices or variants thereof in their description of molecules for describing compound collections (21) or large combinatorial libraries, often allied to a dimensionality-reduction algorithm such as principal components analysis (PCA) (6a, 23).

Carhart et al. (24) introduced the concept of atom-pairs, where the topological distance (number of bonds) between atoms of specified element type is encoded in a bit string. This was extended to the topological torsion (25), where elements on all paths of length four are encoded. Kearsley et al. (26) extended this approach to use more generic atom-type properties in place of element type. They termed these types binding property classes because they represent key features of intermolecular interactions (positive and negative charge; hydrogen bond donor, hydrogen bond acceptor, and groups that are both of these, such as hydroxyl; hydrophobic atoms; and all others). These descriptors have been used widely for similarity- and diversity-related tasks. The CATS descriptors of Schneider et al. (27a) are a variant on this approach. All topological distances (number of bonds) between a pair of binding property classes (e.g., acid-base) in a molecule are recorded with count information in a correlation vector; that is, how often that topological distance occurs between a specified pair of features in the molecule of interest. Similarity is calculated as the Euclidean distance between the correlation vectors. These CATS descriptors were shown to be useful in scaffold-hopping, identifying actives with a structural type distinct from that of the initial lead structure, and have also been used as the basis for a de novo design program, TOPAS (27b).

Functional diversity requirements of com-
pound libraries have been reviewed (28), for which molecular descriptors that relate to both structure and properties are needed, as well as their evaluation in terms of biological relevance.

2.1.2 Atomic/Molecular Properties and 2D/3D Structural Descriptors

2.1.2.1 Physicochemical. The descriptors in the previous section focus largely on the structure of the compound. The binding property classes generalize this to some extent by replacing relationships between elements or atom types with a broader definition, still within the framework of an atoms-and-bonds description of the molecule. An alternative approach would be to describe compounds by whole molecule properties, such as molecular weight and log P. Indeed such properties have been related to important pharmacological and physical properties such as absorption across cell membranes, distribution, and solubility. These properties are represented, in part, by the well-known Lipinski Rule-of-5 based on molecular weight, calculated log P, and hydrogen bond donor and acceptor counts (29). Thus, such properties have an important role in drug design, and in general assessments of "druggability." However, their use as descriptors for tasks related to similarity or diversity in the context of receptor affinity is less clear and has been questioned (11b). A primary concern is that such properties do not reflect sufficient information regarding chemical structure to enable their use for lead follow-up or similar purposes. For example, a steroid and a benzodiazepine can have identical log P values but are clearly dissimilar from a medicinal chemistry perspective. Another major problem is that many properties (e.g., log P, molecular weight, surface area, volume, molar refractivity, molecular polarizability) are correlated, making it difficult to find a reasonable set of orthogonal descriptors for the calculation of meaningful distances or for cell-partitioning (see Section 2.2.1). Such whole molecule properties are best used as constraints on a design, to define boundaries of a pharmacologically relevant chemical space or to define a distribution to match. The challenge is then how to combine the measures of diversity while simultaneously maintaining suitable physicochemical properties. This is addressed in later sections. Such properties can also be used to identify particular combinations that are preferred for different gene families, and these are used to focus a design.

2.1.2.2 2D/3D Structural. The issues with whole molecule descriptors mentioned above led Pearlman (11) and colleagues to look at an alternative representation ("BCUT" descriptors/metrics) based on atomic properties and on how atoms are connected. The approach stems from original work of Burden (30) to derive a unique signature for a molecule. Pearlman extended the concept to develop the BCUT descriptors suitable for diversity- and similarity-related tasks. Each molecule is described by a series of square matrices with atom labels defining the rows and columns. In a given matrix, the diagonal represents an atomic property such as charge, hydrogen bonding ability (donor/acceptor), or polarizability, with optional weighting by accessible surface area; the off-diagonal terms represent topological or Cartesian interatomic distance or other such property. Molecular descriptors are generated from the lowest and highest eigenvalues of these matrices and describe the molecular surface distributions of positive or negative charge, H-bond donors, H-bond acceptors, and high or low polarizability.

A number of such matrices can be calculated based on the nature of the diagonal and off-diagonal properties and the scaling between them. An "auto-choose" algorithm [see the DiverseSolutions (DVS) program below] typically finds a 5D or 6D orthogonal chemistry space that best represents the diversity of a given population. This ability to identify relevant (to drug-receptor interactions and reflecting molecular substructure) and orthogonal (noncorrelated) descriptors is critical for the effective use of both distance-based and cell-based methods. Three-dimensional properties may be included by the use of a single conformer to represent atom-atom distances or the inclusion of quantum mechanical properties (bond order or overlap-squared) from semiempirical molecular orbital (MO) calculation. However, the inclusion of 3D/MO information significantly slows down descriptor calculation and does not appear to offer any
practical advantage. DVS, a suite of programs, has been written to calculate and manipulate the BCUT (and other) descriptors for a variety of library design, similarity- and diversity-related tasks (31) (see Section 2.2.1.1). DVS uses the power of a cell-based method as a rapid means to derive a chemistry space relevant to the representation of the diversity of large populations of compounds and methods to pick diverse subsets and compare large data sets rapidly. The BCUTs provide an excellent diversity metric based on electronic properties directly related to ligand-receptor interaction that should also relate to biological activity. Indeed, BCUT metrics appear to reflect pharmacophorically important information, albeit in a relatively crude (low dimensional) fashion. They have proven useful for quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) analyses (32a,b), classification of pharmacologically active compounds (32j), diverse and focused combinatorial library design (11d, 32c-e), rational compound acquisition strategies (11c), and various other diversity-related tasks.

2.1.3 3D Properties. The properties and descriptors above are essentially 2D in nature, in that they can be generated from the compound connectivity table, that is, from a knowledge of the bonding pattern within a molecule. There are many advantages to this, not the least of which is the speed of descriptor calculation. Nevertheless, compound interactions with most biological targets are largely 3D in nature. That is, it is the disposition of key functional groups in the molecule in relation to complementary groups within the enzyme or receptor that is important. Thus, there has been much active research into how best to represent the spatial properties of molecules. A particular issue that needs to be handled is conformational flexibility, given that most compounds have rotatable bonds that will change the 3D properties and there is no means a priori of identifying which particular conformation is the bioactive conformation (i.e., the conformation of the ligand bound to the biomolecular receptor).

The methods presented below tackle this in one of three ways:

1. A single fixed conformer is used.
2. A relatively small number of representative conformers is generated.
3. An exhaustive enumeration of conformers is used.

3D BCUTs are an example of case (1). They reflect conformational differences, but only to a limited extent because they are inherently low dimensional. This is actually somewhat advantageous because the single low energy conformation from which they are computed may or may not be similar to the bound conformation for a particular receptor. Pearlman (11) has noted that 3D BCUTs appear to be advantageous when the population of interest is a single combinatorial library but that, on average, 2D and 3D BCUTs appear to be equally advantageous when the population of interest is much more diverse. In cases (2) and (3), the descriptors need to be accumulated over all conformers. In the case of bit strings this means "ORing" them over all conformations (combine using logical OR). Herein lies a potential issue with such techniques, in that data from multiple conformations could obscure the signal from the particular bound conformation relevant to a particular target.

2.1.3.1 3D Pharmacophores. The representation of a set of active compounds by a single or small set of pharmacophores that is necessary for that activity was first proposed many years ago and is an excellent model for lead optimization. The development of database systems capable of handling three-dimensional structures in the late 1980s enabled the further exploitation of such methods through giving the ability to search a corporate collection for molecules containing a particular pharmacophore. This approach to lead generation has proved highly successful (e.g., for reviews, see Ref. 33). In particular it is possible to identify active compounds that contain a different core structure from that of the compounds used to generate the model (lead-hopping). This success and the importance of the pharmacophore hypothesis in understanding the interaction of a ligand with a protein target prompted groups to look for ways to use pharmacophores to generate a molecular descriptor for similarity- and diversity-related
tasks. The diversity-related use was based on the hypothesis that sampling over all potential pharmacophores leads to diversity in a biologically relevant space, in contrast to some other methods that focus on chemical diversity. The descriptor thus generated identifies in a systematic way all the potential pharmacophores that a molecule could exhibit. Triplet (three-point) and quartet (four-point) pharmacophore representations have been extensively used (in addition to two-point/2D approaches), with a variety of features sampled at each point and interfeature distances considered in a discrete set of ranges ("bins") (see Fig. 5.2). The ability of pharmacophores to divorce the three-dimensional structural requirements for biological activity from the two-dimensional chemical makeup of a ligand has been highlighted in a recent review (34).

[Figure 5.2 graphic: labeled features include an aromatic ring centroid and H-bond donors.]
Figure 5.2. Illustration of the creation of a pharmacophore key. As the conformation of a molecule changes, so do the distances between the pharmacophoric groups, shown as spheres. The two different three-point pharmacophores shown each set their own particular bit in the pharmacophore key.

In an initial implementation from the authors (35), a set of 5916 three-point pharmacophore queries was generated and used to search a database. Compounds were characterized by the pharmacophores that they matched. This method was powerful because it gave precise control over the queries that were generated and ensured that the compounds matched the query, as opposed to satisfying a set of distance constraints; however, it was slow in execution. The Chem-X/ChemDiverse implementation (36) generates a pharmacophore fingerprint during the course of a single systematic conformational search, with a bump-check and/or rules to eliminate high energy conformers. The details of the conformational search and the definitions of the pharmacophoric features are key components of the system and this methodology has been used extensively for a range of library design and both diversity- and similarity-based tasks (e.g., see Ref. 37). The use of 3D pharmacophores in drug design applications has recently been reviewed (12, 34).

To perform the necessary analyses to generate the pharmacophore fingerprint, relevant features in a molecule need to be identified.
Either substructural definitions to find pharmacophoric features are applied at search time or atom types [and, optionally, additional "dummy" atoms (35)] are used. These can be preassigned (e.g., on database registration) or assigned at search time; a variety of approaches is used (including the use of substructures and connectivity, and of more sophisticated computational approaches). Six properties (features) have commonly been used to describe the potential pharmacophoric features of a structure:

1. Hydrogen bond donor (e.g., amide NH, aromatic amine, and hydroxyl)
2. Hydrogen bond acceptor (e.g., carbonyl, ether, and hydroxyl)
3. Basic ionizable center (positively charged at physiological pH of about 7) (e.g., aliphatic amines, amidines/guanidines, and 4-amino pyridine)
4. Acidic ionizable center (negatively charged at physiological pH of about 7) (e.g., carboxylic acid, unsubstituted tetrazole, acyl sulfonamide)
5. Aromatic rings (ring centroids often used)
6. Hydrophobic regions (e.g., isopropyl, butyl, cyclopentyl, and certain aromatic rings)

It has also been useful to define a seventh feature type in some situations. For example, it may be beneficial to classify separately the groups that can be both hydrogen bond donor and acceptor such as hydroxyl groups or imidazole nitrogens. Alternatively, the seventh feature provides a mechanism to identify an anchor point to substructures of particular interest (see Section 2.2.5).

All combinations of three or four pharmacophoric points (forming triangles or tetrahedra), for all accessible conformations of a given molecule, can be analyzed, with the resultant descriptor bit-string fingerprint (key) containing the pharmacophores from the whole conformational ensemble of the molecule (see Fig. 5.2). Each bit represents a particular combination of pharmacophore points (Donor-Aromatic-Acceptor, Donor-Aromatic-Basic, etc.) and distances between them (defined using discrete ranges, or bins).

[Figure 5.3 graphic: labeled features include a hydrophobic region centroid and an aromatic ring centroid.]
Figure 5.3. Example of how a 3D molecular structure can be broken down into its constituent pharmacophoric elements.

Figure 5.3 illustrates how a molecule can be broken down into pharmacophoric elements. The atom types can be assigned using substructural fragments, taking into account the environment (e.g., a NH group attached to a conjugating group such as O is not basic or a H-bond acceptor). Atom types can be automatically assigned when reading a molecule, such as through a customizable substructural fragment database and parameterization file (e.g., Chem-X/ChemDiverse software, 37a). The fragments identify the environment of an atom or group, enabling the correct assignment of a designed feature type. Different options can be set (e.g., a hydroxyl group can be assigned to be both a hydrogen bond donor and acceptor and/or can be assigned to a special feature type for atoms that have both characteristics), and reassignment is possible at search time for structures stored in a database. The identification and representation of hydrophobic regions is one of the most difficult yet critical tasks. Dummy atoms can be used to represent the hydrophobic regions, as a centroid of a group of relevant atoms. This limits the number of hydrophobic features to comparable numbers to other features. An automatic method to add them that uses bond polarities (hydrophobic regions defined for groups of three or more atoms that are not bonded to atoms with a large electronegativity difference) has been implemented in Chem-X/ChemDiverse. Other pharmacophore atom types have also been developed (26).

The extension to four-point pharmacophores enables chirality to be handled and enables some elements of volume/shape linked to
electronic properties to be included. This can give a much better performance in similarity searching. It also increases enormously the number of potential pharmacophores that need to be considered. To analyze pharmacophoric patterns in molecules, the distances between pharmacophoric features are divided into a finite number of ranges using a predefined binning scheme (e.g., 0-2, 2-3, 3-5, 5-8 Å, etc.), up to a maximum distance normally between 15 and 20 Å [a nonuniform binning is often used because this mirrors the tolerances (e.g., ±20%) used in 3D database searching that can be more appropriate than fixed increments, given the limited conformational sampling that is possible]. The additional pharmacophoric combinations created in moving from a three- to four-point description provide additional shape information, thus increasing molecular separation in similarity and diversity studies.

Separation has a central role in determining the final result of such calculations, with too little separation resulting in a noisy descriptor and too many molecules being defined as similar, whereas when too large a separation exists, trivial differences can have a disproportionately negative effect on the similarity value. Conformational sampling is necessary, and the granularity of this affects the useful resolution that can be used, as defined by the number and size of the distance bins. The sampling is generally performed by torsional sampling of rotatable bonds.

Thus fewer ranges are generally considered with four-point pharmacophores while concomitantly maintaining or improving on the performance of three-point pharmacophore methods. For example, by the use of 32 distances for three-point pharmacophores with seven different features possible for each of the points, there are about one million possibilities (35). Expanding to four-point pharmacophores, just 15 distance bins generate about [...] million geometrically valid possibilities. Therefore for pragmatic reasons of both memory and disk space, and the limited resolution of the conformational sampling that is normally applied, seven or 10 distance ranges for four-point pharmacophore fingerprints have been used by Mason et al. (37) and recommended for combinatorial library design and virtual screening applications. Around 2-10 million different potential pharmacophores are resolved in such a fingerprint. A limited sampling of conformations has generally been used to achieve reasonable times (in seconds) for descriptor calculation. For example, Mason et al. (37) use two (conjugated), three (single bonds), or four (sp2-sp3 and some conjugated) increments with large data sets, using a systematic analysis for less flexible molecules and random sampling for flexible molecules. See Fig. 5.4 for a comparison of three- and four-point fingerprints. Software companies such as Accelrys, Tripos, the Chemical Computing Group (MOE, http://www.chemcomp.com), and Treweren Consultants (THINK) are developing their versions of pharmacophore fingerprinting methods, with three-point pharmacophore fingerprints already implemented. The automatic assignment of pharmacophore features such as hydrophobes, acids and bases, conformational sampling, and other key options discussed above for the Chem-X software (now no longer supported; owned by Accelrys) such as nonuniform binning are challenges that have variable levels of current implementation; other options and extensions such as overlapping bins are becoming available.

Others have developed similar approaches for library design (38, 39). Horvath (40) generates an autocorrelogram of feature-feature distances for conformers and calculates a dissimilarity score that takes into account separate weightings for each feature and allows fuzziness between the distance bins. These 3D pharmacophoric descriptors were termed fuzzy bipolar pharmacophore autocorrelograms (FBPAs), and the use of fuzzy logic to build up and compare the fingerprints avoids the "all-or-nothing" bitwise match of bit-string representations in which sampling artifacts can cause significant differences. The method has been shown useful in library design and for analyzing selectivity profiles in terms of pharmacophore similarity (41).

It is possible to represent not only a ligand by the potential pharmacophores it possesses but also a protein target. In this case the pharmacophore points are identified by the positions where a ligand atom of a particular type (donor, acceptor, acid, base, hydrophobic, aro-
[Figure 5.4 schematic. Feature types: H-bond donors, H-bond acceptors, acid, base, aromatic ring, hydrophobe (lipophile). All combinations of 6 features and 7 distance ranges: 9,000; 10 distance ranges: 33,000 (3-point potential pharmacophores); 4-point potential pharmacophores; summed for all conformers.]
Figure 5.4. Three- and four-point (triplet/quartet) pharmacophore fingerprint creation. Assignment is often binary (on or off), although a count can be kept, and has been used in more recent studies. The large difference in bin numbers between three- and four-point pharmacophores provides additional shape information, thus increasing molecular separation in similarity and diversity studies.
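
The fingerprint creation summarized in Figure 5.4 can be illustrated with a minimal sketch for the three-point case. The bin edges, the feature names, and the function names below are assumptions made for illustration; they are not the distance ranges or the implementation used in the cited studies. Each triplet is keyed by pairing every feature with the binned length of the opposite edge, so that the key does not depend on the order of the points, and assignment is binary (on or off).

```python
from itertools import combinations
import math

# Hypothetical upper edges (in angstroms) of 7 distance ranges; not the published bins.
BIN_EDGES = [3.0, 4.5, 6.0, 8.0, 10.0, 13.0, 17.0]

def bin_index(distance, edges=BIN_EDGES):
    """Index of the first range whose upper edge covers the distance, else None."""
    for i, edge in enumerate(edges):
        if distance <= edge:
            return i
    return None  # longer than the largest range

def triplet_key(points):
    """Order-independent key for three (feature, (x, y, z)) points."""
    pairs = []
    for i in range(3):
        j, k = [m for m in range(3) if m != i]
        b = bin_index(math.dist(points[j][1], points[k][1]))
        if b is None:
            return None
        # Pair each feature with the binned length of the opposite edge.
        pairs.append((points[i][0], b))
    return tuple(sorted(pairs))

def three_point_fingerprint(conformers):
    """Set of triplet keys summed (as a union) over all conformers."""
    keys = set()
    for conformer in conformers:
        for triple in combinations(conformer, 3):
            key = triplet_key(triple)
            if key is not None:
                keys.add(key)
    return keys
```

A count per key could be kept instead of a set if non-binary assignment is wanted.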
matic centroid) is likely to bind and so provide a complementary interaction with the adjacent protein residue side chain. The pharmacophore fingerprints are thus generated from these complementary site points. The site points can be positioned in the active site using methods such as GRID (42), in which an energetic survey of the site is made using a variety of functional groups. Figure 5.5 illustrates the favorable energy contours for a variety of pharmacophoric probes for the Factor Xa serine protease active site. Atoms (with associated pharmacophore features) are then added in the positions for the most favorable interaction (also shown in Fig. 5.5).

The resultant ensemble of atoms represents a hypothetical molecule that interacts at all favorable positions in the binding site, and
Figure 5.5. [...]ore fingerprint calculations (lower right). See color insert. [Panel labels: H-bond donor, H-bond acceptor, lipophile/aromatic, acid, base.]
2 Molecular Similarity/Diversity 199
a pharmacophore fingerprint is calculated from this. This fingerprint represents a form of "protein structure-based diversity," quantifying the range of different pharmacophoric shapes complementary to a target protein binding site. For example, for the Factor Xa serine protease active site, 13 complementary site points generated a fingerprint of 2103 four-point pharmacophore shapes, of which 354 were the same as the 2062 found for the serine protease thrombin, generated from 13 site points. Only 11 significant complementary site points were found for the serine protease trypsin, which has a less defined S4 pocket. Of the 1233 total pharmacophore shapes, 363 were in common with Factor Xa, with 120 in common for all three serine proteases. It is thus possible to identify ensembles of pharmacophores that can be used both to differentiate the sites (selectivity) and to identify common features. Comparison of these protein-derived pharmacophore fingerprints with known ligands, using four-point fingerprints, shows that they can be used for searching for novel ligands within a database and that they are specific enough to capture ligand selectivity between similar proteins such as the serine proteases thrombin, Factor Xa, and trypsin (37). With three-point fingerprints, the comparison of ligand- and site-derived fingerprints could identify common binding motifs, although selectivity was not captured (37b).

Pharmacophore fingerprints are relatively slow to calculate, however. Thus, their application to very large virtual libraries requires a great deal of computer power. Researchers at Chiron (12, 43) have developed a pharmacophore-based methodology applicable to reactants, OSPPREYS (Oriented-Substituent Pharmacophore PRopErtY Space). In this approach, reactant pharmacophores are calculated with respect to the reactant attachment atom and combinations of up to nine pharmacophore centers are considered (see Section 8). In the Gridding and Partitioning (Gap) approach, developed at GlaxoWellcome (44), reactants are aligned such that the bond between the attachment atom and the first nonhydrogen atom is along the x-axis with the attachment atom at the origin. A conformational analysis is then performed and the pharmacophore features are mapped to a 1-Å grid (Fig. 5.6). Cells occupied by a particular feature are recorded in a bit string. This descriptor is ideally suited to monomer acquisition and reactant diversity.

Topomer shape similarity, developed by Cramer (45) at Tripos, has been used for similarity searching and targeted library design (using Tripos' proprietary software, "ChemSpace"), building on earlier work on steric fields of single "topomeric" conformers, clustering reactants by their 3D steric fields into "bioisosteric" clusters. The descriptor was considered to be useful in describing variations about a fixed molecular core, defining a single, unambiguous, aligned conformation for any nonchiral molecule.

Approaches such as the Gap program that exploit 3D descriptors for monomer selection address a need for an easily accessible set of in-house monomers available for library generation. Such monomers need to be diverse in nature and able to probe regions of space through attachment to known leads, while producing compounds with druglike properties. More detailed conformational searching paradigms can be used for the smaller monomer compounds, and approaches such as Gap and OSPPREYS exploit this opportunity.

For the selection of diverse compound subsets, studies (46a) have compared three-point pharmacophore descriptors and 2D fingerprints. These have highlighted benefits of the different approaches, and the improved performance of some combined descriptors. The use of clustering for the rational selection of compounds for acquisition and for in-house compound collections used for screening has also been investigated (46b), with comparable results obtained with 3D pharmacophore-derived fingerprints to the typically used 2D fingerprints.

2.1.3.2 Shape. Pharmacophores capture the key features of intermolecular interactions. However, they do not explicitly capture the shape and volume of the ligand, even if this is crudely implied by the largest four-point pharmacophore exhibited, and the totality of potential pharmacophores exhibited across a range of conformations encodes shape fragments. Hahn (47) has described a method for three-dimensional shape-based searching implemented in the Catalyst program. Seven
[Figure 5.6 schematic. Labels: attachment group at origin; free x-axis rotation about attachment bond; pharmacophore point; track locations of pharmacophores within regular grid; pharmacophore key (bit string).]
Figure 5.6. Overview of the Gridding and Partitioning (Gap) procedure as applied to monomers, exemplified using phenylalanine as a potential primary amine. This molecule thus contains two pharmacophoric groups (the aromatic ring and the carboxylic acid). During the conformational analysis the locations of these pharmacophoric groups are tracked within a regular grid. See color insert. [Reproduced from A. R. Leach and M. M. Hann, Drug Discovery Today, 5, 326-336 (2000), with permission of Elsevier Science.]
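
The grid-tracking step shown in Figure 5.6 can be sketched as follows. This is only an illustration under stated assumptions: the grid extent (-10 to 10 Å), the feature list, the bit layout, and the function names are hypothetical, and the input is assumed to be feature positions from conformers that have already been aligned with the attachment atom at the origin and the attachment bond along the x-axis.

```python
import math

# Hypothetical feature list; the order fixes the bit layout in this sketch.
FEATURES = ["donor", "acceptor", "acid", "base", "aromatic", "hydrophobe"]

def cell_index(coord, lo=-10.0, hi=10.0, cell=1.0):
    """Flat index of the grid cell containing coord, or None if outside the grid."""
    n = int(round((hi - lo) / cell))  # cells per axis
    idx = []
    for value in coord:
        i = math.floor((value - lo) / cell)
        if i < 0 or i >= n:
            return None
        idx.append(i)
    return (idx[0] * n + idx[1]) * n + idx[2]

def gap_on_bits(feature_positions, lo=-10.0, hi=10.0, cell=1.0):
    """On-bits of the key: one bit per (feature type, occupied grid cell).

    feature_positions: iterable of (feature_name, (x, y, z)) collected over
    all conformers of one aligned reactant.
    """
    n = int(round((hi - lo) / cell))
    bits = set()
    for feature, coord in feature_positions:
        c = cell_index(coord, lo, hi, cell)
        if c is not None:
            bits.add(FEATURES.index(feature) * n ** 3 + c)
    return bits
```

Two conformers that place the same feature in the same 1-Å cell set the same bit, so the key records which regions of space each feature type can reach rather than how often.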
shape indices, positive and negative extents along the three principal axes from the molecular centroid, and the volume of that conformer are computed and stored in a database. These indices can then be used for rapid comparison with a query shape derived from active structures. Conformers passing this filter are then aligned with the query and the similarity is assessed from the volume overlap. Shape-based searching can be used independently, in which case it will complement a 2D similarity search. The method can also be employed in conjunction with a 3D pharmacophore search; however, it is not clear that results are improved in this case (48).

2.1.3.3 Field-Based. A receptor site recognizes the surface properties of a molecule. These can be represented by different types of molecular fields (electrostatic, steric, and hydrophobic) that can be calculated from the atomic composition of the molecule and compared using a measure such as the Carbo index (49). A gaussian representation of the field allows for a more rapid alignment of the molecules (50). Willett's group has developed a program, FBSS (51), which uses a genetic algorithm for the alignment of the molecular fields. They have compared the performance of this method with a 2D structural fingerprint (UNITY software, 13c) in searching the WDI, a collection of drug molecules and compounds in development, and the BIOSTER database, a database of functional groups that have been used to replace other groups and retain biological function (e.g., a carboxylic acid and a tetrazole). Although the 2D measure will tend to find more bioactive molecules, the 3D measure gives a greater structural diversity in the hits (52). This seems to be the case for most 3D methods. In these examples conformational flexibility can be considered during the alignment stage but will slow the search down considerably and may also lead to the algorithm becoming stuck in local minima.

An alternative to using the molecule composition in calculating the fields is to use molecular fragments as probes to represent protein side chains. The interaction energy between the probe and the molecule is calculated on a grid surrounding the molecule. These grid fields can then be used in conjunction with PLS as in the CoMFA (comparative molecular field analysis) 3D-QSAR methodology (53). More recently, these fields have been further transformed to generate 3D molecular descriptors. The VolSurf program (54) calculates a wide range of descriptors from the grid energies [calculated with the program GRID (42)]. These have been shown to correlate to a range of properties such as membrane penetration and solubility (55). The Almond program (56) uses a transform known as the Maximum Auto-Cross Correlation (MACC) between pairs of grid nodes, to give a type of two-point pharmacophoric representation of the fields. Such descriptors have been useful in QSAR studies because they are alignment free; that is, they are independent of the position within the defining grid, and have also been used in reactant selection (Pickett, unpublished results, 1999). However, the limitations of the lack of conformational flexibility have so far precluded their use in more general database searching and diversity applications.

2.1.4 Analysis

2.1.4.1 Descriptor Transformations. A large number of potential descriptors are available and this presents a number of issues. Many descriptors will tend to be correlated with one another to a greater or lesser degree. There is the question of the scale of the descriptors and also the difficulty of combining, say, a fingerprint with a calculated property. Thus the descriptors must first be transformed in some way. A key study in this regard was the work of the Chiron group (57). Groups of similar descriptors were combined using principal components analysis (PCA) and multidimensional scaling (MDS), to give a total of 16 composite descriptors. D-optimal design was then used to further analyze a data set. Also of interest was the use of a "flower plot" to visualize the results. In the DPD (diverse-property derived) methodology (21a), the search was for six noncorrelated descriptors. The selection of relevant BCUT descriptors using a χ² test is mentioned below.

2.1.4.2 Similarity and Distance Measures. A variety of measures exist for assessing the similarity or distance between molecules in a given descriptor space (2a), as described above. Similarity measures give a direct measure of similarity between molecules in some property space and give values in the range of 0 to 1, with 1 being identical. Typical examples are the Tanimoto coefficient and the Cosine coefficient. For real-valued properties the Tanimoto is defined as

\[
\mathrm{Tanimoto} = \frac{\sum_{i=1}^{N} x_{iA}\,x_{iB}}{\sum_{i=1}^{N} (x_{iA})^{2} + \sum_{i=1}^{N} (x_{iB})^{2} - \sum_{i=1}^{N} x_{iA}\,x_{iB}}
\]

where x_{iA} is the value of property i of molecule A. When x_{iA} can take values of only 0 or 1, as in a bit string, this reduces to

\[
\mathrm{Tanimoto} = \frac{c}{a + b - c}
\]

where a is the number of on-bits in A, b is the number of on-bits in B, and c is the number of bits in common between A and B. The Cosine coefficient can be defined as

\[
\mathrm{Cosine} = \frac{\sum_{i=1}^{N} x_{iA}\,x_{iB}}{\left[\sum_{i=1}^{N} (x_{iA})^{2} \sum_{i=1}^{N} (x_{iB})^{2}\right]^{1/2}}
\]

For field-based measures and overlap of electron density functions, the Carbo index can be used (49), which is equivalent to the Cosine coefficient.

Distance measures give 0 for identical structures and have an upper bound defined by the property space. The Euclidean and Hamming distances are the most common:

\[
\text{Euclidean distance} = \left[\sum_{i=1}^{N} (x_{iA} - x_{iB})^{2}\right]^{1/2}
\]

\[
\text{Hamming distance} = \sum_{i=1}^{N} \left| x_{iA} - x_{iB} \right|
\]

The fundamental difference between similarity and distance measures is that the latter expressly include the absence of a feature (or low values for real-valued properties) in the measure of similarity. This has led to the suggestion (58) that, in the chemical domain at least, such measures are best for relative similarity; that is, ranking the similarity of two molecules to a target, as opposed to measuring the absolute similarity of molecules, for which similarity measures are preferred.

Similarity and distance measures form the basis for most of the analysis and selection methods described in the next section and the reader is referred to the reviews by Willett et al. (2, and references therein) for a fuller discussion of the characteristics and specific properties of these measures.

2.2 Analysis and Selection Methods

In this section we describe some general methods for analyzing and partitioning large data sets, with particular reference to selecting representative or diverse subsets. Library design also employs many of the strategies described here and is discussed in more detail in Section 4. The methods fall into two broad categories: cell-based or partitioning methods and distance-based methods. Partitioning methods use the population to define the limits for cells into which the compounds are divided. Adding or comparing to other compound sets requires identifying the cells into which the new compounds would fall based on their descriptors. This is very rapid and the partitioning process provides a frame of reference for many design tasks; for example, compounds can be readily identified to fill empty or poorly represented cells. Potential issues are where to place the cell boundaries and the handling of compounds that fall near to a cell boundary. Also, new compounds may fall outside the range of properties of the initial population. Distance-based methods, such as clustering and dissimilarity-based methods, require the calculation of similarity between members of the population and are thus population dependent. Adding new members to the population requires recalculating similarities and could change the distribution of compounds between the clusters. Identifying poorly represented or empty areas of property space is not possible. Each of these methods is further described below with examples of their application.
2.2.1 Cell-Based Partitioning Methods. Partitioning methods divide chemistry space into hyperdimensional "cells" by "binning" the axes (descriptors) that define the chemistry vector space, just as the eight divisions on the x- and y-axes of a two-dimensional checkerboard divide the board into 64 squares. A chemical compound occupies a position in chemistry space determined by the descriptors (coordinates) computed based on its structure. Once the compounds have been partitioned, selecting diverse or representative sets of compounds involves selecting a small number of compounds from each occupied cell, either in proportion to the number of compounds in the cell or a specified number from each occupied cell. For focused sets, compounds are sampled from cells neighboring the population of actives. The real advantage of partitioning methods, however, lies in their ability to readily identify underpopulated regions of property space. Selections can then be made from a second population of molecules, a virtual library for instance, to increase the occupancy of underpopulated cells.

Usually, such methods require a low dimensional representation of the space, although the pharmacophore methods are a notable exception to this. The low dimensional space may be the result of a dimensionality-reduction algorithm, as described earlier. Alternatively, a small number of descriptors may be judiciously selected. This latter approach was taken by Lewis et al. in their DPD methodology (21a), which is a good example of partition-based selection. The aim was to select a representative set of compounds based on molecular and physicochemical properties for screening. Six properties were chosen from among 49, based on their low pairwise correlation: number of H-bond acceptors, number of H-bond donors, molecular flexibility, an electrotopological state index, c log P, and a measure of aromatic density. Each descriptor (axis) was divided into two to four partitions, to give a total of 576 bins. A major issue was in identifying six relevant and reasonably noncorrelated (orthogonal) descriptors, leading to [...]ition of a new descriptor. The chosen ranges covered more than 85% of a 150,000 subset of the corporate collection and approximately three compounds were taken from each bin. Follow-up of initial hits involves the screening of additional compounds from the cells containing hit molecules. Several leads were identified using this approach (7).

2.2.1.1 Diverse Solutions. DiverseSolutions (DVS) is software developed by Pearlman et al. (11, 31) to generate and use the BCUT descriptors in addition to other DVS-computed or user-provided low dimensional descriptors. (DiverseSolutions is also designed to work with high dimensional metrics such as 2D fingerprints, and includes some novel algorithms for such distance-based work.) DVS uses a χ²-based "auto-choose" algorithm (11c) to identify the combination of low-D descriptors that are mutually orthogonal and that most uniformly distribute a given large population of compounds among the cells of the resulting chemistry space. Originally, the binning was performed in a uniform manner along each axis, with a given percentage of outliers to avoid sampling the extremes of space. This could be useful for large sets of diverse compounds where the extremes tend to be undesirable compounds. However, for large (virtual) libraries initial filtering can remove these before the analysis, and thus a nonuniform binning scheme was suggested (59), so that acceptable compounds are not lost as outliers, and is now the preferred option.

Often, the large population of compounds used as the basis for defining a chemistry space is the entire compound collection available to a pharmaceutical company for its drug discovery efforts, together optionally with structures from commercial databases of biologically active compounds. The resulting chemistry space can be regarded as the "corporate standard chemistry space" and provides an ideal basis for comparing large sets of compounds such as alternative commercially available compound collections or alternative combinatorial libraries. It is also a good basis for comparing small sets of compounds such as compounds with reasonable affinity for various bioreceptors.

The axes of a corporate standard chemistry space are intended to represent all aspects of molecular structure. Thus, all axes of the corporate chemistry space must be considered for purposes such as general diverse subset selection or rational compound acquisition. However, not all aspects of molecular structure may be important for understanding structure-activity relationships (SARs) for a particular receptor. This led Pearlman and Smith (11d) to introduce the concept of a receptor-relevant subspace (RRSS) of a full chemistry space. For example, starting with a chemistry space of six dimensions, defined to best represent the diversity of all druglike compounds in the MDDR (MDL Drug Data Report) database (13a), they showed how to perceive the three-dimensional subspace that conveys information that is particularly relevant for affinity to the ACE (angiotensin converting enzyme) receptor. ACE inhibitors of diverse structure were tightly clustered with respect to the receptor-relevant metrics, thereby providing an obvious near-neighbor strategy for lead follow-up. They (11d) also emphasized the importance of not considering metrics that are not "receptor-relevant" when computing distances for such near-neighbor-based discovery efforts. This also enables diversity in these other dimensions to be explored (e.g., with combinatorial libraries), to obtain compounds with a modified profile for other properties such as bioavailability.

Work on the design and diversity analysis of large combinatorial libraries at Pharmacopeia using BCUT metrics and DiverseSolutions was reported by Schnur (32). A cell-based analysis of synthon-derived libraries was performed, using full product libraries, including library comparisons. Active molecules in these libraries, which involved multiple scaffolds, were found to cluster in various three-dimensional subspaces of the diversity spaces. The utility of a simple property-based reactant/synthon selection tool was also described, targeted at the synthetic chemists, with reactants binned according to patterns based on the ranges of a set of user-selected properties that form a diversity hypothesis.

Chemistry space metrics have been used at Rhône-Poulenc Rorer for diversity analysis, library design, and compound selection (59, 80) using DiverseSolutions to generate a "universal" chemistry space for use as a standard for profiling structural sets of interest. The complementarity of three different diversity measures for comparing and profiling compound collections (a corporate database, combinatorial libraries, and the MDDR drugs database) was also shown. The methods used were a 2D structural characterization (Daylight fingerprints), DiverseSolutions, and 3D pharmacophore fingerprints. A combinatorial library of 100,000 structures appeared structurally different from the other databases by the Daylight fingerprint clustering, yet the bulk of its compounds overlapped with druglike compounds (MDDR) in DiverseSolutions BCUT chemistry space and 3D pharmacophore space ("cells" in fingerprints). It was shown and "quantified" that new diversity relative to the company database was explored, with much of this new diversity in desirable areas occupied by MDDR compounds. The nonuniform binning scheme was developed to enable the use of chemistry spaces scaled to include all structures within a set, while maintaining a reasonable distribution of compounds within cells. The method was used to select a subset for initial screening of a large set of combinatorial libraries designed for 7-TM GPCR targets.

2.2.1.2 Pharmacophore Fingerprints. Pharmacophore fingerprints can also be considered as a high dimensional partitioning of the compound space (35). Underrepresented pharmacophores within a population can be identified and act as a possible focus for library design or compound acquisition. Using six feature types (hydrogen bond acceptor, donor, acid, base, hydrophobe, and aromatic ring centroid) with four-point pharmacophores and 7-10 binned distance ranges, it is possible to resolve about 2-10 million different pharmacophoric shapes. Different databases can be compared using this fingerprint, and differences identified. For example, by comparing a corporate screening file (100,000 structures) with the MDDR database (62,000 structures) of biologically active compounds (as discussed above for DiverseSolutions, Refs. 62,80) "holes" could be identified, in terms of about 1 million 3D pharmacophores exhibited only by MDDR compounds (about 2.7 million were in common and 0.2 million unique to the corporate set). This provides a design space for which combinatorial libraries were designed and synthesized. A total of 100,000 combinatorial library compounds were able to match about 40% (0.4 million) of the pharmacophore "holes" (i.e.,
[Figure 5.7 bar chart. Sets shown: MDDR (62 K); Corporate (100 K); Libraries (100 K); MDDR, random (14 K); Corporate, random (14 K); Libraries, single chemistry (14 K each); Total of sets (from a theoretical 9.7 M).]
Figure 5.7. Comparisons of the 3D four-point pharmacophore fingerprints exhibited by several sets [MDDR database of 62,000 biologically active compounds, a corporate registry database of 100,000 compounds used for screening, 100,000 compounds from combinatorial libraries (from a four-component Ugi condensation reaction), and 14,000 compound random subsets (MDDR, corporate) or individual libraries]. The four-point potential pharmacophores were calculated using 10 distance range bins and the standard six pharmacophore features.
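
The cell-based comparison of compound sets described in Section 2.2.1, including the identification of "holes" occupied by a reference set but not by a collection, can be sketched as below. The bin boundaries, descriptor tuples, and function names are hypothetical; in practice the boundaries would come from the chosen chemistry space, and the same set difference applies when the "cells" are pharmacophore fingerprint bits.

```python
from bisect import bisect_right
from collections import Counter

def cell_of(descriptors, edges_per_axis):
    """Cell (tuple of bin indices) for one compound.

    edges_per_axis holds the sorted interior bin boundaries of each axis;
    an axis with k boundaries is divided into k + 1 bins.
    """
    return tuple(bisect_right(edges, value)
                 for value, edges in zip(descriptors, edges_per_axis))

def occupancy(compounds, edges_per_axis):
    """Number of compounds falling in each occupied cell."""
    return Counter(cell_of(d, edges_per_axis) for d in compounds)

def holes(reference, collection, edges_per_axis):
    """Cells occupied by the reference set but empty in the collection."""
    ref_cells = set(occupancy(reference, edges_per_axis))
    own_cells = set(occupancy(collection, edges_per_axis))
    return sorted(ref_cells - own_cells)
```

Because a compound's cell depends only on its own descriptors and the fixed boundaries, new sets can be profiled against an existing space without recomputing anything for the compounds already placed.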
MDDR pharmacophores not in corporate set), and additionally explore about 0.3 million new pharmacophores. Figure 5.7 illustrates the number of pharmacophores found in these sets, together with those for the ACD (Available Chemicals Directory), random 14,000 subsets of the database sets and some of the combinatorial libraries (~14,000 each, from a four-component Ugi condensation reaction, 12 × 12 × 12 × 8 reactants). The relative richness and diversity of the MDDR database, which includes structures from a large number of companies, is clear from the comparisons. The contributions, and eventual diminishing return, of successive libraries using the same chemistry are discussed in Section 5.1.2 (see Fig. 5.24 below).

An example of the use of 3D pharmacophore fingerprints for the design of GPCR libraries (37a) using "relative" fingerprints focused around privileged substructures is described in Section 5.1.2. An approach that combines an optimization of a four-point pharmacophore fingerprint and BCUT chemistry space diversity, using simulated annealing, has been described (37d; see Section 4.7). Simulated annealing is a widely used optimization methodology whereby the "temperature" of the system is used to control the degree of sampling of solution space. The "temperature" is cooled or annealed as the run progresses so that the system moves into a minimum for the function at low "temperature." In the classical sense, temperature controls the kinetic energy of the system; in a more general sense, the "temperature" has no physical meaning and is a parameter to control the sampling of solution space. Diversity was the goal (function to be optimized) of the studies reported, but the approach can equally be applied to optimize to a desired distribution of properties (e.g., from sets of biologically active compounds). The power of this pharmacophoric approach has been exemplified by Leach et al. in their Gap protocol for monomer acquisition (44).

Pharmacophore fingerprints derived from complementary site points to a target binding site have been used as a quantification of "biological diversity"/structure-based diversity (37), defining a measure of the intersection between chemical and biological space. They can be compared to the pharmacophore fingerprints calculated from ligands, and the pharmacophore fingerprints of different target binding sites can also be compared to identify similarities (e.g., common binding motifs) and differences (e.g., for selectivity). The four-point pharmacophore fingerprint of a serine protease binding site was used to quantify all the possible binding modes. An example was given of how a combinatorial library could be designed to match as many as possible of these site pharmacophores, with the idea that the biological screening of the resultant library would provide information as to which hypotheses lead to (the best) binding. The site points can be generated either by geometric methods (as implemented in Chem-X/ChemProtein; see Ref. 133) or through energetic surveys of the site [e.g., by using a variety of probe atoms (as implemented and used for pharmacophore fingerprint generation) (37); see Section 2.1.3.1].

The pharmacophore fingerprinting method thus provides a novel method to measure similarity when comparing ligands to their binding site targets, with applications such as virtual screening and structure-based combinatorial library design, as well as to compare binding sites themselves. Flexibility of the binding site can also be explicitly accounted for by using a composite fingerprint generated from several different binding site conformations.

2.2.2 Cluster-Based Methods. Clustering methods have a long history of application in chemical information (60). Any set of descriptors can be used in the clustering, but most typically some form of structural fingerprint is used in conjunction with a similarity measure such as the Tanimoto coefficient (see Section 2.1.4.1). The methods fall into two broad classes, hierarchical and nonhierarchical. Nonhierarchical methods such as that described by Jarvis and Patrick (61) have been widely used for compound selection from large databases (62). The principle behind the Jarvis-Patrick method is to group together compounds that have a large number of nearest neighbors in common. However, the method requires the user to specify the number of clusters desired, and tends to be prone to singletons (clusters of one) and/or a small number of very large clusters. The cascade clustering methodology (59b) was developed to address some of these issues. Parameters were selected to produce an acceptable size distribution for the largest clusters and the small clusters were then reclustered. Doman et al. (63) have developed a fuzzy clustering technique, also based around the Jarvis-Patrick algorithm but which has no user-defined parameters and allows a compound to belong to more than one cluster.

Hierarchical methods can be further subdivided into agglomerative and divisive methods. Agglomerative methods start with each compound in a separate cluster and iteratively join the closest clusters together. Divisive methods start with a single cluster and iteratively subdivide until each compound is a singleton. Hierarchical clustering methods generate a dendrogram showing the relationship between the compounds, the issue being the level at which to cut the hierarchy (i.e., how many clusters to generate). Although heuristics exist, there is no automated method. Such algorithms, however, at best scale to order (N²) in time, where N is the number of compounds, and so are limited in application to a few hundred thousand compounds at most (64). Nevertheless, they have been shown to be superior to nonhierarchical methods for clustering of chemical compounds (65). Ward's method was shown (5) to be the most effective at separating active from inactive compounds by clustering bit strings that describe the presence or absence of 153 small generic and specific fragments (ISIS structural key descriptors). Even better performance was obtained with the inclusion of pharmacophore distances between site points complementary to hydrogen bonding and charged groups combined with distances between centers of aromatic rings and attachment points for hydrophobic groups.

2.2.3 Dissimilarity-Based Methods. The methods for compound selection described above essentially group compounds either by partitioning into cells or by clustering. Dissimilarity-based methods (66) avoid this step.
The subset selection can be performed iteratively. The first compound is chosen at random and the next compound is selected to be maximally dissimilar to the first; the third is then selected to be maximally dissimilar to the first two, and so on. The selection stops when a prespecified number of compounds have been selected or no more compounds can be chosen that are below a given similarity or above a certain distance to another compound in the selected set. Pearlman (11b,c) refers to such methods as "addition" algorithms because they add compounds to a diverse set of increasing size. He notes that such algorithms are quite satisfactory when the size of the desired subset is relatively modest but, given that the time required for such algorithms is proportional to the size of the total population and the square of the size of the desired diverse subset, they are far less satisfactory when, for example, selecting a subset of 10,000 from a population of 1,000,000.

Alternatively, the number of desired compounds can be predefined and a stochastic algorithm used to maximize the diversity of the selected set, although these methods are even slower than addition methods. Sphere-exclusion methods, which Pearlman calls "elimination" algorithms because the diverse subset is created by eliminating compounds from the superset, have been implemented in DiverseSolutions (31) (see Section 2.2.1.1), providing a rapid distance-based diverse subset selection method. The minimum distance between nearest neighbors within the diverse subset is first defined (Dmin), a compound is chosen at random, all compounds within Dmin are removed, a second compound is chosen at random, all compounds within Dmin are removed, and this is repeated until no more compounds can be chosen. The algorithm controls the size of the resulting diverse subset by automatically repeating the process with a larger or smaller value of Dmin as necessary. Because the time required for each elimination subset is proportional to the size of the superset and size of the subset (not size of subset squared), the elimination method, despite the need for (typically) four or five automatic repetitions, is far faster than the addition method and yields subsets of essentially equal diversity.

Maximum dissimilarity-based methods tend to give diverse selections, including many outliers of less potential interest to a medicinal chemist. By contrast, methods such as sphere exclusion (minimum dissimilarity selection) tend to give representative selections that mimic the underlying distribution of compounds. The OptiSim method developed by Clark (66c) attempts to achieve a compromise between these two extremes. Three parameters are required: the first two, radius or similarity cutoff and maximum number of selections, are common to the other algorithms. A third parameter, K, is required to define a subsample size. Up to K selections are added to the subsample at each iteration and the best compound from the subsample added to the selected set. At the limit of K = 1, this is equivalent to minimum dissimilarity selection, whereas at the limit of K = N (total number of compounds) the algorithm is equivalent to maximum dissimilarity selection. By altering the value of K the user can thus achieve a compromise between the diversity and representativeness of the selected set. Tests of the algorithm suggest that it is possible to achieve selections similar to those achieved from hierarchical clustering methods at a greatly reduced computational cost. Maximum dissimilarity methods were shown (66d) to lead to more stable QSAR models with higher predictive power, based on a comparative molecular field analysis of angiotensin-converting enzyme inhibitors.

Hudson et al. (67) describe two parameter-based methods for compound selection. The most descriptive compound (MDC) method is aimed at selecting compounds that represent the population as a whole. An information vector is accumulated from the ranked Euclidean distance of each compound in the data set to all others. The most descriptive compound is that with the largest information, which equates to the compound with the smallest overall distance to all other compounds. The next compound is chosen to give the greatest additional information and so on. The sphere-exclusion method used attempts to select compounds that most effectively cover the property space. A compound is selected, say the MDC, and all compounds are removed that are closer to it than a user-defined radius. The next selection is that compound in the remain-

ing set that is closest to the one already selected. This process is repeated until no compounds are left for selection. The methods were applied to the selection of standard sets for biological screening at Wellcome.

The maxmin approach (23c) uses the shortest nearest-neighbor distance as a measure of diversity in the sample:

This measure is particularly useful in selecting diverse compound sets from a corporate collection, as exemplified by Higgs et al. (68). They also introduce the concept of a coverage design for lead follow-up, in which compounds are selected to be maximally similar to a set of leads.

An alternative approach is to use the sum of pairwise similarities in the maxsum approach:

\[ D_{\mathrm{maxsum}} = 1 - \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} \mathrm{sim}(i,j)}{N^{2}} \]

This approach is particularly efficient when combined with the Cosine coefficient (69) and was used by Pickett et al. in combination with pharmacophore descriptors (70). In lower dimensional spaces the maxsum measure tends to force selection from the corners of diversity space (6b, 71) and hence maxmin is the preferred function in these cases. A similar conclusion was drawn from a comparison of algorithms for dissimilarity-based compound selection (72).

An excellent discussion of different diversity functions has been given by Waldman et al. (73). A set of ideal behaviors for a diversity function was defined. These are particularly relevant to library design tasks. Thus, although maxmin is suitable for selecting highly diverse molecules, it is less well suited to library design optimization processes. An alternative function was defined based on minimum spanning trees, which had previously been used by Mount et al. (71) and a gaussian error function, erf( ):

The reader is referred to reviews (e.g., see Refs. 2, 6, 7) for further detail and discussion of other related measures and methods for diversity-based selection.

2.2.4 Biasing to Desired/Desirable Properties. Any of the methods above can be used to bias the compound selection toward a particular region of property space, for example, by restricting the selection to cells or clusters containing known actives (in the latter case it may mean reclustering if the active is from an external source). However, it is a common experience when applying diversity-based selection to large databases to see a number of compounds that are undesirable for a number of reasons. They may be too large, too flexible, too lipophilic, contain too many acid groups, and so forth. Thus, it is general practice to apply filters during the selection process. These can include limits on property values such as calculated log P and molecular weight (68) and the application of substructure filters to remove undesirable or reactive compounds (21a, 66a, 74). Lipinski (29) formalized some of these ideas in the Rule-of-5, derived from analysis of orally absorbed drugs, with problems more likely if two or more of MW > 500, c log P > 5, sum NH, OH (~ H-bond donors) > 5, sum N,O (~ H-bond acceptors) > 10. Variants of this, often stricter (e.g., only one violation and/or lower values for hits/leads), are now widely used in conjunction with other methods for classification of compounds as likely to be orally absorbed or to penetrate the CNS. The reader is directed to several reviews on this topic (75). The usefulness of such approaches has been shown by the work of Pickett et al. (76), where a library was designed using simple descriptors such as polar surface area for oral absorption (77). The designed library showed improved absorption in a Caco-2 system over a previous related library where the products had not been formally designed to these criteria.

In a more general sense, compounds can be selected to reproduce a given set of property profiles for calculated log P, molecular weight, and so forth derived from, say, a set of known
drugs. Such an approach is most widely used as an additional constraint in library design algorithms (78) and is further discussed below.

An interesting example of biasing in compound selection is provided by Grassy et al. (79). Lead compounds were used to derive a range of acceptable values for topological indices and other molecular descriptors. These were used to filter a large virtual library and led to an active compound being synthesized.

2.2.5 Relative Diversity/Similarity. This describes an approach that measures "relative" similarity and diversity between chemical objects, in contrast to the use of the concept of a total or "absolute" reference space (80). The ability of 3D pharmacophoric fingerprint descriptors to separate ligand-binding properties from chemical structure has enabled a useful modification to the way the descriptor is evaluated (37). It is possible to identify one of the points of a pharmacophoric description such as a triplet or quartet with a special feature, such as a "privileged" substructure important for binding or a pharmacophore group. A fingerprint can be generated that describes the possible pharmacophoric shapes from the viewpoint of that special point/substructure (see Fig. 5.8). This creates a "relative" or "internally referenced" measure of diversity, enabling new design and analysis methods. The technique has been extensively used to design combinatorial libraries that contain "privileged" substructures focused on GPCRs (37a), and this is described further in Section 5.1.2. The use of "receptor-relevant" BCUT chemistry spaces from DiverseSolutions provides a different approach to a focused similarity/diversity measure (11d, 32e-h).

Figure 5.8. Example of privileged four-point pharmacophores, either created from a ligand using a particular feature (e.g., the centroid of a "privileged" substructure) or complementary to a protein site using a site point or attachment point of a docked scaffold. Only pharmacophores that include this special feature are included in the fingerprint, thus providing a relative measure of diversity/similarity with respect to the privileged feature.

3 VIRTUAL SCREENING BY MOLECULAR SIMILARITY

The use of molecular similarity to analyze large databases of structures using information derived from one or several ligands provides a powerful ligand-based virtual screening method (protein structure-based virtual screening methods are by comparison based on docking structures into a binding site). Virtual screening requires that a set of structures is ranked, with the goal of identifying new structures that have similar biological activity, with top-scoring compounds sent for evaluation in a biological assay. Usually, the requirement is to provide a small subset of compounds (10-1000) from a large set (100,000-1,000,000) of possible compounds for screening that is enriched in actives (i.e., contains a greater proportion of actives than that of the full compound set). In this context, enrichment involves identifying the highest number of new chemotypes as opposed to analogs of the query structure(s). Pharmacophoric methods have been found to be particularly effective for this, building on the successful use of 3D database searching for lead generation. Other similarity methods such as the use of 2D descriptors (Section
2.1.1) are also commonly used to identify structures for screening based on the structure of a known ligand. The use of similarity searching in chemical databases has been reviewed by Willett et al. (2a), comparing newer types of similarity measure with existing approaches. In this section the focus is on the use of the 3D pharmacophoric methods, which have been shown to provide a ligand-based virtual screening method that yields new chemotypes.

3.1 Use of Geometric Atom-Pair Descriptors

The topological atom-pair descriptors (24) have been extended by Sheridan and coworkers to geometric atom pairs (26), and shown to be effective at generating hit lists enriched in active molecules of different chemotypes. A set of precalculated conformations (~10-25) is used for each molecule, and each atom is assigned two different atom types: (1) a binding property (donor, acceptor, acid, base, hydrophobic, polar, and other); (2) a combination of element type, number of neighbors and π-electron count. All combinations of atom pairs are analyzed, for each conformation, and resultant histograms of each probe and database molecule conformation are compared. The technique was compared with its topological equivalent (counting bond connections between atoms to estimate interatomic "distance"). This demonstrated that, although both methods were able to significantly enrich the highest ranking structures with other active molecules for the same target (~20- to 30-fold enhancement over random in the top 300 compounds), the 3D structure-derived descriptors were able to show their advantage by picking out active chemotypes with greater structural variation relative to those from the 2D searches. The analysis used about 30,000 structures from the Derwent Standard Drug File (SDF; version 6, developed and distributed by Derwent Information Ltd., London, England, 1991, now known as the World Drug Index) using probe molecules with known activity against a particular target to rank the database. Sheridan et al. (26c) have also shown how a single combined atom-pair descriptor from a set of molecules can be used in a single fast search to provide results similar to those from the slower process of individual molecule-by-molecule searches. This provides the ability to search mixtures, which some companies use for high throughput screening, in that both the search query and/or the database being searched can be mixtures of structures.

3.2 Use of 3D Pharmacophore Fingerprints (Three- and Four-Point)

Some research groups have extended the atom-pair descriptors to three-point (triplets) and four-point (quartets) pharmacophore descriptors (35, 37, 76, 81) as described in Section 2. These descriptors have a potentially superior descriptive power, and a perceived advantage over atom pairs is the increased "shape" information (intrapharmacophore distance relationships) content of the individual descriptors (37a). The quartet (tetrahedral) four-point descriptors offer further potential 3D content by including information on volume and chirality (37a, 82), compared with the triplets that are components of the quartets and represent planes or "slices" through the 3D shapes.

The fingerprints can be precalculated for database compounds, with conformational sampling, and stored in an efficient format (e.g., four-point pharmacophore fingerprints, where one line of encoded information uses about 11 kilobytes of space for 1000 pharmacophores). Probe fingerprints from one or more structures can be rapidly compared against such databases at speeds of >100,000 compounds/min, even for large four-point pharmacophore fingerprints, representing about 10 million different pharmacophoric shapes. Similarity is measured using potential pharmacophore overlap and similarity indices such as the modified Tanimoto index (37a). The relative merits of two-, three- and four-point pharmacophore descriptors for different applications are an area of ongoing study (37, 83). Figure 5.9 shows some structurally diverse endothelin antagonists that exhibit low 2D similarity, but maintain significant overlap of their four-point pharmacophore fingerprints (37a).

3.3 Validation Studies

The validation issue for 1D, 2D, and 3D descriptors for similarity searching and virtual
screening has been addressed in several publications (5d, 14a, 45, 72, 84, 85). Conflicting results have been reported, probably because of the way the different descriptors were used and biases in the test sets. Two primary concepts have been applied to the analysis of biological data. The concept of "neighborhood behavior" (84) as a measure of descriptor utility has been promoted, based on the idea that if a descriptor is able to cluster molecules with a particular biological activity, the descriptor encodes information regarding the requirements for that activity, and by extension is a useful measure for molecular similarity/diversity. Comparisons using 2D fingerprints with pharmacophore fingerprints with this approach led to the conclusion that 2D descriptors performed better than their 1D and 3D counterparts (14a, 45). However, issues with the studies undertaken have been raised (85). These relate to bias in the data sets arising from the presence of closely related analogs, which by their nature have high 2D substructural similarities, and the way the 3D pharmacophoric descriptors were generated (single conformation only) and used (bin setting, Tanimoto index).

Figure 5.9. Structurally diverse endothelin antagonists exhibiting low 2D similarity while maintaining common pharmacophoric elements crucial to activity. [Structures shown: SB 209670 and L-746,072.]

Some comparative studies of ligand-based virtual screening methods have been undertaken within Bristol-Myers Squibb (85) using more optimum settings for pharmacophore fingerprint generation [four-point pharmacophores, 7 distance bins, and full conformational analysis (37a)], which gave quite different results. An example using melatonin as a probe molecule to search against a database of about 150,000 compounds containing about 250 known melatonin antagonists is shown in Fig. 5.10. The graph shows the hit rates obtained by similarity ranking in terms of the
number of active compounds located across the top-ranking 1000 compounds. In this case, for the 2D descriptors shown, only the atom-pair (26a) descriptor (which has elements of a two-point pharmacophore fingerprint with intercenter distance replaced by bond count) produces comparable results. A 2D similarity search using the UNITY 2D fingerprint (13c), with a very low 50% similarity cutoff, produced a hit list of 1669 compounds containing only 10 melatonin actives (an enrichment of 3.6 relative to random screening of 1669 compounds, for which 2.8 actives would be expected to be found). In contrast, the pharmacophore fingerprint similarity search finds 93 actives in the first 1669 compounds, a total enrichment of >33 relative to random, and a further >ninefold relative to the 2D similarity search. Preliminary studies on systems with a much wider structural diversity in active ligand chemotypes suggested hit rates even more favorable to pharmacophore fingerprints, with cases where no 2D methods were able to improve on random hit rates. It is of interest that averaging the four-point pharmacophore/atom-pair rankings leads to even better results in the melatonin investigation, highlighting (45) the potential advantages of combined descriptors.

Figure 5.10. Ligand-based virtual screening/similarity analysis of a 150,000-compound database containing about 250 known melatonin antagonists, showing the strong performance of the pharmacophore descriptors, and the complementarity of atom pairs and pharmacophore descriptors when combined. [Plot series: Daylight; Isis/MACCS; atom pairs; 4-point pharmacophores; 4-point pharmacophores + atom pairs.]

The advantage of 3D pharmacophore-based geometric descriptors over topological descriptors in being able to pick up new chemotypes with major 2D structural variations is of particular importance when exploiting a peptide lead. In such cases, the goal of screening is normally a nonpeptidic molecule. Using the pharmacophore fingerprint from relatively large and flexible peptidic molecules (e.g., tetrapeptides), it is possible to identify structures that match just part of the pharmacophoric information; a modified Tanimoto coefficient can be used to reduce the penalty for only a partial match. It is possible to identify a set of reasonable structures that, as an ensemble, sample most of the potential pharmacophores exhibited by the peptide. With 2D methods the high ranking molecules will tend to be similarly peptidic, rather than more druglike molecules exhibiting 3D properties of the peptide.

An example of the use of peptidic information comes from the work of Pickett et al. (76) using the known tripeptide RGD (Arg-Gly-Asp) motif (see Fig. 5.11) that fibrinogen uses for receptor binding (86). A database of 100,000 compounds, which had been seeded with fibrinogen receptor antagonists covering a wide range of structural classes (see Fig. 5.12), was
screened using a customized (35, 37, 70) version of Chem-X/ChemDiverse with four-point pharmacophore fingerprints. A degree of flexibility of the RGD motif loop region was suggested by structural biology studies, highlighting an advantage of the descriptor encoding a conformational ensemble and not requiring a bioactive conformation to be known. Virtual screening can thus be undertaken even when only limited structural information is available. The four-point pharmacophore fingerprint of the RGD tripeptide probe, N- and C-capped with amide groups, was generated with a full conformational analysis and used for the search. All the actives appeared in the top 3% of the data set, with a reasonable diversity of hits and little 2D resemblance to the probe peptide.

Figure 5.11. Structure of the RGD motif.

Figure 5.12. Some structurally diverse RGD antagonists matched through virtual screening using the RGD motif and 3D pharmacophore fingerprints.
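
As a rough check on the enrichment figures quoted for the melatonin search, the sketch below computes enrichment as the ratio of actives found to the number expected from a random selection of the same size. The function name `enrichment_factor` is illustrative and is not taken from any of the cited software; the database size (about 150,000) and number of actives (about 250) are the approximate values given in the text. The number of actives seeded into the RGD database is not stated, so `n_seeded` is a purely hypothetical placeholder; it cancels out when every active falls within the top 3%.

```python
def enrichment_factor(hits_found, n_selected, total_actives, db_size):
    """Actives found in a selection divided by the number expected at random."""
    expected_at_random = total_actives * n_selected / db_size
    return hits_found / expected_at_random


# Melatonin search: approximate figures quoted in the text.
DB_SIZE, ACTIVES, HIT_LIST = 150_000, 250, 1669
expected = ACTIVES * HIT_LIST / DB_SIZE
ef_2d = enrichment_factor(10, HIT_LIST, ACTIVES, DB_SIZE)     # 2D fingerprint search
ef_pharm = enrichment_factor(93, HIT_LIST, ACTIVES, DB_SIZE)  # four-point pharmacophores

# RGD search: all actives within the top 3% of a 100,000-compound database.
n_seeded = 50  # hypothetical; the text does not give the number of seeded actives
ef_rgd = enrichment_factor(n_seeded, 3_000, n_seeded, 100_000)

print(f"expected at random: {expected:.1f}")
print(f"2D enrichment: {ef_2d:.1f}")
print(f"pharmacophore enrichment: {ef_pharm:.1f}")
print(f"pharmacophore relative to 2D: {ef_pharm / ef_2d:.1f}")
print(f"RGD, all actives in top 3%: {ef_rgd:.1f}")
```

With these inputs the arithmetic gives about 2.8 expected actives, enrichments of about 3.6 and 33.4, and a ratio of about 9.3 between the two searches, consistent with the values quoted for Fig. 5.10. Finding every active in the top 3% of a database corresponds to an enrichment of about 33 whatever the number of actives.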
4 COMBINATORIAL LIBRARY DESIGN

4.1 Combinatorial Libraries

Normally, the term combinatorial library implies a library of a few hundred to many thousands of products produced using high throughput robotics in a facility dedicated for such purposes. In contrast, the term parallel library implies a library of less than 10 to a few hundred products produced using more or less traditional medicinal chemical synthetic procedures or increasingly common low throughput robotics. In either case, lists of reactants are combined in a combinatorial fashion to yield an array of products. The methods described in this section are applicable equally to both cases. A strictly combinatorial combination of reactants (or reactants and scaffold) produces the most efficient use of reactants and automation/robotics for a library synthesis. However, the issue of generating products that have suitable properties for biological screening and as hit/lead material is key, and constraints discussed below are often used, resulting in not all combinations of reactants being used.

4.2 Combinatorial Library Design

The process of combinatorial library design brings together many molecular diversity and similarity approaches with the aim of identifying a set of reactants that are to be combined (reacted) to form products. Combinatorial library design is, inevitably, an iterative process: software is used to suggest lists of reactants; chemists accept some suggestions but reject others (for various reasons ranging from cost or availability to poor synthetic yield). If software is to be used to suggest replacements for rejected reactants, it must be designed to accommodate this iterative process.

Although the objective is always to identify which reactants should be used to make the products, there are two fundamental approaches to library design: reactant-based methods and product-based methods. Purely product-based methods, which select (or cherry pick) desired products without regard for the number of reactants required to form those products (as in standard similarity searching or diverse set selection), lead to noncombinatorial synthetic schemes and are clearly at odds with the efficiency objectives of a combinatorial synthesis. Reactant-based methods suggest lists of reactants based solely on comparisons of reactant properties without regard for the properties of the resultant virtual products. Thus, reactant-based methods avoid the need for enumerating what could be very large numbers of virtual products and making even greater numbers of comparisons of product properties. Compromise solutions, discussed later, that approximate certain product properties from the reactants without enumeration have been developed. By directly selecting a desired number of each type of reactant, the chemist can ensure an efficient, full-combinatorial array design, and the expediencies of reactant-based methods led to their widespread use for the design of both large and small combinatorial libraries. However, the growing awareness of a need to maintain druglike properties, if only for practical issues such as solubility, has led to the use of reactant-biased methods that consider the properties of the products from enumerated structures, for which the full array of ligand- and structure-based design methods can be applied, and the resultant synthesis of sparse arrays (see later). In addition, the assumption that optimal product diversity can be approximated by using diversity-optimized reactants has been questioned (87), and several product-based diversity methods for selecting reactants have been developed that consider the need for full or sparse combinatorial arrays in the design process. The rationale behind this observation can be understood when one considers that "constraints" based on whole molecule properties and some form of molecular similarity/diversity are usually required.

Pearlman (11, 87c) has made this argument quite convincingly by comparing the results of alternative diverse library design methods in a low dimensional chemistry space, as illustrated in Fig. 5.13. Figure 5.13a depicts a virtual library of 634,721 allowed combinatorial AB products (remaining after optional filtering of the full virtual library based on Lipinski's Rule-of-5 "druglikeness" criteria) in a chemistry space specifically chosen to best represent the diversity of that virtual library. Figure 5.13b illustrates an optimally diverse "library" of 9600 products selected by using cell-based diverse subset selection to cherry pick the 9600 most diverse products without regard for synthetic economy. Although the diversity of these products is clearly optimal, the fact that 347 A's and 1024 B's would be required to make the 9600 AB products provides an equally clear indication of why purely product-based methods are unsatisfactory from an economical perspective.

Figure 5.13. (a) A virtual library of 634,721 allowed combinatorial AB products (after filtering out products that failed Lipinski's Rule of 5 "druglike" criteria) shown in a BCUT chemistry space specifically chosen to best represent the diversity of the virtual library. (b) The maximally diverse 9600-compound subset of the virtual library, illustrating the results of purely product-based "library design." Although providing the maximal diversity, synthesis of these 9600 AB products would require the use of 347 A's and 1024 B's, clearly unacceptable from the perspective of synthetic economy (numbers of reactants and robotic control). (c) The 9600-compound library resulting from the traditional, purely reactant-based library design strategy of selecting the 80 most diverse A's and the 120 most diverse B's. Although providing user-selected synthetic economy, the diversity of these 9600 AB products is clearly quite poor. (d) The 9600-compound library resulting from the reactant-biased, product-based (RBPB) algorithm developed by Pearlman and Smith (see Refs. 31, 87c and text). The algorithm selected a different set of 80 A's and a different set of 120 B's, thus providing the same level of user-selected synthetic economy, while also providing substantially greater diversity than could be achieved using a purely reactant-based library design strategy. See color insert.
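
To make the synthetic-economy argument concrete, the following sketch counts the reactants that a two-component (AB) product selection draws on and the fraction of the corresponding full A x B array that the selection fills. The helper `reactant_economy` and the integer reactant identifiers are illustrative assumptions; only the counts (9600 products, 347 A's and 1024 B's, 80 A's and 120 B's) come from the text and the caption of Fig. 5.13.

```python
from itertools import product


def reactant_economy(products):
    """Return (number of A's, number of B's, filled fraction of the A x B array).

    `products` is an iterable of (a_id, b_id) pairs, one per selected AB product.
    """
    selected = set(products)
    a_used = {a for a, _ in selected}
    b_used = {b for _, b in selected}
    fill = len(selected) / (len(a_used) * len(b_used))
    return len(a_used), len(b_used), fill


# Full combinatorial array from 80 A's and 120 B's (hypothetical integer ids).
full_array = list(product(range(80), range(120)))
n_a, n_b, fill = reactant_economy(full_array)
print(n_a, n_b, len(full_array), fill)

# Cherry-picked case from the text: 9600 products drawn from 347 A's and 1024 B's.
cherry_fill = 9600 / (347 * 1024)
print(f"cherry-picked array fill: {cherry_fill:.3f}")
```

On these numbers the full array uses 200 reactants and every A x B combination, whereas the cherry-picked selection needs 1371 reactants and fills under 3% of the array they span, which is the sense in which purely product-based selection is uneconomical.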
thesis of a library of compounds with a high degree of control over associated properties. Thus, the combinatorial library design process brings together many of the methods already described for molecular similarity and molecular diversity coupled to synthetic feasibility considerations. Diversity-based and structure-based approaches to the design of virtual libraries have been reviewed (7, 91a). Both ligand-based and protein structure-based virtual screening methods can be used, with the combinatorial nature of the virtual compounds being exploited to increase the speed of the analysis. Some properties of the products can be estimated rapidly on the fly from the reactants, and products can be generated in the active site. The CombiDOCK approach that can rapidly analyze very large virtual databases in a binding site by connecting reactants to scaffolds docked in multiple orientations is discussed in Section 4.10. A genetic algorithm-based method for the combinatorial docking of reactants has been described by Jones et al. (92), with the application of a ligand-docking genetic algorithm to screening combinatorial libraries.

A challenge in the design of small- and medium-sized focused combinatorial libraries is to harness for use in library design the experience and knowledge gained in generating structure-activity relationships (91b). Screening libraries biased for pharmaceutical discovery are often designed to augment the structural diversity of a chemical library. The approach used in the LASSOO algorithm (93) is based on the identification of compounds from a virtual library that are most different from those already present in a screening set and from a reference set of undesirable compounds, while being simultaneously most similar to a set of compounds with desirable characteristics. An illustration of the method using bit-string structure descriptors is given.

Combinatorial library design approaches have been discussed (94), with the design of library subsets that simultaneously optimize the diversity or similarity of a library to a target, properties (such as druglikeness) of the library members, properties (such as cost or availability) of the reactants required to make them, and the efficiency for array synthesis. They showed that libraries can be designed to contain molecules constrained to certain druglike properties with only a small trade-off in terms of the maximum possible diversity.

The design of leadlike combinatorial libraries is an approach of more recent interest. A lower molecular weight starting point is advantageous, in that bulk can be added for potency/selectivity/properties without exceeding "rule of 5" parameters for orally absorbed drugs; otherwise a more labor-intensive step may be needed to identify a smaller active part of the hit. The properties required of library compounds intended to provide leads suitable for further optimization, which may be rather different from those of final optimized leads, have been reviewed (95).

Thus, library design is a complex optimization problem with often competing constraints, including requirements to have combinatorial efficiency and/or several specified product properties (both desired and nondesired). Methods such as genetic algorithms, simulated annealing, and Monte Carlo optimization have been used, and iterative cyclic approaches applied. The next section describes the application of these methods within the context of library design, but the reader should note that some of these methods are applicable only for the design of diverse libraries.

4.3 Optimization Approaches

The most basic product-based selection process used in library design is an order-dependent analysis of products, selecting a compound if it exhibits sufficient "diversity" to products already selected. This approach was used in the Chem-X/ChemDiverse software with three- and four-point pharmacophore fingerprints. A compound was selected if the overlap with the ensemble fingerprint of already selected compounds was less than a user-defined amount; that is, the molecule contains a significant number of pharmacophores not already exhibited in selected compounds. This cherry-picking process is an efficient method for ensuring a high diversity library, but can be a combinatorially inefficient selection for synthesis, with no explicit reference to the constituent reactants being made (see Section 4.2 above for further examples). A preferred selection for combinatorial efficiency is arrays of reactants, in which all
reactants from one component of a combinatorial library are reacted with all the reactants in the other components, or sparse arrays, in which subsets of reactants are combined. Additional constraints such as physicochemical properties and flexibility are addressed implicitly by assigning upper and lower bounds for given properties, or controlling the order in which molecules are processed.

To address the issue of using pharmacophore fingerprints in a way that enabled a combinatorially efficient selection of reactants to be selected, and the explicit inclusion of additional molecular properties such as a balance of druglike physicochemical properties and shape descriptors, the HARPick program (78a,b) was created. A stochastic optimization technique [Monte Carlo simulated annealing (96)] was used to enable selections in reactant space, whereas diversity is still calculated in product space. User-defined flexibility for the reactant array sizes was possible, and additional descriptors could be used (e.g., to address the selection of non-drug-like compounds). The pharmacophore fingerprint (three-point, triplets) was used in a nonbinary mode (the frequency of occurrence of each pharmacophore was calculated), and the HARPick diversity measure was tuned to include a term (Conscore) to force molecules to occupy relative rather than absolute voids in pharmacophore space. This avoids the problem of saturation of the fingerprint with large databases in a binary mode, particularly a problem with the three-point pharmacophore descriptors. It was thus possible to design combinatorial libraries that exhibited pharmacophores that were poorly represented in a reference set of compounds. The Conscore constraint score sums the product of the number of times pharmacophore i has been hit for molecules selected from the current data set with the score associated with pharmacophore i for the constraining library. The Conscore term can be inverted, enabling focused designs, in which the selection of products that occupy the more highly occupied bins (e.g., from a set of active compounds) is desired. The flexibility and success of this kind of stochastic optimization methodology has led to its use by many other researchers for library design (5c, 6b, 23d, 78c, 97c,d). Simulated annealing has been used to perform reactant selection for combinatorial libraries based on three-point pharmacophores (78a,b), as described above, and other metrics (6b, 23d, 97c,d).

Genetic algorithms (GA) are another class of optimization techniques widely used within chemistry (98) that have been explored for library design. A GA is an attempt to utilize the Darwinian process of evolution in an optimization procedure. A solution is represented by a string of fixed length, the chromosome, and is evaluated according to some criterion to give the fitness score, for example, the pharmacophore coverage of the solution (78b). The GA maintains a number of chromosomes (potential solutions) that are ranked on their fitness and are then modified according to operators including mutation, where one element of the string is changed, and crossover, where the string is cut at some position and swapped with equivalent portions of another solution. These new solutions are evaluated and the process is repeated for a defined number of iterations or until all (or most) solutions converge on one result. For library design, the string represents the selected monomers at each variable position of the library. Evaluation involves enumerating the sublibrary defined by the solution and calculating the score associated with the products. The stochastic nature of the process means that the GA is run several times to ensure good convergence.

A GA was used by Sheridan and Kearsley (99) to design peptoid libraries focused to cholecystokinin by scoring on similarity to two peptide leads. Biological activity, rather than a computed fitness, has been used as the score in a directed combinatorial synthesis program (100). Brown and Martin developed GALOPED (101) as a way to design combinatorial mixtures. The SELECT program (78c) combines measures of diversity and the physical properties of the designed library. The library can be designed to be both internally diverse and diverse with respect to a reference population. Physical properties are optimized by comparing to a user-defined profile for the property of interest, c log P for example. As for the HARPick approach (78a,b), however, it is necessary to define a weighting scheme between the different elements of the score, which leads to a number of difficulties. Selection of the weights is nonintuitive when comparing different properties or concepts (e.g., diversity and c log P) and the use of weights constrains the search space. Figure 5.14a illustrates how changing the function weight in a SELECT run alters the final solution; note also that the two objectives, molecular weight and diversity, are competing in this example. Given these limitations, a novel modification has been made to the SELECT methodology. The GA in SELECT is replaced by a multiobjective GA (MOGA) (102) that eliminates the need for a weighting scheme. Instead, each element of the scoring function is optimized independently and solutions scored according to the idea of dominance (see Fig. 5.15). The solutions of rank 0, the nondominated solutions, are those solutions for which there is no superior solution when considering all objectives; solutions of rank 1 would be dominated in one objective and so on. It is these ranks that are used to describe the fitness of the solution. Solutions of rank 0 are said to define the pareto surface, as displayed in Fig. 5.14b, which overlays the MOGA results onto the SELECT results. Thus, the MOGA has several advantages over a traditional GA. In one run it generates multiple solutions that are equally valid, more fully explore solution space, and gives the designer an understanding of the relationships between the different objective functions.

Figure 5.14. (a) Results from multiple SELECT runs with alternative weightings for molecular weight vs. diversity. Filled triangles, 1.0×Div and 1.0×MW; filled circles, 1.0×Div and 0.5×MW; filled squares, 10.0×Div and 1.0×MW. (b) As in a, with results of a single MOGA run shown as crosses. [Reproduced from V. Gillet, et al., J. Chem. Inf. Comput. Sci., 42, 375-385 (2002) with permission of the American Chemical Society.]

The RBPB algorithm of Pearlman and Smith, described above (Section 4.2), considers all possible candidate libraries, which satisfy the user's constraints regarding economy. These include min/max range constraints regarding library size (e.g., number of AB products) and the number of each type of reactant (e.g., number of A's and number of B's). These also include specification of the minimal unit dimensions (MUDS), which define the smallest combinatorial array that the user is willing to address on the robotic table. Each candidate library corresponds to a different way of, at least conceptually, arranging the required
number of MUDS to construct a library within the user's specified range limits. Each candidate library is scored based on an appropriate function of the scores of the individual products it contains divided by its size; hence, an average product score. The reactants used to make each candidate library are determined by reactant scores, which are functions of product scores and which are updated at each step of the design of that particular candidate library. For example, at a given step in the design process, the reactant score for a given reactant A depends on the scores of the products actually accessible, given the current choices of B-type reactants. The score also depends on the scores of the products that could be made using B-type reactants, which may be selected at a subsequent step in the process. The candidate libraries with the highest scores are output for the user's final decision. In addition to being remarkably thorough yet fast, the RBPB also makes it very easy to address the iterative nature of library design and to suggest replacements for previously suggested reactants that had to be rejected for one reason or another.

Figure 5.15. Pareto optimality. The filled circles represent rank zero or nondominated solutions for functions f1 and f2. Point C is rank 1 because it is dominated by point B. (Permission as in Fig. 5.14.)

A rapid computational method for lead evolution has been described by the CombiChem (now DeltaGen) group (39). Their 3D computational approach for lead evolution is based on a pharmacophore fingerprint approach using multiple pharmacophore hypotheses. A set or ensemble of hypotheses is generated that is most able to discriminate between active and inactive molecules. The ensemble comes from an analysis of a large number of pharmacophore hypotheses, with full conformational sampling for both active and inactive compounds. The ensemble hypothesis is used to rapidly search virtual chemical libraries to identify compounds for synthesis. Large virtual libraries (e.g., a million structures) can be analyzed efficiently. The method was applied to α1-adrenergic receptor ligands, where heterocyclic α1-adrenergic receptor ligand leads were evolved to highly dissimilar active N-substituted glycine structures.

LiBrain (103) is a collection of software modules for automated combinatorial library design, including the incorporation of desirable pharmacophoric features and the optimization of the diversity of designed libraries. A Chemistry Simulation Engine module is trained by chemists to determine the suitability of reactants for a specified reaction, to recognize the risk of undesirable side reactions, and to predict the structures of the most likely reaction products, so as to circumvent major bottlenecks associated with automating the process.

Legion and Selector (66c,d, 104) are software from Tripos (13c) for characterization, comparison, and sampling of sets of compounds, including a combinatorial builder (104), with available descriptors including fingerprints and atom-pair distances. Clustering tools (Hierarchical, Jarvis-Patrick, and Reciprocal Nearest Neighbor) and compound selection and diversity comparison methods available include Tanimoto Dissimilarity, the Reciprocal Nearest Neighbor approach, and the OptiSim algorithm (see Section 2.2.3).

4.4 Handling Large Virtual Libraries

The rate-limiting step in a product-based library design process is often the calculation of molecular descriptors. This becomes particularly acute as one moves into the 3D arena, of course, but even the simplest 2D descriptors take a finite time to calculate. In addition, there are the logistics associated with storing virtual libraries of potentially tens of millions of compounds. The ability to search within the possible chemical space of a particular chemistry, as opposed to the limited space of synthesized compounds, is an important component of lead identification because this allows a weak hit from primary screening to be rapidly expanded into a more potent lead. Existing chemical database systems can be used or readily modified to benefit from the combinatorial nature of libraries (64a) but they do not overcome the fundamental issues.

Downs and Barnard (105a) have proposed an elegant solution to these problems using the Markush representation commonly used in chemical patents. The key component of [...] descriptor calculation [...]sis can be performed without the need for full enumeration of the products. In other words, both storage and calculation will tend to scale as the sum of the [...] of building blocks in the library rather than the product as in techniques re[...]n. The method has been [...] into a software suite and released commercially as the LibEngine module of the Cerius2 suite for combinatorial library analysis and design (105d).

The background and theory behind the approach have been published (105b). In summary, the algorithm relies on identifying a [...] associated R-groups that define the [...]. This may or may not be directly related to the manner of synthesis. For example, imagine a tripeptide library synthesized from [...]0 amino acids. The algorithm defines the tripeptide backbone as the core and the amino acid side chains as the R-groups. [...] fingerprints are calculated [...] fragment basis representing the [...] and R-groups taking full account of the [...] the core and the possibility that a particular path may extend between two R-groups. [...] fingerprints are then com[...] full fingerprint, a relatively fast [...]. The approach is a couple of orders of magnitude faster than calculating fingerprints from fully enumerated products. Additional properties such as molecular [...] hydrogen-bond donor and acceptor [...], and logP can be calculated in a similar manner as well as topological indices. Fingerprint or property data can also be calculated on demand for use with clustering algorithms, avoiding the overhead of storing and retrieving them. There is also interest in extending the approach to 3D property calculation (105c).

An alternative approach has been taken by Agrafiotis and colleagues. In a conference presentation (106) they show how a neural network can be trained on a small sample of enumerated combinatorial products to reproduce 2D molecular descriptors and properties for all library members without the need to construct their connection tables.

A method for rapid similarity searching in large combinatorial spaces using a new algorithm Ftrees-FS was published by Rarey and Stahl (135). The similarity search is based on the feature tree similarity measure representing molecules by tree structures. Combinatorial chemistry spaces are handled as a whole rather than looking at subsets of enumerated compounds. A set of 17,000 fragments of known drugs was used, which could be combined to 10'' compounds of reasonable size. A novel ChemSpace approach (45a) for searching large virtual libraries that does not require enumeration has also been developed by Tripos, using shape descriptors (topomeric fingerprints) on the monomers, and has been used for targeted library design (45b).

4.5 Library Comparisons

In the previous sections we described the design of libraries based on a number of user-defined criteria, whether they were focused or whether they were of a more general nature. So far, these designs have been undertaken, treating the library in isolation, with the inclusion of property profiles in methods such as HARPick and SELECT to ensure that the synthesized compounds are of a suitable physical nature. In this sense, the designed library can be said to be internally diverse; that is, the selected compounds are diverse within the limited chemistry space of all virtual products. Even for very large virtual libraries, the chemistry space is still small with respect to the possible chemistry universe. It is very difficult, a priori, to address how "diverse" a designed library is compared to a library generated with another set of reactions without having to go through the computationally expensive process of computing all pairwise similarities between members of the libraries.
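To give a feel for why the all-pairs comparison is expensive, the following minimal Python sketch computes the Tanimoto similarity between every member of one library and every member of another. It is an illustration only: fingerprints are represented as plain sets of on-bit indices, and the names `tanimoto` and `all_pairs`, as well as the toy libraries, are assumptions made here and do not come from any of the programs cited in this chapter.

```python
from itertools import product


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def all_pairs(lib_a, lib_b):
    """Return every cross-library similarity and the number of evaluations."""
    sims = [tanimoto(x, y) for x, y in product(lib_a, lib_b)]
    return sims, len(sims)


# Toy fingerprints (hypothetical bit indices, for illustration only).
library_a = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8})]
library_b = [frozenset({1, 2, 3}), frozenset({8, 9})]

similarities, n_evaluations = all_pairs(library_a, library_b)
print(n_evaluations)  # equals len(library_a) * len(library_b)
```

The number of evaluations is the product of the two library sizes, so comparing two collections of 100,000 compounds each in this way would require 10^10 similarity calculations.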
Nevertheless, questions such as "How diverse is the library compared to the screening collection?" or "Which of the following chemistries should I choose for a library?" are often posed and methods are required to answer them. Distance-based methods such as clustering can be and have been used but suffer from a number of drawbacks both in terms of speed and the fact that the exercise needs to be repeated for every additional library (i.e., there is no common frame of reference). In addition, all pairwise comparisons would need to be performed. Thus, Shemetulskis et al. (107) used clustering methods to compare the Parke-Davis corporate collection (117,000 compounds) with external compounds from Chemical Abstracts Service (380,000) and Maybridge (42,000). Even today, clustering half a million compounds is a daunting task and interpreting the results is not straightforward. The Jarvis-Patrick method employed by Shemetulskis et al. has several input parameters, including the need to predefine the number of clusters. Voigt et al. (108) compared the National Cancer Institute (NCI) database, a publicly available database of compounds used in the NCI screening program, to a number of compound databases. The diversity of each collection was estimated by the number of compounds selected by use of a diversity-selection algorithm as a function of database size. The similarity overlap between two databases has been determined by calculating the percentage of compounds of the first database for which a compound exists in the second database with a similarity greater or equal to a specified cutoff (109). Such an approach necessitates the calculation of the Tanimoto similarity coefficient of all compounds in a database with all compounds in the other databases. As indicated before, the largest drawback of distance-based methods is that they give no indication of where the voids are within the chemistry space, and searching an additional compound source for interesting compounds would require reclustering.

Therefore, partition/cell-based methods are preferred for such library comparison tasks. They provide a common frame of reference in which it is possible to identify voids within the chemistry space of a population. It must be emphasized that the chemistry space is still defined with respect to a reference population. By comparing the libraries with reference to a population (REFDB), such as a corporate database or a combination of known drug databases, one can make statements such as, library A shows the greatest overlap with REFDB, whereas library B fills the greatest number of empty or low occupancy cells.

Cummins et al. (22) used a cell-based approach to compare five databases, including the Wellcome Registry, to select screening sets of diverse compounds. Topological indices and a measure of free energy of solvation were taken as the descriptors and factor analysis was used to combine them and define a four-dimensional chemistry space that was then partitioned. Outliers were removed to allow the partitioning to focus on the most densely populated region. The use of pharmacophore descriptors in such a task was illustrated by Mason and Pickett (4), where the pharmacophore overlap between three libraries was calculated. It was possible to identify the library covering regions of pharmacophore space not covered by the other two. Alternatively, given that library A is synthesized and gives hits in screening, then presumably the library that overlaps best with A should be made. Pearlman and Smith (11d) have adapted their DVS software to identify what they term a receptor-relevant subspace, where the BCUT metrics are selected to best group the active compounds within a population (in fact, it is possible to have several groupings of actives within the space) (see Section 2.2.1.1).

Comparing two populations by pharmacophore coverage, although straightforward, does ignore the contribution from individual compounds. This is important, in that two libraries could cover similar regions of pharmacophore space but individual compounds in the two libraries could be displaying different subsets of the total pharmacophores covered. This prompted Pickett et al. (70) to explore an alternative approach. In this case, a number of potential scaffolds were available and the aim was to find which of these would best complement previously synthesized libraries. Virtual libraries were generated using a predefined set of reactants and pharmacophore fingerprints were calculated for these and the previously synthesized libraries. By use of measures proposed by Turner et al. (69b), the virtual libraries were compared to the synthesized libraries at both a whole library level and an individual molecule level. From this analysis it was possible to select the scaffold that best complemented the previously synthesized libraries.

An alternative methodology based on the ring content of a database, using precalculated structure-based hashcodes has been proposed (110). The comparison of the hashcode tables can be used to compare two databases and the number of distinct ring-system combinations can be used as an indicator of database diversity. A method for diversity assessment called the saturation diversity approach, based on picking as many mutually dissimilar compounds as possible from a database was also proposed. The methods were used to compare a number of public databases and gave similar results.

4.6 Pharmacophore-Based Fingerprints

The examples of GPCR library design (described in Section 5.1.2) and protein-site design for Factor Xa (described in Section 5.3) illustrate the use and relevance of pharmacophore-based fingerprints in library design. A pharmacophoric bias has been a major component of many library designs (111), used in the context of focused or biased libraries. Their broad applicability is important, with the same descriptors being used for diverse library design, screening set selection, and focused library design. This provides a consistent approach that extends to protein-site based pharmacophores as discussed above. Their ability to determine the similarities and differences between structurally diverse molecules and sites is very powerful. An ensemble pharmacophore data set measure is often used, which attempts to condense the individual molecule pharmacophore fingerprints into a single measure that describes the important features of the data set as a whole (36, 37, 78a,b).

McGregor et al. (112) have recently published a version of pharmacophore fingerprinting (the PharmPrint method) applied to QSAR and focused library design that uses a limited basis set of 10,549 three-point pharmacophores. They included the usual six pharmacophoric features, plus an additional definition of other for all remaining unassigned atoms. A subset of the MDDR database (13a) was used to define a reference set of bioactive molecules, separated into target classes (gene families). The discriminating power of several molecular descriptors was measured using the target class assignments for this set, and it was found that the pharmacophore fingerprint outperformed other descriptors.

4.7 Combined Pharmacophore Fingerprints and BCUTs

Library design using a simultaneous optimization of BCUT chemistry-space descriptors (11) and four-point pharmacophore fingerprints has been reported (32d, 37d). The authors investigated the feasibility and results in terms of complementarity of simultaneously optimizing two product-based descriptors for reactant selection from large virtual libraries. Diversity around a chosen chemistry was the goal of the studies reported, but the approach could equally be applied to optimize to a desired distribution of properties, say, from sets of biologically active compounds. A simulated annealing algorithm (97) was used to combine both components in a single optimization procedure. The choice was based on the ease of implementation and the ability to include multiple components in the objective (23d), an important goal in many recent designs, if only to modulate physicochemical properties to druglike ranges. In this example a small, fully enumerated virtual library of 86,140 amide compounds was constructed from carboxylic acids and primary amines present in the ACD (Available Chemicals Directory). The products of the optimized and random starting reactant sets were compared using average nearest-neighbor distances, and the Hopkins' statistic (113), which evaluates the degree of clustering in a data set, together with the four-point pharmacophore fingerprint diversity. The potential utility for very large virtual libraries, where precalculation of all the pharmacophore fingerprints would not be feasible, was illustrated by calculating four-point pharmacophore fingerprints for virtual library compounds on the fly. The fingerprints were calculated during the optimization procedure and stored in a compact encoded form, with
previously calculated fingerprints reused as needed (calculation times 1-5 s with conformational sampling per structure on an SGI R10000 machine). Diversity was evaluated for the BCUT chemistry space using the ratio of filled to possible filled cells for the virtual library. Four-point pharmacophore diversity was evaluated by the number of unique pharmacophores and the total number of pharmacophores in the product subset, with the goal to optimize both the pharmacophoric uniqueness of each compound selected and the total number of pharmacophores exhibited. Encouraging results were obtained, with additional work necessary to develop a more general function.

4.8 Oriented-Substituent Pharmacophores

OSPPREYS is a pharmacophore diversity descriptor developed specifically for combinatorial library design by Martin and Hoeffel (43). Advantage is made of the common scaffold, so calculations are performed on the sets of substituents. This enables a more detailed pharmacophoric description of the library products than through calculations that could be practically performed on the enumerated products. By avoiding the problems of having to analyze many products with many conformations per product, and an explicit dependency on the scaffold, a higher spatial resolution could be obtained. The analysis of enumerated combinatorial libraries by pharmacophoric methods is generally limited to smaller virtual libraries, with three- or four-point pharmacophores, and limited conformational sampling, requiring new calculations for every library. The oriented-substituent pharmacophores (OSPs) were developed as a compromise approach between reactant and product-based methods to rectify these limitations. To recapture most of the orienting information that is lost in fragmenting the enumerated products into substituents, two additional points are added to each ordinary one-, two-, and three-point substituent pharmacophore, necessitating approximations through the combinatorial conformer and the template alignment assumptions. The OSPPREYS analysis does, however, account for up to nine-point pharmacophore similarity in the products of a library with three diversity sites. In addition, the consideration of relatively rigid substituents reduces the number of structures to analyze by up to 10'' compared to that of a full product-based analysis. This permits a thorough conformational sampling of very large virtual libraries that would be too slow on enumerated structures. A Euclidean property space for diversity analysis is possible because of the small number of pairwise substituent similarities, enabling options not possible by counting set bits in a library union fingerprint. The database of oriented substituent fingerprints is transferable between libraries, within the restrictions of the noted approximations. A major limitation in using OSPPREYS is that it can be applied only within a combinatorial library, and not between libraries. OSPPREYS is well suited to maximizing the diversity of scaffolds independently, and can be used to build a screening file based on such diversity.

4.9 Integration

The previous sections have outlined the basic methodology that has been developed in the areas of molecular diversity, similarity analysis, and library design. Traditionally, use of these methods was limited to a small number of exponents within a computational chemistry group because it involved bringing together a diverse set of software tools and data sources. Combinatorial and high throughput chemistry is now well integrated into the research process and there is a need for bench chemists to have access to such tools. The Cousin system developed at Upjohn has been in use since 1981 (114) and has recently evolved to the ChemList system. The system includes tools for the browsing of dissimilar compounds from substructure searching, useful in reactant selection, for example. Gobbi et al. (115) have described the development of the CICLOPS system in use at Novartis. The system provides functionality for designing and registering libraries and associated tasks such as accessing reactant availability. Tools are provided for filtering reactant lists and selecting a diverse subset of reactants if required. The system is PC-based and is built around the Daylight chemical information system (13b) and associated tool kits with custom Windows clients to control the process.
[Workflow diagram, in order: compounds (e.g., substructure search), drawn from databases (e.g., corporate registry, ACD); refine list by functional groups, availability, target knowledge, etc., using other data sources (e.g., Oracle; chemical literature); enumerate virtual library; calculate profile; refine set based on profile; further selection/design methods.]
Figure 5.16. The workflow used within ADEPT (A Daylight Enumeration and Profiling Tool; GlaxoWellcome, UK) for compound selection and library design. [Reproduced from A. R. Leach and M. M. Hann, Drug Discovery Today, 5, 326-336 (2000), with permission of Elsevier Science.]
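
The workflow of Figure 5.16 (refine the reactant lists, enumerate the virtual library, calculate a profile) can be illustrated with a small Python sketch. This is only a toy under stated assumptions, not the ADEPT implementation: the reactant records, their property values, the thresholds, and the function names are all hypothetical, and product properties are crudely taken as sums over reactants rather than computed from real structures.

```python
from collections import Counter
from itertools import product

# Hypothetical reactant records: (name, mol_weight, rotatable_bonds, has_unwanted_group).
# A real system would derive these from chemical structures; the values are made up.
AMINES = [
    ("amine_A", 85.1, 0, False),
    ("amine_B", 143.2, 3, False),
    ("amine_C", 310.4, 9, False),  # too heavy and too flexible
    ("amine_D", 99.1, 1, True),    # carries an unwanted functional group
]
ACIDS = [
    ("acid_X", 122.1, 1, False),
    ("acid_Y", 180.2, 4, False),
]


def refine(reactants, max_mw=250.0, max_rot=6):
    """Keep reactants passing weight, flexibility and substructure filters."""
    return [r for r in reactants
            if r[1] <= max_mw and r[2] <= max_rot and not r[3]]


def enumerate_library(*reactant_lists):
    """Yield one virtual product per combination of reactants."""
    for combo in product(*reactant_lists):
        name = "+".join(r[0] for r in combo)
        # Assumption for illustration: product properties are sums over reactants.
        mol_weight = sum(r[1] for r in combo)
        rot_bonds = sum(r[2] for r in combo)
        yield name, mol_weight, rot_bonds


def profile(library, bin_width=100):
    """Histogram of product molecular weight in fixed-width bins."""
    return Counter(int(mw // bin_width) * bin_width for _, mw, _ in library)


amines = refine(AMINES)
acids = refine(ACIDS)
library = list(enumerate_library(amines, acids))
mw_histogram = profile(library)
```

Inspecting `mw_histogram` plays the role of the property histograms in the figure: if too many products fall in an undesirable bin, the filter thresholds can be tightened and the library re-enumerated.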
The ADEPT (A Daylight Enumeration and Profiling Tool) suite of programs developed at GlaxoWellcome (116) is a Web-based system providing access to a wide range of library design functionality, again based around the Daylight tool kit. Figure 5.16 provides an outline of the process workflow. Reactant lists are generated from searches in databases of in-house and commercially available monomers. A variety of filters can be applied to reduce the size of the lists. These include filters on molecular weight, rotatable bond count, and substructure filters to remove unwanted functionality. After library enumeration, various property histograms are calculated. This allows the user to further refine the reactant choice.

A product-based library design algorithm, PLUMS (117), has been developed to ensure that combinatorial constraints are satisfied in the design. The algorithm successively removes the monomer that adds least value to the library as governed by two terms, the effectiveness (number of molecules meeting user-defined criteria such as property ranges, fit to pharmacophore or dock to protein site) and efficiency (ratio of effectiveness to library size). The algorithm is sufficiently fast to work within the Web-based environment of ADEPT. Figure 5.17 shows screen shots from ADEPT, illustrating how a library can be specified and the resulting product histograms. A similar system has been implemented at Vertex (118a). A key component of this system is the REOS filtering tool (118b), which applies filters on molecular weight, lipophilicity, unwanted substructures, rotatable bond counts, and so forth to remove "obviously bad" compounds.

4.10 Structure-Based Library Design

Structure-based library design uses 3D structures of the biological targets to direct the design and selection of templates/scaffolds and of reactants that will produce compounds that can fit into the target and thus are likely to bind and have biological activity. The experimental structural information can be derived by a structural biology approach, using X-ray crystallography or NMR spectroscopy. Computational models can be built and used (e.g., homology modeling techniques for closely related proteins), but an experimental structure is always preferred. A structural biology approach can also be used to identify molecules or fragments thereof that bind to a target. For example, NMR screening (3) can be used to identify potential scaffolds or reactants for a combinatorial library that bind to a target site and is able to detect very low affinity binding (in the millimolar range, compared to the low micromolar range from biological screening); this can be done without the need to determine the 3D structure of the target.
[Screen shots: (a) library specification; (b) histogram of rotatable bonds, and molecular weight statistics (number of values: 342; minimum value: 287.42; maximum value: 748.35; mean: 425.754; standard deviation: 68.93).]
Figure 5.17. Screen shots from ADEPT. (a) A simple two-component library composed of an aminothiazole template and a series of piperidines specified with ADEPT. (b) Histograms of rotatable bonds and molecular weight for the enumerated virtual library, aiding the medicinal chemist in the design of the library. [Reproduced from A. R. Leach and M. M. Hann, Drug Discovery Today, 5, 326-336 (2000), with permission of Elsevier Science.]
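
The monomer-removal idea described for PLUMS (successively discard the monomer that adds least value, judged by effectiveness and efficiency) can be sketched for a two-component library as follows. This is not the published algorithm: the way the two terms are combined (a weighted sum, with `weight` as a free parameter), the stopping rule (a target efficiency), and all names are assumptions made for illustration.

```python
def effectiveness(rows, cols, good):
    """Number of enumerated products (r, c) that meet the user-defined criteria."""
    return sum((r, c) in good for r in rows for c in cols)


def efficiency(rows, cols, good):
    """Ratio of effectiveness to library size."""
    size = len(rows) * len(cols)
    return effectiveness(rows, cols, good) / size if size else 0.0


def prune(rows, cols, good, weight=0.5, target_efficiency=1.0):
    """Greedily remove one monomer at a time until the library is efficient enough.

    Assumption: a sub-library is valued as a weighted sum of its retained
    fraction of acceptable products and its efficiency.
    """
    rows, cols = list(rows), list(cols)
    total_good = effectiveness(rows, cols, good) or 1

    def value(rs, cs):
        return (weight * effectiveness(rs, cs, good) / total_good
                + (1 - weight) * efficiency(rs, cs, good))

    while efficiency(rows, cols, good) < target_efficiency:
        candidates = []
        if len(rows) > 1:
            candidates += [([x for x in rows if x != r], cols) for r in rows]
        if len(cols) > 1:
            candidates += [(rows, [x for x in cols if x != c]) for c in cols]
        if not candidates:
            break  # a 1 x 1 library cannot be reduced further
        # Removing the least valuable monomer leaves the most valuable sub-library.
        rows, cols = max(candidates, key=lambda rc: value(*rc))
    return rows, cols


# Hypothetical 3 x 3 library in which five products satisfy the design criteria.
GOOD = {("A1", "B1"), ("A1", "B2"), ("A2", "B1"), ("A2", "B2"), ("A3", "B3")}
kept_rows, kept_cols = prune(["A1", "A2", "A3"], ["B1", "B2", "B3"], GOOD)
```

Because whole monomers are removed rather than individual products, the result is always a full combinatorial array, which is the constraint the text says the design must satisfy.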
Structure-based drug design (SBDD) is the topic of another chapter, and key issues such as the scoring functions for the ligand-receptor interaction are not discussed further here. The ability to combine SBDD with combinatorial chemistry enables a focused design approach that can explore a range of ideas, reducing the dependency on SBDD limitations (structural information, scoring, conformational sampling, etc.). The ability to obtain the X-ray or NMR structure of new potent molecules complexed with their targets can also be critical for the next iteration, in that computational structure-based design methods may be unable to predict alternative and new binding modes, especially because the protein site is normally kept rigid and unpredicted conformational changes can take place during the binding process. A review by Stahl (119) discusses the technology that directly uses receptor three-dimensional structures, discussing relevant topics such as scoring functions, receptor-ligand docking, and practical applications. Böhm and Stahl (120) have reviewed structure-based library design in terms of molecular modeling merging with combinatorial chemistry.

The synergy between combinatorial chemistry and de novo design has been discussed by Leach et al. (121). They present an approach wherein a template (corresponding to the central core of a combinatorial library) is positioned within an acyclic carbon chain whose length and bond orders are systematically varied. The conformational space of each resulting structure (core plus chain) is explored, to determine whether it is able to link together two or more strongly interacting functional groups or pharmacophores located within a protein binding site. In a second phase, 2D queries are derived from the molecular skeletons and used to identify possible reactants from a database that would enable the all-carbon linking chains to be replaced by more synthetically feasible groups.

Sheridan et al. (122) have published on designing targeted libraries with genetic algorithms, extending earlier work, to use the GA with 3D scoring methods and showing that the approach of assembling libraries from fragments in high scoring molecules is a reasonable one. Example applications to two situations are described: (1) where the 2D structure of some actives (diverse angiotensin II antagonists) is known, with the goal to design a library that best resembles the actives; and (2) to simulate the situation where an active site (stromelysin-1 in this case) is available and the requirement is to design a library of structures likely to bind to it.

Tondi (123) discusses several examples in which structure-based drug design and combinatorial library synthesis have worked successfully together in a complementary way. These include the discovery of:

• Potent nonpeptide inhibitors of cathepsin D (124), which uses CombiBUILD (125), a derivative of the DOCK (126a,b) approach, with this structure-based selection approach yielding seven times as many hits as a diversity-based procedure.
• Thrombin inhibitors (127), where Böhm et al. used LUDI to dock and score computationally available primary amines and then score the virtual library generated from benzaldehydes with the top-scoring hit.
• Novel inhibitors of matrix metalloproteinases (128): Rockwell et al. (128a) used a combinatorial library at the beginning of the work to suggest leads suitable for further optimization that required a conformational change at the binding site, and a structure of the complex to enable iterative optimization; Szardenings et al. (128b) used SBDD to design the starting scaffold, with synthesis guiding the introduction of diversity.
• Thymidylate synthase inhibitors (129), using DOCK to identify the starting lead.

The CombiDOCK program (1264, based on DOCK, enables the evaluation of very large virtual libraries by using structure-based combinatorial docking. Multiple docked orientations of the scaffold are used to evaluate reactants separately at each of the substitution positions. The total docking score for each product is rapidly estimated by summing the contributions from reactants at each position (which are attached as in the final product to the docked scaffold, which may be a computationally convenient anchor fragment formed during the reaction rather than a synthetically used chemical). Further checks are made for the highest scoring structures (e.g., for steric interactions between reactants at the different substitution positions). This approximation produces an enormous speed-up over docking all the individual compounds, which, from a time perspective, rapidly becomes prohibitive for large combinatorial libraries. From the scores it is possible to select combinations of reactants that produce compounds complementary to the protein binding site. Combinatorial restraints can be applied as required to obtain the most efficient use of reactants and robotics, with an evaluation of any reduction in the inclusion of higher scoring compounds.

Different strategies for combining diversity and structure-based design in site-focused libraries and the DOCK-based CombiBUILD algorithm are discussed in a review (125), as an example of how lead compounds can be rapidly identified by combining diversity with structure-based design in site-focused libraries.

Lamb et al. (130) have published on the design, docking, and evaluation of multiple libraries against a family of targets, using a similar divide-and-conquer algorithm for side chain selection that enables the exploration of large lists of reactant substituents with linear rather than combinatorial time dependency. The method consists of three main stages: (1) docking the scaffold, (2) selecting the best substituents at each site of diversity, and (3) comparing the resultant structures within and between the libraries. The scaffold docking procedure, in conjunction with a novel vector-based orientation filter, was shown to be effective for several protease targets, reproducing experimental binding modes.

The application of the powerful combination of SBDD and combinatorial chemistry is not limited to lead discovery or the optimization of potency, but also to the optimization of the selectivity (using knowledge of the structures of related targets) and pharmacokinetic/druglike properties of a molecule. For example, the structure of a ligand-receptor complex can clearly indicate areas where chemical modifications could be made to modulate these other properties, without directly affecting binding/potency. Models/structures of ligands with the cytochrome P450 metabolizing enzymes are also now becoming available.

5 EXAMPLE APPROACHES

5.1 General Target Class-Focused Approaches

5.1.1 Defining the Chemical/Biological Space. The design of target class (gene family) libraries or compound subsets requires the definition of a biologically relevant chemical space. This "biological" space can then be used for the design and selection of biased/focused libraries and compound subsets. Many approaches can be taken, adapting the use of a wide variety of similarity/diversity descriptors (discussed in Section 2.1) to the identification of properties associated with a particular target class or subset thereof. The goal is to identify a feature or set of features that, ideally, is specific, but more generally "enriched" for the target(s) of interest. A common approach is to identify chemical substructures that are characteristic for the target class, and use these for the design. The simplest approach is to include such substructures in the library, but the co-occurrence of other features is often needed, and the quantification of this provides an enhanced design. An example of this combined approach is discussed in the next section, using the pharmacophore fingerprints expressed relative to "privileged" substructures. This provides a convenient cell-based partitioning approach. Alternatively, it is possible to identify properties that are enriched for a particular target class, without reference to any particular substructures: 1D (e.g., physicochemical), 2D (e.g., ISIS keys, BCUTs), and 3D (e.g., pharmacophore fingerprints) properties can all be used. BCUTs have been used within a target (to identify a receptor-relevant subspace, in which actives cluster), to differentiate within a target class (e.g., ion channel openers vs. blockers) and for general target class analysis. BCUT chemical space provides a way to quantify the "diversity" of certain properties within actives for a target class, as well as to identify any particular combination of properties that actives share. BCUTs have
been used to select representative subsets from libraries biased to a target class (59a).

5.1.2 7-Transmembrane G-Protein-Coupled Receptors. Examples of a product-based combinatorial library design that use four-point pharmacophore fingerprints in a "relative" diversity mode have been described for the design of combinatorial libraries that contain "privileged" substructures focused on 7-TM GPCRs. These are a large family of very important biological targets lacking high resolution experimental 3D structures of the human targets; therefore most design has focused around the ligands. The occurrence of common "privileged" substructures for 7-TM GPCRs, often spanning several targets, provides a useful focused design method. Some example structures are shown in Fig. 5.18.

A useful modification was made to the standard pharmacophore descriptor evaluation (37, 80) by forcing one of the points in the pharmacophoric description to be a privileged substructure. This provides a novel quantification of all the 3D pharmacophoric shapes, and thus important 3D information relevant to the biological activity of the ligands, relative to the substructure. This builds on the ability of 3D pharmacophoric descriptors to separate chemical structure from ligand binding properties, and enables a fingerprint to be generated that describes the possible pharmacophoric shapes from the viewpoint of that special point/substructure (see Fig. 5.19). A relative or internally referenced measure of diversity is thus created, enabling new design and analysis methods (see Section 2.2.5). The goal of the published method was to design novel structures, accessible through combinatorial chemistry, that have one or more privileged substructure reactants/cores, and are enriched in the relative 3D pharmacophoric shapes of known ligands. The method identifies patterns with other key features that need to be present with the privileged substructure, such as acids and bases. The optimization can also include an enrichment in pharmacophoric shapes containing the privileged substructure that are not in existing structures, enabling the exploration of new 3D pharmacophoric diversity focused around a feature known to be important for biological activity.

The Ugi reaction (131), a four-component condensation reaction, was chosen and more than 100,000 compounds were synthesized. Privileged substructures such as biphenyl tetrazole were used, for example, at the amine position (see Fig. 5.20). Other GPCR privileged substructures such as diphenyl methane, biphenyl tetrazole, and indole were used to focus the pharmacophore descriptors (see Figs. 5.18 and 5.21). GPCR ligands reported to be active at receptors with peptidic endogenous ligands were identified from the MDDR (13a). These compounds were used to provide the reference data for the design by calculating the union pharmacophore fingerprint of compounds containing the privileged substructure (see Fig. 5.22 for an example structure). A virtual combinatorial library was then created, and for a particular reactant position, the privileged pharmacophore fingerprints were calculated for each candidate reactant over all the products that would be generated if it were used in the library. Either previously selected or a representative set of reactants were used for the other three components to generate the virtual Ugi products.

The combinatorial library was then designed by comparing for each reactant the fingerprint generated from the resultant products with the fingerprint for the known drug ligands (MDDR-fingerprint). Reactants were selected by identifying, on a position-by-position basis, reactants that gave products that matched the greatest number of these MDDR-exhibited privileged pharmacophores. The design goal was recalculated after the selection of each reactant, by removing the pharmacophores matched by the products generated by that reactant from the target list. Subsequent reactants were thus picked based on their ability to match the remaining pharmacophores. The approach used was to select the first reactant as the one that would give library compounds with the most number of privileged pharmacophores in common with the drug set. The process was continued until no more reactants could be found that contributed a nontrivial number of new privileged pharmacophores. Optimization methods such as the HARPick approach (described in Section 4.3) could be used to enable other properties, such as flexibility and physicochemical
[Diagram: a substructure feature/motif combined with other pharmacophore features (e.g., acid, base). Feature types in the key: H-bond donor, acceptor, acid, base, aromatic ring, hydrophobe.]
Figure 5.19. Example of a "privileged" four-point pharmacophore. Here biphenyl tetrazole, a substructure seen in a number of GPCR inhibitors, is specifically defined as a pharmacophore feature, using a centroid dummy atom. Only pharmacophores that include this type are included in the fingerprint, thus providing a relative measure of diversity/similarity with respect to the privileged feature.
properties, to be optimized also. The total number of pharmacophores (this time without reference to the privileged substructures) can also be monitored and optimized. Example results from one of the Ugi library optimizations are shown in Fig. 5.23.

This design illustrates an advantage of a partitioning (cell-based) approach. The pharmacophore fingerprint can be used to monitor progress, to quantify how much of the desired goal has been accomplished, and to evaluate whether a given chemistry can yield further compounds that match the design criteria and/or explore new pharmacophoric space.

Figure 5.20. Example of the Ugi chemistry with biphenyl tetrazole incorporated as a "privileged" group at the amine position.

[Structures: biphenyl tetrazole (897 compounds across 3 MDDR activity indexes) and diphenylmethane (487 compounds across 59 MDDR activity indexes).]
Figure 5.21. Examples of 7-TM GPCR "privileged" motifs found in the MDDR database.

The example here used only a binary fingerprint, but even more powerful results can be obtained when a count for each potential pharmacophore is included. The authors showed that for these designed Ugi libraries the same Ugi chemistry could indeed yield significant new diversity for multiple 14,000 compound libraries, but that after three libraries diminishing returns were obtained. They used the understandable nature of the pharmacophore descriptor by analyzing the remaining MDDR-pharmacophore fingerprint to show that most of the remaining pharmacophores to be matched contained acids and/or bases. A modified chemistry approach was therefore developed using protected acids (t-butyl esters) and bases (BOC protected) in the Ugi reaction. The unmatched cells in the MDDR-fingerprint can be related back to the compounds that generated them, enabling a truly iterative design of further libraries. In Fig. 5.24 the increasing size of the pharmacophore fingerprint from four consecutive Ugi libraries is illustrated, together with the distribution of pharmacophoric features in MDDR pharmacophores that had not been matched.
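
The position-by-position reactant selection described for the Ugi libraries (pick the reactant whose products match the most target pharmacophores, remove those from the target list, and repeat until the gain becomes trivial) amounts to a greedy coverage loop. The Python sketch below illustrates that loop only; the integer pharmacophore keys, the reactant names, and the `min_gain` cutoff are hypothetical, and real fingerprints would come from conformational analysis of the enumerated products.

```python
def select_reactants(candidates, target, min_gain=1):
    """Greedy selection of reactants by newly matched target pharmacophores.

    candidates: dict mapping reactant name -> set of pharmacophore keys
                exhibited by the products that reactant would generate.
    target:     set of pharmacophore keys exhibited by the reference ligands.
    min_gain:   stop when the best reactant adds fewer new keys than this.
    """
    remaining = set(target)
    pool = dict(candidates)
    chosen = []
    while pool and remaining:
        best = max(pool, key=lambda name: len(pool[name] & remaining))
        gain = len(pool[best] & remaining)
        if gain < min_gain:
            break  # diminishing returns: no reactant adds enough new keys
        chosen.append((best, gain))
        # Recalculate the design goal: drop what this reactant already covers.
        remaining -= pool.pop(best)
    return chosen, remaining


# Hypothetical data: integers stand in for four-point pharmacophore bins.
TARGET = set(range(10))
CANDIDATES = {
    "acid_1": {0, 1, 2, 3, 11},
    "acid_2": {2, 3, 4, 5, 6, 7},
    "acid_3": {0, 1, 8},
    "acid_4": {6, 7},
}
chosen, unmatched = select_reactants(CANDIDATES, TARGET)
```

The `unmatched` set corresponds to the unmatched cells of the reference fingerprint mentioned in the text; inspecting which features those cells contain is what suggested the change of chemistry toward protected acids and bases.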
[Structure diagram: MDDR 140603, with the "privileged" feature, hydrophobic feature, and aromatic ring centroid marked. Total 4-point pharmacophores: 3601; with "privileged" feature: 1569 (using 10 distance ranges).]
Figure 5.22. Example of pharmacophore feature assignments involving the biphenyl tetrazole "privileged" substructure and the total four-point potential pharmacophores calculated for a GPCR antagonist. Note that just the subset (40%) of the total pharmacophores that contained the "privileged" substructure was used for the library design.

[Histograms: (a) new privileged 4-point pharmacophores added for products from each reagent selection; (b) cumulative total of 4-point pharmacophores after each reagent selection. Horizontal axis: reagent, 1 to 22.]
Figure 5.23. (a and b) Contributions per acid reactant of pharmacophores for optimization in the Ugi reaction (with biphenyl tetrazole as the "privileged" motif at the amine position). The order shown is the final selected order of reactants, based on obtaining the maximum number of new privileged pharmacophores per additional reactant. Histogram a shows the number of new pharmacophores added by each new selected reactant in the "privileged" pharmacophoric space defined by known GPCR compounds containing the biphenyl tetrazole; shown in histogram b is the matching increase in the total number of pharmacophores for the library for each new selected reactant.

Another example for GPCR library design is the use of BCUT metrics as the basis for target class-focused approaches to accelerated drug discovery. A particularly interesting example is work done by Wang and Saunders (32e,i) at Neurocrine Bioscience in their effort to discover novel nonpeptidic ligands for a particular member (GPCR-1) of the GPCR-PA+ family of receptors activated by peptides carrying an obligatory positive charge. They and their colleagues performed a thorough search of the literature and identified a few hundred ligands of the various members of the GPCR-PA+ family. Knowing that it is usually not useful to follow up hits or leads showing very poor affinity, they eliminated ligands with less
[Left panel: number of four-point pharmacophores (axis 0 to 2,200,000) against library number (1 to 4). Right panel: pharmacophoric features, including H-bond donor.]
Figure 5.24. On the left is shown the cumulative (black) total number of four-point pharmacophores from consecutive 14,000 sets of Ugi libraries designed for 7-TM GPCR targets, together with the total number of pharmacophores in each library (in gray). Note the diminishing yield of new pharmacophores with later libraries, indicating that a change in strategy is needed. On the right are shown the features present in the resultant unrepresented pharmacophores (i.e., found in 7-TM GPCR biphenyl tetrazole-containing compounds in MDDR but not in synthesized libraries), indicating a strategy change to include more acids and bases together with the biphenyl tetrazole.
than 300 [...] affinity for the corresponding receptor. Significantly, they also eliminated ligands with better than 1 [...] affinity for the corresponding receptor. This very unusual step was taken in an effort to convince their colleagues that the method they intended to use was not reliant on knowing the answer ahead of time. This left 187 ligands with affinities mostly between 10 and 70 [...] for various members of the GPCR-PA+ family of receptors. Using these compounds, they perceived a three-dimensional BCUT subspace within their corporate chemistry space that clusters the ligands of individual members of the GPCR-PA+ family and appears to be appropriate for this target class. The positions of all 187 ligands in the 3D chemistry space shown in Fig. 5.25 were originally indicated by open cyan circles. All ligands of some but not all receptors were then color-coded as indicated. Many red GPCR-2 and yellow GPCR-4 ligands are hidden under the green GPCR-1 ligands. The gray oval provides a crude indication of the region of chemistry space of interest for GPCR-PA+ receptors.

Figure 5.26 indicates the positions of roughly 2000 Neurocrine compounds selected from 14 different combinatorial libraries based on 14 different and proprietary scaffolds. Rather than selecting compounds only near the known ligands of GPCR-1, their receptor of interest, Wang and Saunders also selected compounds spanning the entire GPCR-PA+ receptors. This was done to further convince their colleagues, as explained below. All 2000 compounds were screened for activity against the GPCR-1 receptor. Those testing positive were retested in a secondary, functional assay. All but two compounds having better than 100 nM affinity for the GPCR-1 receptor are colored blue and/or are located within the blue oval. All but one compound having better than 10 nM affinity for the GPCR-1 receptor are colored red and/or are located within the red oval. All compounds with better than 2 nM affinity are colored green and are located within the two small green ovals within the larger green oval, consistent with the two crude clusters of GPCR-1 ligands seen in Fig. 5.25. The fact that these two small ovals each contain products from several different libraries (scaffolds) suggests the possible existence of two binding modes for this receptor. It is also significant to note that, although the authors intentionally synthesized compounds within the entire region of interest for GPCR-PA+ receptors, the only compounds showing significant affinity for the GPCR-1 receptor were located close to the known GPCR-1 ligands (compare with Fig. 5.25), thus supporting the use of BCUT coordinates (on receptor-relevant axes) as a valid approach to virtual high throughput screening. The tight clustering of GPCR-PA+ ligands in both figures clearly suggests that BCUT metrics represent, albeit in a relatively
Figure 5.25. The 3D subspace most receptor relevant for members of the GPCR-PA+ family of
receptors. Points indicate coordinates of 187 published ligands of various GPCR-PA+ receptors.
Some have been color-coded by receptor for illustrative purposes. See Refs. 32e,i and text for further
details. See color insert.
crude fashion, the same sort of information as would be represented in a description of the pharmacophore for the receptor of interest.

5.2 Property-Biased Design

The use of pharmacophoric descriptors in enhancing the hit-to-lead properties of lead optimization libraries has been described (76). Pharmacophore fingerprints, based on the Chem-X/ChemDiverse multiple pharmacophore descriptors, were used and several issues in the design of lead optimization libraries were addressed. The applicability of pharmacophoric methods to the design of focused libraries was demonstrated in this case, where the aim was to design the library toward a known lead or leads. The authors also investigated the design of libraries with improved pharmacokinetic properties. Simple and rapidly computable descriptors applicable to the prediction of drug transport properties were used, and the results illustrate a common problem: to obtain the best results it may be necessary to synthesize libraries in a noncombinatorial manner. A Monte Carlo search procedure was devised to enable the selection of a
Figure 5.26. The same 3D subspace as in Fig. 5.25, rotated slightly to provide a better viewing
perspective. Points indicate coordinates of about 2000 combinatorial products selected from 14
different libraries. Color-coding indicates affinity for the GPCR-1 receptor. See Refs. 32e,i and text for
further details. See color insert.
near-combinatorial subset in which all library members satisfy the design criteria. By including calculated log P, molecular weight, and polar surface area in the design of a combinatorial library, it was shown that the compounds with improved absorption characteristics (as determined by experimental Caco-2 measurements) could be obtained.

The use of computational methods such as reactant clustering and library profiling to maximize reactant diversity and optimize pharmacokinetic parameters has been described (132), with four-point pharmacophore fingerprint analysis used to quantify the added diversity gained by using two independent synthetic routes.

5.3 Site-Based Pharmacophores

Pharmacophore fingerprints generated from complementary site points can be used to direct combinatorial library design and to investigate selectivity. An example of the pharmacophore fingerprinting method for selectivity studies has been validated (37a,b) in studies of three closely related serine proteases: thrombin, Factor Xa, and trypsin. Site points were
positioned in the active site of each protein The active site of the Factor Xa serine pro-
using the results of GRID (42) analyses (see tease (134) has been used for combinatorial
Fig. 5.5), and receptor-based four-point phar- library design (37c,d) using the DiR approach.
macophore fingerprints were generated. GRID analyses using probes for hydrogen
Fingerprints were also generated using full bond donors, acceptors, bases, acids, and hy-
conformational flexibility for some highly se- drophobes resulted in 23 complementary site
lective and potent thrombin and Factor Xa in- points being added (see Fig. 5.5). The shape of
hibitors. Receptor-based similarity was inves- the active site was defined using 162 protein
tigated as a function of common potential atoms. To ensure that a relevant area of the
three- and four-point pharmacophores for binding site was being explored (based on
each ligandreceptor pair. The results indi- knowledge of X-ray protein-ligand complexes),
cated that the use of just the common poten- site pharmacophores were forced to contain a
tial four-point pharmacophores could give in- hydrophobe or aromatic ring centroid point
formation pertaining to relative enzyme from both the S1 and S4 regions of the binding
selectivity; when three-point pharmacopliores pocket. By using this focused approach, a "di-
were used, however, poor resolution of en- versity" of matched site pharmacophores was
zyme selectivity was observed. The thrombin obtained, representing a sampling of "reason-
inhibitor thus exhibited greater similarity able" binding modes related to those experi-
with the complementary four-point pharma- mentally observed and, thus, presumably hav-
cophore fingerprints of the thrombin active ing a higher probability of giving rise to
sites than with the potential pharmacophore biological activity. This focused approach re-
keys generated from the other enzymes; a sim- duced the total number of site pharmaco-
ilar result was found for the Factor Xa inhibi- phores from 5393 to 775 [using the seven dis-
tors with the Factor Xa site. tance ranges setting (37a) and considering all
Clearly, the inclusion of the shape of the distances in the 1-15 A range]. The approach
binding site should improve the resolution, was validated by the identification of feasible
and the DiR (Design in Receptor) approach binding models (374, similar to that experi-
(133) refines the process, requiring that the mentally observed for a known Factor Xa in-
pharmacophoric match fits the shape of the hibitor. The Ugi four-component condensa-
target site (i.e., is sterically compatible with tion reaction (131)(see Fig. 5.20) was used fpr
the site). This clearly provides much addi- the study and is capable of producing very
tional information at the expense of greatly large numbers of different structures from
increased calculation time. Within the DiR ap- commercially available reactants. An example
proach, two-, three-, and four-point potential of the power of the method was given, whereby
site pharmacophores can be used. This pro- products were selected semimanually from a
vides interesting new library design possibili- small virtual library of 432 products (37c,d).
ties, in that it is possible to evaluate which Products were constructed from the four reac-
ligands are able to fit in the site by matching at tant sets: carboxylic acids (R, x 3), amines (R,
least one set of pharmacophoric features, and x 2), aldehydes (R, x 3), and isonitriles (R, x
to quantify which pharmacophore hypotheses 24). The pharmacophore-based site analysis
are matched. A subset of ligands can then be showed the optimum positions of substitution
designed that match as many different phar- and chain length for benzamidine-containing
macophoric hypotheses as possible, and the bi- fragments (targeted to the aspartate-contain-
ological screening of the resultant compounds ing S1 pocket) and the optimum lengths of
can determine which bind best. Alternatively, other hydrophobic reactants (targeted to the
pharmacophore constraints can be applied to a S4 pocket) to produce compounds that would
shape-driven searching approach, and Good et sample the maximum number of binding
al. (34) have shown the effectiveness of this modes. In this case the groups were always
with the DOCK virtual screening/docking ap- forced to be in the S1 and S4 pockets to main-
proach, in which the addition of pharmaco- tain "reasonable" binding modes, although
phore constraints improved both the enrich- this restriction could be excluded to probe
ment and speed of the process. even further potential binding modes. Thus,
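The library size quoted above follows directly from the four reactant set sizes, and the idea of choosing products that sample as many binding modes as possible can be illustrated with a simple coverage heuristic. The sketch below is only an illustration under stated assumptions: the reactant names, the product labels P1 to P4, and the hypothesis identifiers are hypothetical placeholders, and the greedy set-cover selection is a generic stand-in, not the semimanual procedure or the DiR method used in the cited work. Only the set sizes (3, 2, 3, 24) come from the text.

```python
from itertools import product

# Hypothetical reactant labels; only the set sizes (3, 2, 3, 24) come from the text.
acids = [f"acid_{i}" for i in range(1, 4)]
amines = [f"amine_{i}" for i in range(1, 3)]
aldehydes = [f"aldehyde_{i}" for i in range(1, 4)]
isonitriles = [f"isonitrile_{i}" for i in range(1, 25)]


def enumerate_ugi_library(acids, amines, aldehydes, isonitriles):
    """Return every (acid, amine, aldehyde, isonitrile) combination."""
    return list(product(acids, amines, aldehydes, isonitriles))


def greedy_cover(matches, n_pick):
    """Pick up to n_pick products that together match the most hypotheses.

    `matches` maps a product label to the set of site pharmacophore
    hypotheses it matches. This is a plain greedy set-cover heuristic,
    assumed here for illustration only.
    """
    covered, picked = set(), []
    candidates = dict(matches)
    for _ in range(n_pick):
        # Choose the product adding the most hypotheses not yet covered.
        best = max(candidates, key=lambda p: len(candidates[p] - covered), default=None)
        if best is None or not (candidates[best] - covered):
            break
        picked.append(best)
        covered |= candidates.pop(best)
    return picked, covered


library = enumerate_ugi_library(acids, amines, aldehydes, isonitriles)
print(len(library))  # 432

# Toy data: hypothetical products and the hypothesis IDs each one matches.
toy_matches = {
    "P1": {1, 2, 3},
    "P2": {3, 4},
    "P3": {4, 5, 6, 7},
    "P4": {1, 7},
}
picked, covered = greedy_cover(toy_matches, n_pick=2)
print(picked, sorted(covered))
```

In a real design, the sets in `toy_matches` would come from a pharmacophore matching step against the site points; here they are invented so that the selection logic can be followed by hand.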
As the identity of the matched site pharmacophore(s) was known for each compound, target site-based diversity of binding modes could be explored in the design process. An optimized selection of reactants was possible, and the value to the design of reactants with different chain lengths could be evaluated.

6 CONCLUSIONS AND FUTURE DIRECTIONS

Similarity and diversity metrics have been successfully used for a variety of tasks, including virtual screening, subset selection, and combinatorial library design. Databases of virtual compounds (e.g., from validated combinatorial chemistry protocols and reactants) can be used for both virtual screening and library design (virtual screening on virtual libraries with additional combinatorial constraints). The ability to exploit rapidly large virtual libraries of compounds that could be made by validated combinatorial chemistry protocols provides very powerful virtual screening and library design approaches. Future directions for library design will involve the application of such approaches in a fully integrated fashion (e.g., the ADEPT tool described in Section 4.10) and further enhancements to the constraints necessary to achieve druglike compounds (e.g., 80% compliance to the Rule of 5, predictive models for metabolism- and toxicity-related issues). Where the goal is lead generation (e.g., to enrich the compound screening file for high throughput screening), a focus will be on target classes (gene families) of interest, and the generation of compounds with leadlike properties, such as a lower molecular weight. The move away from combinatorial libraries to sparse arrays and noncombinatorial (cherry-picked) libraries (90) will continue, enabling more effective designs with control of associated properties. However, as more property constraints are applied to the library designs for leadlike/druglike properties, the need to include positive design elements to ensure good biological activity is emphasized. The goal for drug discovery is thus to identify targets and to generate compounds that are at the intersection of chemical, biological, and druglike property (e.g., absorption, toxicophores) space. Different targets and different expected routes of administration will require different constraints, and an element of diversity (with constraints toward a drug-occupied chemical space) will remain important, to enable the most effective use of combinatorial library chemistry and to discover new leads for both established and new targets.

REFERENCES

1. M. Johnson and G. Maggiora, Concepts and Applications of Molecular Similarity, John Wiley & Sons, New York, 1990.
2. (a) P. Willett, J. M. Barnard, and G. M. Downs, J. Chem. Inf. Comput. Sci., 38, 983 (1998); (b) P. Willett, Curr. Opin. Biotechnol., 11, 85 (2000); (c) P. Willett, Ed., Perspectives in Drug Discovery and Design, Vols. 7/8, Kluwer Academic, Dordrecht/Norwell, MA, 1997; (d) J. S. Mason and M. A. Hermsmeier, Curr. Opin. Chem. Biol., 3, 342 (1999).
3. (a) J. M. Moore, Curr. Opin. Biotechnol., 10, 54 (1999); (b) P. J. Hajduk, T. Gerfin, J.-M. Boehlen, M. Haeberli, D. Marek, and S. W. Fesik, J. Med. Chem., 42, 2315 (1999); (c) C. A. Lepre, Drug Discovery Today, 6, 133 (2001); (d) J. Fejzo, C. A. Lepre, J. W. Peng, G. W. Bemis, Ajay, M. A. Murcko, and J. M. Moore, Chem. Biol., 6, 755 (1999).
4. J. S. Mason and S. D. Pickett, Perspect. Drug Discov. Des., 7/8, 85 (1997).
5. (a) R. D. Brown, Perspect. Drug Discov. Des., 7/8, 31 (1997); (b) Y. C. Martin, R. D. Brown, and M. G. Bures in E. M. Gordon and J. F. Kerwin, Jr., Eds., Combinatorial Chemistry and Molecular Diversity in Drug Discovery, Wiley-Liss, New York, 1998, pp. 369-385; (c) M. G. Bures and Y. C. Martin, Curr. Opin. Chem. Biol., 2, 376 (1998); (d) Y. C. Martin, M. G. Bures, and R. D. Brown, Pharm. Pharmacol. Commun., 4, 147 (1998).
6. (a) D. K. Agrafiotis in P. v. R. Schleyer, N. L. Allinger, T. Clark, J. Gasteiger, P. A. Kollman, H. F. Schaefer III, and P. R. Schreiner, Eds., The Encyclopedia of Computational Chemistry, Vol. 1, John Wiley & Sons, Chichester, UK, 1998, pp. 742-761; (b) D. K. Agrafiotis and V. S. Lobanov, J. Chem. Inf. Comput. Sci., 39, 51 (1999); (c) D. K. Agrafiotis, J. C. Myslik, and F. R. Salemme, Annu. Rep. Comb. Chem. Mol. Diversity, 2, 71 (1999); (d) D. K. Agrafiotis, V. S. Lobanov, D. N. Rassokhin, and S. Izrailev,
Methods Princ. Med. Chem., 10 (Virtual Screening for Bioactive Molecules), 265 (2000).
7. R. A. Lewis, S. D. Pickett, and D. E. Clark in K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, Vol. 16, Wiley-VCH, John Wiley & Sons, New York, 2000, pp. 1-51.
8. D. C. Spellmeyer and P. D. J. Grootenhuis, Annu. Rep. Med. Chem., 34, 287 (1999).
9. H. Matter and M. Rarey in G. Jung, Ed., Combinatorial Chemistry, Wiley-VCH Verlag GmbH, Weinheim, Germany, 1999, pp. 409-439.
10. Y. C. Martin, J. Comb. Chem., 3, 231 (2001).
11. (a) R. S. Pearlman, Network Sci., 2, (6/7) (1996), available at: http://www.netsci.org/Science/Combichem/feature08.html; (b) R. S. Pearlman and K. M. Smith, Perspect. Drug Discov. Des., 9, 339-355 (1998); (c) R. S. Pearlman and K. M. Smith, Drugs Future, 23, 885 (1998); (d) R. S. Pearlman and K. M. Smith, J. Chem. Inf. Comput. Sci., 39, 28 (1999).
12. J. S. Mason, A. C. Good, and E. J. Martin, Curr. Pharm. Des., 7, 567 (2001).
13. (a) MDL Information Systems Inc., San Leandro, CA, URL: http://www.mdli.com; (b) C. A. James, D. Weininger, and J. Delaney, Daylight Theory Manual, version 4.72, Daylight Chemical Information Systems, Inc., URL: http://www.daylight.com/dayhtml/doc/theory/theory.toc.html; (c) UNITY/SLN manual available from Tripos, Inc., 1699 South Hanley Road, Suite 303, St. Louis, MO 63144, URL: http://www.tripos.com.
14. (a) R. D. Brown and Y. C. Martin, J. Chem. Inf. Comput. Sci., 37, 1 (1997); (b) R. D. Brown and Y. C. Martin, J. Chem. Inf. Comput. Sci., 36, 572 (1996).
15. D. R. Flower, J. Chem. Inf. Comput. Sci., 38, 379 (1998).
16. M. Randić, J. Am. Chem. Soc., 97, 6609 (1975).
17. (a) L. B. Kier, L. H. Hall, W. J. Murray, and M. Randić, J. Pharm. Sci., 64, 1971 (1975); (b) L. B. Kier, L. H. Hall, and W. J. Murray, J. Pharm. Sci., 64, 1974 (1975).
18. L. H. Hall and L. B. Kier, J. Mol. Graph. Modell., 20, 4 (2001).
19. M. Randić, J. Mol. Graph. Modell., 20, 19 (2001).
20. (a) L. H. Hall and L. B. Kier in K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, Vol. 2, VCH Publishers, New York, 1991, pp. 367-422; (b) MOLCONN-Z, Hall Associates Consulting, 2 Davis Street, Quincy, MA, available from eduSoft L.C. http://www.eslc.vabiotech.com.
21. (a) R. A. Lewis, J. S. Mason, and I. M. McLay, J. Chem. Inf. Comput. Sci., 37, 599 (1997).
22. D. J. Cummins, C. W. Andrews, J. A. Bentley, and M. J. Cory, J. Chem. Inf. Comput. Sci., 36, 750 (1996).
23. (a) W. G. Glen, W. J. Dunn III, and D. R. Scott, Tetrahedron Comput. Methodol., 2, 349 (1989); (b) C. Cheng, G. Maggiora, M. Lajiness, and M. J. Johnson, J. Chem. Inf. Comput. Sci., 36, 909 (1996); (c) M. Hassan, J. P. Bielawski, J. C. Hempel, and M. Waldman, Mol. Div., 2, 64 (1996); (d) D. K. Agrafiotis, J. Chem. Inf. Comput. Sci., 37, 841 (1997).
24. R. E. Carhart, D. H. Smith, and R. Venkataraghavan, J. Chem. Inf. Comput. Sci., 25, 64 (1985).
25. R. Nilakantan, N. Bauman, J. S. Dixon, and R. Venkataraghavan, J. Chem. Inf. Comput. Sci., 27, 82 (1987).
26. (a) S. K. Kearsley, S. Sallamack, E. M. Fluder, J. D. Andose, R. T. Mosley, and R. P. Sheridan, J. Chem. Inf. Comput. Sci., 36, 118 (1996); (b) R. P. Sheridan, M. D. Miller, D. J. Underwood, and S. K. Kearsley, J. Chem. Inf. Comput. Sci., 36, 128 (1996); (c) R. P. Sheridan, J. Chem. Inf. Comput. Sci., 40, 1456 (2000).
27. (a) G. Schneider, W. Neidhart, T. Giller, and G. Schmid, Angew. Chem. Int. Ed. Engl., 38, 2894 (1999); (b) G. Schneider, O. Clement-Chomienne, L. Hilfiger, P. Schneider, S. Kirsch, H.-J. Böhm, and W. Neidhart, Angew. Chem. Int. Ed. Engl., 39, 4130 (2000).
28. D. Gorse and R. Lahana, Curr. Opin. Chem. Biol., 4, 287 (2000).
29. C. A. Lipinski, F. Lombardo, B. W. Dominy, and P. J. Feeney, Adv. Drug Deliv. Rev., 23, 3 (1997).
30. F. R. Burden, J. Chem. Inf. Comput. Sci., 29, 225 (1989).
31. DiverseSolutions was developed by R. S. Pearlman and K. M. Smith at the University of Texas, Austin, and is distributed by Tripos, Inc., St. Louis, MO.
32. (a) H. Gao, J. Chem. Inf. Comput. Sci., 41, 402 (2001); (b) D. Stanton, J. Chem. Inf. Comput. Sci., 39, 11 (1999); (c) D. Schnur, J. Chem. Inf. Comput. Sci., 39, 36 (1999); (c) D. Schnur and P. Venkatarangan in A. K. Ghose and V. N. Viswanadhan, Eds., Combinatorial Library Design and Evaluation, Marcel Dekker, New York, 2001, pp. 473-501; (d) B. R. Beno and J. S. Mason, Drug Discovery Today, 6, 251
(2001); (e) X.-C. Wang and J. Saunders, Abstracts of Papers, 222nd ACS National Meeting, Chicago, IL, August 26-30, 2001 (2001), MEDI-012; (f) E. L. Stewart, P. J. Brown, J. A. Bentley, and T. M. Willson, Abstracts of Papers, 222nd ACS National Meeting, Chicago, IL, August 26-30, 2001 (2001), COMP-182; (g) Y. Gao and V. Goodfellow, Abstracts of Papers, 221st ACS National Meeting, San Diego, CA, 2001 (2001), MEDI-235; (h) X.-C. Wang and J. Saunders, Abstracts of Papers, 221st ACS National Meeting, San Diego, CA, 2001 (2001), MEDI-207; (i) J. Saunders, Proceedings of the IBC Conference on Drug Discovery by Design, Boston, MA, November 5-8, 2001; (j) B. Pirard and S. D. Pickett, J. Chem. Inf. Comput. Sci., 40, 1431 (2000).
33. (a) A. C. Good and J. S. Mason, Reviews in Computational Chemistry, Vol. 7, VCH, New York, 1995, pp. 67-127; (b) G. W. A. Milne, M. C. Nicklaus, and S. Wang, SAR QSAR Environ. Res., 9, 23 (1998); (c) W. A. Warr and P. Willett, Design of Bioactive Molecules, American Chemical Society, Washington, DC, 1998, pp. 73-95.
34. A. C. Good, J. S. Mason, and S. D. Pickett, Methods Princ. Med. Chem., 10 (Virtual Screening for Bioactive Molecules), 131 (2000).
35. S. D. Pickett, J. S. Mason, and I. M. McLay, J. Chem. Inf. Comput. Sci., 36, 1214 (1996).
36. E. K. Davies in I. M. Chaiken and K. D. Janda, Eds., Molecular Diversity and Combinatorial Chemistry: Libraries and Drug Discovery, American Chemical Society, Washington, DC, 1996, pp. 309-316.
37. (a) J. S. Mason, I. Morize, P. R. Menard, D. L. Cheney, C. R. Hulme, and R. F. Labaudiniere, J. Med. Chem., 42, 3251 (1999); (b) J. S. Mason and D. L. Cheney, Proc. Pac. Symp. Biocomput., 4, 456 (1999); (c) J. S. Mason and D. L. Cheney, Proc. Pac. Symp. Biocomput., 6, 576 (2000); (d) J. S. Mason and B. R. Beno, J. Mol. Graph. Modell., 18, 438 (2000).
38. (a) M. J. McGregor and S. M. Muskal, J. Chem. Inf. Comput. Sci., 39, 569 (1999); (b) M. J. McGregor and S. M. Muskal, J. Chem. Inf. Comput. Sci., 40, 117 (2000).
39. E. K. Bradley, P. Beroza, J. E. Penzotti, P. D. J. Grootenhuis, D. Spellmeyer, and J. L. Miller, J. Med. Chem., 43, 2770 (2000).
40. D. Horvath in A. K. Ghose and V. N. Viswanadhan, Eds., Combinatorial Library Design and Evaluation, Marcel Dekker, New York, 2001, pp. 429-472.
41. (a) R. Poulain, D. Horvath, B. Bonnet, C. Eckhoff, B. Chapelain, M.-C. Bodinier, and B. Déprez, J. Med. Chem., 44, 3378 (2001); (b) R. Poulain, D. Horvath, B. Bonnet, C. Eckhoff, B. Chapelain, M.-C. Bodinier, and B. Déprez, J. Med. Chem., 44, 3391 (2001).
42. (a) P. J. Goodford, J. Med. Chem., 28, 849 (1985); (b) D. N. A. Boobbyer, P. J. Goodford, and P. M. McWhinnie, J. Med. Chem., 32, 1083 (1989); (c) The GRID program is developed and distributed by Molecular Discovery Ltd.
43. E. J. Martin and T. J. Hoeffel, J. Mol. Graph. Modell., 18, 383 (2000).
44. A. R. Leach, D. V. S. Green, M. M. Hann, D. B. Judd, and A. C. Good, J. Chem. Inf. Comput. Sci., 40, 1262 (2000).
45. (a) K. A. Andrews and R. D. Cramer, J. Med. Chem., 43, 1723 (2000); (b) R. D. Cramer, M. A. Poss, M. A. Hermsmeier, T. J. Caulfield, M. C. Kowala, and M. T. Valentine, J. Med. Chem., 42, 3919 (1999); (c) R. D. Cramer, R. D. Clark, D. E. Patterson, and A. M. Ferguson, J. Med. Chem., 39, 3060 (1996).
46. (a) H. Matter and T. Potter, J. Chem. Inf. Comput. Sci., 39, 1211 (1999); (b) V. J. Van Geerestein, H. Hamersma, and S. P. Van Helden in H. Van de Waterbeemd, B. Testa, and G. Folkers, Eds., Computer-Assisted Lead Finding and Optimization, Verlag Helvetica Chimica Acta, Basel, Switzerland, 1997, pp. 159-178.
47. M. Hahn, J. Chem. Inf. Comput. Sci., 37, 80 (1997).
48. O. F. Güner, M. Waldman, R. Hoffmann, and J.-H. Kim in O. F. Güner, Ed., Pharmacophore Perception, Development and Use in Drug Design, International University Line, La Jolla, CA, 2000, p. 213.
49. R. Carbo, L. Leyda, and M. Arnau, Int. J. Quantum Chem., 17, 1185 (1980).
50. A. C. Good, E. E. Hodgkin, and W. G. Richards, J. Chem. Inf. Comput. Sci., 32, 188 (1992).
51. (a) D. J. Wild and P. Willett, J. Chem. Inf. Comput. Sci., 36, 159 (1996); (b) D. A. Thorner, D. J. Wild, P. Willett, and P. M. Wright, J. Chem. Inf. Comput. Sci., 36, 900 (1996).
52. A. Schuffenhauer, V. Gillet, and P. Willett, J. Chem. Inf. Comput. Sci., 40, 295 (2000).
53. R. D. Cramer III, D. E. Patterson, and J. D. Bunce, J. Am. Chem. Soc., 110, 5959 (1988).
54. G. Cruciani, P. Crivori, P.-A. Carrupt, and B. Testa, THEOCHEM, 503, 17 (2000).
55. (a) W. Guba and G. Cruciani in K. Gundertofte and F. S. Jorgensen, Eds., Molecular Modeling and Prediction of Bioreactivity, Plenum, New York, 2000, pp. 89-95; (b) P. Crivori, G. Cruciani, P.-A. Carrupt, and B. Testa, J. Med. Chem., 43, 2204 (2000).
56. M. Pastor, G. Cruciani, I. McLay, S. Pickett, and S. Clementi, J. Med. Chem., 43, 3233 (2000).
57. E. J. Martin, J. M. Blaney, M. A. Saini, D. C. Spellmeyer, A. K. Wong, and W. H. Moos, J. Med. Chem., 38, 1431 (1995).
58. C. A. James, D. Weininger, and J. Delaney, Fingerprints-Screening and Similarity, Daylight Theory Manual v4.72, Daylight Chemical Information Systems, Inc., URL: http://www.daylight.com/dayhtml/doc/theory/theory.toc.html.
59. (a) P. R. Menard, J. S. Mason, I. Morize, and S. Bauerschmidt, J. Chem. Inf. Comput. Sci., 38, 1204 (1998); (b) P. R. Menard, R. A. Lewis, and J. S. Mason, J. Chem. Inf. Comput. Sci., 38, 497 (1998).
60. P. Willett, Similarity and Clustering in Chemical Information Systems, Research Studies Press, Letchworth, UK, 1987.
61. R. A. Jarvis and E. A. Patrick, IEEE Trans. Comput., C-22, 1025 (1973).
62. (a) P. Willett, V. Winterman, and D. Bawden, J. Chem. Inf. Comput. Sci., 26, 109 (1986); (b) J. B. Dunbar, Perspect. Drug Discov. Des., 7/8, 51 (1997).
63. T. N. Doman, J. M. Cibulskis, M. J. Cibulskis, P. D. McCray, and D. P. Spangler, J. Chem. Inf. Comput. Sci., 36, 1195 (1996).
64. (a) J. M. Barnard and G. M. Downs, J. Chem. Inf. Comput. Sci., 37, 141 (1997); (b) J. M. Barnard and G. M. Downs, Perspect. Drug Discov. Des., 7/8, 13 (1997).
65. G. M. Downs, P. Willett, and W. Fisanick, J. Chem. Inf. Comput. Sci., 34, 1094 (1994).
66. (a) M. Lajiness, Perspect. Drug Discov. Des., 7/8, 55 (1997); (b) V. J. Gillet and P. Willett in A. K. Ghose and V. N. Viswanadhan, Eds., Combinatorial Library Design and Evaluation, Marcel Dekker, New York, 2001, pp. 379-398; (c) R. D. Clark, J. Chem. Inf. Comput. Sci., 37, 1181 (1997); (d) T. Potter and H. Matter, J. Med. Chem., 41, 478 (1998).
67. B. D. Hudson, R. M. Hyde, E. Rahr, and J. Wood, Quant. Struct.-Act. Relat., 15, 285 (1996).
68. R. E. Higgs, K. G. Bemis, I. A. Watson, and J. H. Wikel, J. Chem. Inf. Comput. Sci., 37, 861 (1997).
69. (a) J. D. Holliday, S. S. Ranade, and P. Willett, Quant. Struct.-Act. Relat., 14, 501 (1995); (b) D. B. Turner, S. M. Tyrrell, and P. Willett, J. Chem. Inf. Comput. Sci., 37, 18 (1997).
70. S. D. Pickett, C. Luttmann, V. Guerin, A. Laoui, and E. James, J. Chem. Inf. Comput. Sci., 38, 144 (1998).
71. J. Mount, J. Ruppert, W. Welch, and A. Jain, J. Med. Chem., 42, 60 (1999).
72. M. Snarey, N. K. Terrett, P. Willett, and D. J. Wilton, J. Mol. Graph., 15, 372 (1997).
73. M. Waldman, H. Li, and M. Hassan, J. Mol. Graph. Modell., 18, 412 (2000).
74. M. Hann, B. Hudson, X. Lewell, R. Lifely, L. Miller, and N. Ramsden, J. Chem. Inf. Comput. Sci., 39, 897 (1999).
75. (a) D. E. Clark and S. D. Pickett, Drug Discovery Today, 5, 49 (2000); (b) P. J. Eddershaw, A. P. Beresford, and M. K. Bayliss, Drug Discovery Today, 5, 409-414 (2000); (c) H. van de Waterbeemd, D. A. Smith, K. Beaumont, and D. K. Walker, J. Med. Chem., 44, 1313 (2001).
76. S. D. Pickett, D. E. Clark, and I. M. McLay, J. Chem. Inf. Comput. Sci., 40, 263 (2000).
77. D. E. Clark, J. Pharm. Sci., 88, 807 (1999).
78. (a) A. C. Good and R. A. Lewis, J. Med. Chem., 40, 3926 (1997); (b) R. A. Lewis, A. C. Good, and S. D. Pickett in H. Van de Waterbeemd, B. Testa, and G. Folkers, Eds., Computer-Assisted Lead Finding and Optimization, Verlag Helvetica Chimica Acta, Basel, Switzerland, 1997, pp. 135-156; (c) V. J. Gillet, P. Willett, J. Bradshaw, and D. V. S. Green, J. Chem. Inf. Comput. Sci., 39, 169 (1999).
79. G. Grassy, A. Yasri, R. Lahana, J. Woo, S. Iyer, M. Kaczorek, R. Folc'h, and R. Buelow, Nat. Biotechnol., 16, 748 (1998).
80. J. S. Mason in P. M. Dean and R. A. Lewis, Eds., Molecular Diversity in Drug Design, Kluwer Academic, Dordrecht, Netherlands, 1999, pp. 67-91.
81. (a) A. C. Good and I. D. Kuntz, J. Comput.-Aided Mol. Des., 9, 373 (1995); (b) M. J. Ashton, M. Jaye, and J. S. Mason, Drug Discovery Today, 1, 71 (1996).
82. J. H. Van Drie and R. A. Nugent, SAR QSAR Environ. Res., 9, 1 (1998).
83. A. C. Good, Internet J. Chem., 3 (2000), http://www.ijc.com/article/2000v3/9/.
84. D. E. Patterson, R. D. Cramer, A. M. Ferguson, R. D. Clark, and L. E. Weinberger, J. Med. Chem., 39, 3049 (1996).
85. A. C. Good, J. S. Mason, D. V. S. Green, and A. R. Leach in A. K. Ghose and V. N. Viswanadhan, Eds., Combinatorial Library Design and Evaluation, Marcel Dekker, New York, 2001, pp. 399-428.
86. C. D. Eldred and B. D. Judkins, Prog. Med. Chem., 36, 29 (1999).
87. (a) V. J. Gillet, P. Willett, and J. Bradshaw, J. Chem. Inf. Comput. Sci., 38, 165 (1998); (b) E. A. Jamois, M. Hassan, and M. Waldman, J. Chem. Inf. Comput. Sci., 40, 63 (2000); (c) R. S. Pearlman and K. M. Smith, Symposium on Combinatorial Chemistry, Abstracts of Papers, 217th ACS National Meeting, Anaheim, CA, March 1999, MEDI-012.
88. (a) J. Sadowski in A. K. Ghose and V. N. Viswanadhan, Eds., Combinatorial Library Design and Evaluation, Marcel Dekker, New York, 2001, pp. 291-300; (b) A. K. Ghose, V. N. Viswanadhan, and J. J. Wendoloski, J. Comb. Chem., 1, 55 (1999); (c) Ajay, W. P. Walters, and M. A. Murcko, J. Med. Chem., 41, 3314 (1998); (d) J. Sadowski and H. Kubinyi, J. Med. Chem., 41, 3325 (1998).
89. T. Mitchell and G. A. Showell, Curr. Opin. Drug Discov. Dev., 4, 314 (2001).
90. J. R. Everett, M. Gardner, F. Pullen, G. F. Smith, M. Snarey, and N. Terrett, Drug Discovery Today, 6, 779 (2001).
91. (a) J. H. Van Drie and M. S. Lajiness, Drug Discovery Today, 3, 274 (1998); (b) R. A. Lewis in P. M. Dean and R. A. Lewis, Eds., Molecular Diversity in Drug Design, Kluwer Academic, Dordrecht, Netherlands, 1999, pp. 221-248 and other chapters therein.
92. G. Jones, P. Willett, R. Glen, A. Leach, and R. Taylor in A. Parill and M. Reddy, Eds., Rational Drug Design, ACS Symposium Series 719, American Chemical Society, Washington, DC, 1999.
93. R. T. Koehler and H. O. Villar, J. Comput. Chem., 21, 1145 (2000).
94. R. D. Brown, M. Hassan, and M. Waldman, J. Mol. Graph. Modell., 18, 427 (2000).
95. (a) S. J. Teague, A. M. Davis, P. D. Leeson, and T. Oprea, Angew. Chem. Int. Ed. Engl., 38, 3743 (1999); (b) T. Oprea, et al., J. Chem. Inf. Comput. Sci., 41, 1308 (2001).
96. S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Science, 220, 671 (1983).
97. (a) L. Weber, Drug Discovery Today, 3, 379 (1998); (b) W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, The Art of Scientific Computing, Cambridge University Press, New York, 1984, pp. 444-455; (c) D. K. Agrafiotis, J. Chem. Inf. Comput. Sci., 37, 576 (1997); (d) W. Zheng, S. J. Cho, C. L. Waller, and A. Tropsha, J. Chem. Inf. Comput. Sci., 39, 738 (1999).
98. D. E. Clark, Ed., Evolutionary Algorithms in Molecular Design, Wiley-VCH, New York, 2000.
99. R. P. Sheridan and S. K. Kearsley, J. Chem. Inf. Comput. Sci., 35, 310 (1995).
100. J. Singh, M. A. Ator, E. P. Taeger, M. P. Allen, D. A. Whipple, J. E. Soloweji, S. Chowdhray, and A. M. Treasurywala, J. Am. Chem. Soc., 118, 1669 (1996).
101. R. D. Brown and Y. C. Martin, J. Med. Chem., 40, 2304 (1997).
102. (a) C. M. Fonseca and P. J. Fleming in S. Forrest, Ed., Genetic Algorithms: Proceedings of the Fifth International Conference, Morgan Kaufmann, San Mateo, CA, 1993, pp. 416-423; (b) C. M. Fonseca and P. J. Fleming in K. De Jong, Ed., Evolutionary Computation, Vol. 3, The Massachusetts Institute of Technology, Cambridge, MA, 1995, pp. 1-16; (c) V. J. Gillet, W. Khatib, P. Willett, P. Fleming, and D. V. S. Green, J. Chem. Inf. Comput. Sci., 42, 375-385 (2002).
103. A. Polinsky, R. D. Feinstein, S. Shi, and A. Kuki in I. M. Chaiken and K. D. Janda, Eds., Molecular Diversity and Combinatorial Chemistry: Libraries and Drug Discovery, American Chemical Society, Washington, DC, 1996, pp. 219-232.
104. D. S. Thorpe, A. E. Chan, A. Binnie, L. C. Chen, A. Robinson, J. Spoonamore, D. Rodwell, S. Wade, S. Wilson, M. Ackerman-Berrier, H. Yeoman, S. Walle, Q. Wu, and K. F. Wertman, Biochem. Biophys. Res. Commun., 266, 62 (1999).
105. (a) G. M. Downs and J. M. Barnard, J. Chem. Inf. Comput. Sci., 37, 59 (1997); (b) J. M. Barnard, G. M. Downs, A. von Scholley-Pfab, and R. D. Brown, J. Mol. Graph. Modell., 18, 452 (2000); (c) G. M. Downs and J. M. Barnard, 2nd Joint Sheffield Conference on Chemoinformatics: Computational Tools for Lead Discovery, April 9-11, 2001. Presentation available at: http://cisrg.shef.ac.uk/shef2001/talks/downs.ppt; (d) Accelrys, San Diego, CA (previously MSI, Molecular Simulations Inc.), http://www.accelrys.com.
106. D. K. Agrafiotis and V. S. Lobanov, 2nd Joint Sheffield Conference on Chemoinformatics: Computational Tools for Lead Discovery, April 9-11, 2001. Abstract available at: http://cisrg.shef.ac.uk/shef2001/abstracts.htm.
107. N. E. Shemetulskis, J. B. Dunbar, Jr., B. W. Dunbar, D. W. Moreland, and C. Humblet, J. Comput.-Aided Mol. Des., 9, 407 (1995).
108. J. H. Voigt, B. Bienfait, S. Wang, and M. C. Nicklaus, J. Chem. Inf. Comput. Sci., 41, 702 (2001).
109. M. J. McGregor and P. V. Pallai, J. Chem. Inf. Comput. Sci., 37, 443 (1997).
110. (a) R. B. Nilakantan, N. Bauman, K. S. Haraki, and R. J. Venkataraghavan, J. Chem. Inf. Comput. Sci., 30, 65 (1990); (b) R. B. Nilakantan, N. Bauman, and K. S. Haraki, J. Comput.-Aided Mol. Des., 11, 447 (1997).
111. E. J. Martin and R. E. Critchlow, J. Comb. Chem., 1, 32 (1999).
112. (a) M. J. McGregor and S. M. Muskal, J. Chem. Inf. Comput. Sci., 39, 569 (1999); (b) M. J. McGregor and S. M. Muskal, J. Chem. Inf. Comput. Sci., 40, 117 (2000).
113. B. A. Hopkins, Ann. Bot., 18, 213 (1954).
114. (a) T. R. Hagadone and M. S. Lajiness, Tetrahedron Comput. Methodol., 1, 219 (1988); (b) T. R. Hagadone, J. Chem. Inf. Comput. Sci., 32, 515 (1992).
115. A. Gobbi, D. Poppinger, and B. Rohde, Perspect. Drug Discov. Des., 7/8, 131 (1997).
116. A. R. Leach, J. Bradshaw, D. V. S. Green, and M. M. Hann, J. Chem. Inf. Comput. Sci., 39, 1161 (1999).
117. G. Bravi, D. V. S. Green, M. M. Hann, and A. R. Leach, J. Chem. Inf. Comput. Sci., 40, 1441 (2000).
118. (a) W. P. Walters, Presented at the Daylight User Group Meeting, MUG'99, 1999. Available online at http://www.daylight.com/meetings/mug99/Walters/index.html; (b) W. P. Walters, M. T. Stahl, and M. A. Murcko, Drug Discovery Today, 3, 160 (1998).
119. M. Stahl, Methods Princ. Med. Chem., 10 (Virtual Screening for Bioactive Molecules), 229 (2000).
120. H. J. Böhm and M. Stahl, Curr. Opin. Chem. Biol., 4, 283 (2000).
121. A. R. Leach, R. A. Bryce, and A. J. Robinson, J. Mol. Graph. Modell., 18, 358 (2000).
122. R. P. Sheridan, S. G. SanFeliciano, and S. K. Kearsley, J. Mol. Graph. Modell., 18, 320 (2000).
123. D. Tondi and M. P. Costi in A. K. Ghose and V. N. Viswanadhan, Eds., Combinatorial Library Design and Evaluation, Marcel Dekker, New York, 2001, pp. 563-603.
124. E. K. Kick, D. C. Roe, A. G. Skillman, G. Lin, T. J. A. Ewing, Y. Sun, I. Kuntz, and J. A. Ellman, Chem. Biol., 4, 297 (1997).
125. D. C. Roe in P. M. Dean and R. A. Lewis, Eds., Molecular Diversity in Drug Design, Kluwer Academic, Dordrecht, Netherlands, 1999, pp. 141-173.
126. (a) I. D. Kuntz, J. M. Blaney, S. J. Oatley, R. Langridge, and T. E. Ferrin, J. Mol. Biol., 161, 269 (1982); DOCK is developed and distributed by the Kuntz Group, Dept. of Pharmaceutical Chemistry, 512 Parnassus, University of California, San Francisco, CA 94143-0446, URL: http://www.cmpharm.ucsf.edu/kuntz; (b) T. J. A. Ewing and I. D. Kuntz, J. Comput. Chem., 18, 1175 (1997); (c) Y. Sun, T. J. A. Ewing, A. G. Skillman, and I. D. Kuntz, J. Comput.-Aided Mol. Des., 12, 597 (1998).
127. (a) S. F. Brady, et al., J. Med. Chem., 41, 401 (1998); (b) H. J. Böhm, D. W. Banner, and L. Weber, J. Comput.-Aided Mol. Des., 13, 51 (1999).
128. (a) A. Rockwell, M. Melden, R. A. Copeland, K. Hardman, C. P. Decicco, and W. F. DeGrado, J. Am. Chem. Soc., 118, 10337 (1996); (b) A. K. Szardenings, D. Harris, S. Lam, L. Shi, D. Tien, Y. Wang, D. V. Patel, M. Navre, and D. A. Campbell, J. Med. Chem., 41, 2194 (1998).
129. D. Tondi, U. Slomiczynska, M. P. Costi, D. M. Watterson, S. Ghelli, and B. K. Shoichet, Chem. Biol., 6, 319 (1999).
130. M. L. Lamb, K. W. Burdick, S. Toba, M. W. Young, A. G. Skillman, X. Zou, J. R. Arnold, and I. D. Kuntz, Proteins Struct. Funct. Genet., 42, 296 (2001).
131. I. Ugi and C. Steinbruckner, Chem. Ber., 94, 734 (1961).
132. T. F. Herpin, G. C. Morton, A. K. Dunn, C. Fillon, P. R. Menard, S. Y. Tang, J. M. Salvino, and R. F. Labaudiniere, Mol. Diversity, 4, 221 (2000).
133. C. M. Murray and S. J. Cato, J. Chem. Inf. Comput. Sci., 39, 46 (1999).
134. A. Tulinsky, K. Padmanabhan, K. P. Padmanabhan, C. H. Park, W. Bode, R. Huber, D. T. Blankenship, A. D. Cardin, and W. Kisiel, J. Mol. Biol., 232, 947 (1993).
135. M. Rarey and M. Stahl, J. Comput.-Aided Mol. Des., 15, 497-520 (2001).
CHAPTER SIX
Virtual Screening
INGO MUEGGE
ISTVAN ENYEDY
Bayer Research Center
West Haven, Connecticut
Contents
1 Introduction
2 Concepts of Virtual Screening
2.1 Druglikeness Screening
2.1.1 Counting Schemes
2.1.2 Functional Group Filters
2.1.3 Topological Drug Classification
2.1.3.1 Artificial Neural Networks and Decision Trees
2.1.3.2 Structural Frameworks and Side Chains of Known Drugs
2.1.4 Pharmacophore Point Filter
2.2 Focused Screening Libraries for Lead Identification
2.2.1 Targeting Protein Families
2.2.2 Privileged Structures
2.3 Pharmacophore Screening
2.3.1 Introduction to Pharmacophores
2.3.2 Databases of Organic Compounds
2.3.3 2D Pharmacophore Searching
2.3.4 3D Pharmacophores
2.3.4.1 Ligand-Based Pharmacophore Generation
2.3.4.2 Manual Pharmacophore Generation
2.3.4.3 Automatic Pharmacophore Generation
2.3.4.4 Receptor-Based Pharmacophore Generation
2.3.5 Pharmacophore-Based Virtual Screening
2.4 Structure-Based Virtual Screening
2.4.1 Protein Structures
2.4.2 Computational Protein-Ligand Docking Techniques
2.4.2.1 Rigid Docking
2.4.2.2 Flexible Ligands
2.4.3 Scoring of Protein-Ligand Interactions
2.4.3.1 Force Field (FF) Scoring
2.4.3.2 Empirical Scoring
2.4.3.3 Knowledge-Based Scoring
2.4.3.4 Consensus Scoring
2.4.4 Docking as Virtual Screening Tool
2.5 Filter Cascade
3 Applications
3.1 Identification of Novel DAT Inhibitors through 3D Pharmacophore-Based Database Search
3.2 Discovery of Novel Matriptase Inhibitors through Structure-Based 3D Database Screening
4 Conclusions
5 Acknowledgments
Burger's Medicinal Chemistry and Drug Discovery, Sixth Edition, Volume 1: Drug Discovery. Edited by Donald J. Abraham. © 2003 John Wiley & Sons, Inc.
1 INTRODUCTION

Virtual screening, sometimes also called in silico screening, is a new branch of medicinal chemistry that represents a fast and cost-effective tool for computationally screening compound databases in search of novel drug leads. The roots of virtual screening go back to structure-based drug design and molecular modeling. In the 1970s researchers hoped to find novel drugs designed rationally using a fast growing number of diverse protein structures being solved by X-ray crystallography (1, 2) or nuclear magnetic resonance (NMR) spectroscopy (3). However, only very few drugs have resulted from those early efforts. Examples include captopril as angiotensin-converting enzyme inhibitor (4) and methotrexate as dihydrofolate reductase inhibitor (5). The reasons for this somewhat disappointing drug yield lie in the low resolution of the protein structures as well as limitations in computer power and methods. Researchers have often tried to design the final drug candidate de novo on the computer screen. The compounds suggested have often been difficult to synthesize; initial failure in exhibiting potency has often resulted in the termination of structure-based projects. At the end of the 1980s rational drug design techniques became somewhat discredited because of the high failure rate in drug discovery projects.

In the 1990s drastic changes occurred in the way drugs are discovered in the pharmaceutical industry. High throughput synthesis (6, 7) and screening techniques (8) changed the lead identification process that is now governed not only by large numbers of compounds processed but also by fast prosecution of many putative drug targets in parallel. The characterization of the human genome has resulted in a large number of novel putative drug targets. Improved screening techniques also make it possible to look at entire gene families, at orphan targets, or at otherwise uncharacterized putative drug targets. In this environment of data explosion, rational design techniques have experienced a comeback (9).

Although the exponentially growing number of solved protein structures at high resolution makes it possible to embark on structure-based design for many drug targets, virtual screening, the computational counterpart to high throughput screening, has become a particularly successful computational tool for lead finding in drug discovery. Whereas proprietary screening libraries typically hold about 10^6 compounds, this is only a tiny fraction of the conceivable chemical space, for which estimates range between 10^60 and 10^100 compounds (10, 11). The question is, of course, which subset of this enormous space should be synthesized and screened? Virtual screening attempts to answer this question by evaluating large virtual libraries of up to 10^12 compounds through the use of a cascade of various screening tools to reduce the chemical space. This chapter describes the different concepts and tools used today for virtual screening. They range from the assessment of the overall "druglikeness" of a small organic molecule to its ability to specifically bind to a given drug target. The interested reader is also referred to a selection of recent books and reviews on the subject of virtual screening (10, 12-18).

2 CONCEPTS OF VIRTUAL SCREENING

The basic goal of virtual screening is the reduction of the enormous virtual chemical space of small organic molecules, to synthesize
and/or screen against a specific target protein, to a manageable number of compounds that exhibit the highest chance to lead to a drug candidate (10, 19). The major sources of information to guide virtual screening for a particular target are derived from the following questions:

1. What does a drug look like in general?
2. What is known about compounds that interact with the receptor?
3. What is known about the structure of the target protein and the protein-ligand interactions?

In the following subsections we address these three points, outlining concepts of assessing the overall druglikeness of molecules, the concentration of subsets of molecules in focused libraries, and the identification of specific leads through structure-based virtual screening techniques.

2.1 Druglikeness Screening

Many drug candidates fail in clinical trials because of reasons unrelated to potency against the intended drug target. Pharmacokinetics and toxicity issues are blamed for more than half of all failures in clinical trials. Therefore, the first part of virtual screening evaluates the druglikeness of small molecules, mostly independent of their intended drug target (there are specific drug classes such as those acting in the central nervous system that require specific drug profiles). Druglike molecules exhibit favorable absorption, distribution, metabolism, excretion, and toxicological (ADMET) parameters (20-24). They are synthetically feasible and possess pharmacophore features that offer the chance of specific interactions with the intended protein target. Druglikeness is currently assessed using the following types of methods: simple counting methods, functional group filters, topological filters, and pharmacophore filters. Computational techniques used to identify druglikeness include neural networks (25-27), recursive partitioning approaches (25, 28), and genetic algorithms (29). These methods are further discussed below.

Table 6.1 Typical Ranges for Parameters (a)

Parameter                   Minimum   Maximum
LogP                        -2        5
Molecular weight            200       500
Hydrogen bond acceptors     0         10
Hydrogen bond donors        0         5
Molar refractivity          40        130
Rotatable bonds             0         8
Heavy atoms                 20        70
Polar surface area [Å²]     0         120
Net charge                  -2        +2

(a) Data taken from ref. 21.

2.1.1 Counting Schemes. Database collections of known drugs [e.g., CMC (30), WDI (31), or MDDR (32)] are typically used to extract knowledge about structure and properties of potential drug molecules. Key physicochemical properties such as molecular weight, charge, and lipophilicity (33, 34) of drug collections are profiled to extract simple counting rules for relevant descriptors of ADMET-related parameters. Examples include Lipinski's "rule-of-five" (33), which limits the range for molecular weight (MW ≤ 500), computed octanol-water partition coefficient (Clog P ≤ 5), and hydrogen-bond donors and acceptors (OHs + NHs ≤ 5; Ns + Os ≤ 10). Other authors limit the number of rotatable bonds (RB ≤ 8) or rings in a molecule (number of rings ≤ 4) (34). Table 6.1 shows a list of typical boundaries of counting parameters. Figure 6.1 illustrates the profiling procedure for these counting parameters using polar surface area (PSA) (35) as a descriptor. Collections of 776 orally administered CNS drugs and 1590 orally administered non-CNS drugs that reached phase II efficacy studies were analyzed for their PSA. It was found that 90% of the non-CNS compounds have a PSA below 120 Å²; 90% of CNS drugs have a PSA below 80 Å². Although it is possible that drugs have higher PSA values and are still orally bioavailable or penetrate the blood-brain barrier (as the result of active transport or other reasons), the profile suggests that it is much less likely. It is therefore a reasonable assumption in a virtual screening approach to discriminate against compounds outside the most populated descriptor space (in this case, PSA
< 120 Å²), especially if the compound lies outside the optimal region for several descriptors (e.g., MW > 500 and Clog P > 5).

Simple descriptors as described above are quickly calculated and counted. Therefore, after typically removing compounds with atoms other than C, N, O, S, H, P, Si, Cl, Br, F, and I, counting schemes present the first filter in virtual screening approaches.

Figure 6.1. Distribution of polar surface area for 776 orally administered CNS drugs (black bars) and for 1590 orally administered non-CNS drugs (white bars) that have reached clinical phase II efficacy studies (35). Horizontal axis: polar surface area [Å²].

2.1.2 Functional Group Filters. Reactive, toxic, or otherwise unsuitable compounds, such as natural product derivatives, are removed using specific substructure filters. Figure 6.2 shows a subset of substructures that lead to the dismissal of compounds in virtual screening. Typical reactive functional groups include, for example, reactive alkyl halides, peroxides, and carbazides. Unsuitable leads may include crown ethers, disulfides, and aliphatic methylene chains seven or more carbons long. Unsuitable natural products may include quinones, polyenes, or cycloheximide derivatives. A list of such fragments coded in Daylight SMARTS is given, for example, by Hann and coworkers (36). It should be noted, however, that natural product derivatives are not always unsuitable leads.

Figure 6.2. Selection of reactive functional groups that should be removed from a virtual screen (examples taken from Ref. 212). Groups shown: sulfonyl halides, acyl halides, alkyl halides, anhydrides, halopyrimidines, epoxides, aldehydes, imines, thioesters.

Screening out compounds that contain certain atom groups associated with toxicity provides a practical and fast way to reduce large databases; however, it is only a crude approximation for eliminating potentially toxic compounds. Better descriptions of toxicity may be provided by structure-based methods to assess toxicity of compounds. They draw primarily from mutagenicity, carcinogenicity, and acute toxicity databases assembled, for instance, by the National Toxicology Program (37) and the Registry of Toxic Effects of Chemical Substances database, RTECS (38). CASETox (39), TOPKAT (40), and DEREK (41) are commercial software
products that can be used to evaluate virtual compounds for potential toxicity.

2.1.3 Topological Drug Classification. It is generally assumed that compounds with structural similarity to known drugs may exhibit druglike properties themselves, such as oral bioavailability, low toxicity, membrane permeability, and metabolic stability. Following this idea, drug databases and reagent databases such as the ACD (42) as negative control (assuming they do not contain many drugs) have been analyzed to find structural features of drugs and nondrugs. Neural network approaches have been devised (25, 27) that can discriminate between drugs and nondrugs with about 80% certainty. Recursive partitioning approaches classify drugs and nondrugs with similar accuracy.

2.1.3.1 Artificial Neural Networks and Decision Trees. Figure 6.3 shows an example of a simple neural network that uses Ghose and Crippen atom types (43) to code the molecular structure. Ninety-one statistically significant atom types correspond to 91 input neurons of the neural net. Typically, five neurons in the hidden layers are used in the net design (25, 27). The single neuron output layer can vary between 0 (nondrugs) and 1 (drugs). Trained on 5000 drugs taken from the WDI and 5000 compounds labeled nondrugs taken from the ACD, the resulting neural net was shown to correctly classify about 80% of other drugs/nondrugs (27).

Figure 6.3. Neural network architecture for prediction of druglikeness. [Schematic: molecular structure coded as 91 Ghose and Crippen atom types (input layer), connected through weights Wij to a hidden layer and a single output neuron; drug (output = 1), nondrug (output = 0).]

Figure 6.4. Partial decision tree from Wagener and Geerestein (28). C(n)sp^x describes a carbon with hybridization sp^x and formal oxidation number n. X refers to a heteroatom; R refers to any group linked through a carbon. The tree starts at the top left corner. Here is an example of how to read the tree: If a compound contains an alcohol, it is classified as a drug. If it does not contain an alcohol, the presence of a tertiary amine is checked. If it contains a tertiary amine and also contains (does not contain) a CH2 group with attached heteroatom as well as another R group, it is classified as drug (nondrug). [Node labels recoverable from the figure: alcohol; tertiary amine; phenol, enol, carboxyl; CHX.]

Recursive partitioning, also known as the decision tree approach, is another powerful method to extract knowledge from a database. Wagener and Geerestein have explored the WDI and ACD databases to train a decision tree for the discrimination of drugs and nondrugs (28). Figure 6.4 shows a partial decision tree derived by the authors. One rule derived from this partial tree is, for example, if a compound possesses no alcohol and a tertiary aliphatic amine but no methylene linker between a heteroatom and a carbon atom, it is not
druglike. It is interesting to note that just by testing the presence or absence of hydroxyl, tertiary and secondary amines, carboxyl, phenol, or enol groups, 75% of all druglike structures in the MDDR and CMC can be recognized.

Once they are trained, neural networks and decision trees are very fast filter tools in virtual screening approaches. They are therefore applied early in the virtual screening filter cascade.

2.1.3.2 Structural Frameworks and Side Chains of Known Drugs. Databases have been mined to find structural motifs and pharmacophore features of small molecules that characterize drugs. Bemis and Murcko (44) dissected drug molecules from the Comprehensive Medicinal Chemistry (CMC) (30) database into side chains and frameworks (containing ring systems and linkers). They found that only 32 frameworks described the shapes of half the 5120 drugs in the CMC containing 1170 scaffolds. Figures 6.5 and 6.6 show the process of reducing a drug molecule to its framework and a list of the most frequently occurring frameworks in the CMC. Side chains most frequently occurring in drug molecules have also been analyzed (45). It has been found that of the 15,000 side chains contained in the CMC, about 11,000 belong to one of only 20 side chains, including (starting with the most frequent): carbonyl, methyl, hydroxyl, methoxy, chloro, methylamine, primary amine, carboxylic acid, fluoro, and sulfone. Most molecules possess between one and five side chains; more than 20% of the drugs stored in the CMC have two side chains per molecule.

Figure 6.5. Dissection of a drug molecule into framework and side chains.

Figure 6.6. Most frequently occurring frameworks in drugs (numbers indicate percentages of occurrence in CMC database). Data are taken from Bemis and Murcko (44).

The analysis of virtual libraries according to the presence or absence of druglike frameworks, side chains, or structural motifs can be used for virtual screening. This idea has been extended in RECAP (retrosynthetic combinatorial analysis procedure), a technique that identifies common motifs in drugs based on fragmenting molecules around bonds formed by common reactions (46) (Fig. 6.7). Extracting rules from RECAP for virtual screening represents a possible way of addressing the question of ease of synthesis of compounds. A similar approach to assess the occurrence of [...] was presented by Wang and Ramnarayan, who developed [...] multilevel chemical compatibility (MLCC) between a drug database and a test molecule as a measure for druglikeness [...].

Figure 6.7. Retrosynthetic reactions in RECAP. [Bond types recoverable from the figure: urea; aromatic nitrogen-aliphatic carbon; aromatic carbon-aromatic carbon; sulfonamide.]

2.1.4 Pharmacophore Point Filter. The topological drug fragmentation approaches discussed above suggest that the occurrence of a relatively small number of frameworks (ring structures and linkers), an even smaller number of side chains, and a small number of polar groups characterize drugs very well. Although drugs and nondrugs are not completely distinguishable, it has been observed that drugs differ somewhat from nondrugs in their possession of hydrophobic moieties that are well functionalized. Nondrugs often contain underfunctionalized hydrophobic groups (Fig. 6.8). Recent work to characterize the druglikeness of molecules focuses more on the presence of key functional groups in molecules.

A simple pharmacophore point filter has been introduced recently (47). It is based on the assumption that druglike molecules should contain at least two distinct pharmacophore groups (47). Four functional motifs have been identified that guarantee hydrogen-bonding capabilities that are essential for the specific interaction of a drug molecule with its biological target (Fig. 6.9). These motifs can be combined to functional groups that are also referred to here as pharmacophore points; they include: amine, amide, alcohol, ketone, sulfone, sulfonamide, carboxylic acid, carbamate, guanidine, amidine, urea, and ester. The following main rules apply to the pharmacophore point filter (PF1):
• Pharmacophore points are fused and counted as one if they are separated by less than two carbon atoms.
• Molecules with fewer than two or more than seven pharmacophore points fail the filter.
• Amines are considered pharmacophore points but not azoles or diazines.
• Compounds with more than one carboxylic acid are dismissed.
• Compounds without a ring structure are dismissed.
• Intracyclic amines in the same ring are fused to one pharmacophore point.

The requirement of two distinct pharmacophore points neglects at least one very important class of drugs: biogenic amine-containing CNS drugs. Therefore, a second pharmacophore filter has been designed that requires only one pharmacophore point in small molecules of the type amine, amidine, guanidine, or carboxylic acid (PF2).

An analysis of drug databases and reagent-type databases reveals that about two thirds of drugs and nondrugs can be classified correctly by PF1. This performance is not as impressive as that of neural networks. However, as a filter for virtual screening, pharmacophore point filters offer some advantages. First, the occurrence and count of pharmacophore points can be evaluated on the building-block level of a virtual combinatorial library. No enumeration is necessary as for druglike neural nets. Second, the results of the pharmacophore point filter can be easily interpreted. Third, the settings of the filter can be easily adjusted (e.g., PF1 for non-CNS drugs, PF2 for CNS drugs).

Figure 6.8. Number of pharmacophore points in drug databases (MDDR + CMC) and reagent databases (ACD). Horizontal axis: number of pharmacophore points.

Figure 6.9. Functional motifs of drugs used to build pharmacophore points.

2.2 Focused Screening Libraries for Lead Identification

Without the knowledge about specific drug targets it is sometimes useful to apply virtual screening for the design of focused libraries of a few thousand compounds rather than to find a small number of hits to be tested against a specific target. To save resources it may sometimes be more prudent not to run the entire HTS file against a target protein; instead, a focused library with higher chances of containing hits may be scrutinized. Those focused libraries may be designed to target specific protein families such as GPCRs, kinases, or nuclear hormone receptors. They can also be enriched with privileged structures that occur
more often in drug molecules and/or were found to inhibit members of the protein family.

2.2.1 Targeting Protein Families. Target class-directed libraries can be built from available compounds or be synthesized in combinatorial fashion. The design of target class-directed libraries relies on the identification of structural motifs in small molecules or in building blocks for combinatorial libraries that can be linked to increased activity for the target class. Functional groups that show the propensity to hit a certain target class can be found by examining ligands from the literature. Recurring motifs for GPCRs include, for example, piperazines, morpholines, and piperidines; for kinases they include, for example, heterobicyclic compounds or pyrimidines. Compounds bearing those structural motifs are thought to have a generally higher chance to be active against the respective target classes. A more rigorous approach to identify the "GPCR-likeness" of compounds or building blocks can be provided by a statistical analysis of druglike databases. Neural networks have been shown to be particularly useful in classifying chemical matter, such as CNS-active compounds (26, 48).

A neural network approach similar to that of Sadowski and Kubinyi (27) has been developed recently to address the "GPCR-likeness" of small molecules as well as building blocks for combinatorial libraries (49). A feedforward neural net was trained using 5000 compounds from the MDDR that target GPCRs and 5000 compounds that target other protein classes. Using the "activity-class" field in the database, about 20,000 GPCR-like and [...],000 non-GPCR-like compounds have been identified by [...]ries such as 5HT, leukotriene, and PAF. The resulting neural net classifies GPCR-like compounds correctly with 80% certainty. An independent test of compounds in our proprietary database that were found to hit GPCRs or other targets showed a correct prediction of GPCR-like compounds in 70% of the cases. When several virtual combinatorial libraries were analyzed, it turned out that the property of being GPCR-like could be attributed to the GPCR-likeness of the building blocks alone; that is, the GPCR-likeness of the enumerated compound correlated very well with the GPCR-likeness of the most GPCR-like building block it contained. This offers an important advantage for the design of combinatorial libraries because, for large virtual libraries, the computer cost of enumeration grows exponentially with the number of R groups and thus very quickly becomes impractical. For instance, for a 3-R-group library with 1000 building blocks each, the enumerated library would contain 1 billion compounds to be analyzed, whereas the building block-level analysis needs to examine only 3000 compounds. Figure 6.10 shows a list of amine building blocks extracted from the ACD that were found to be most GPCR-like by the neural net.

Figure 6.10. Selection of GPCR-like amines from the ACD.

Not every portion of a GPCR-like molecule has to be GPCR-like. The presence of one GPCR-like moiety (building block or core structure) is sufficient to make a compound GPCR-like. Therefore, the neural network offers two different strategies for the design of GPCR-like libraries: (1) GPCR-like core + druglike building blocks (need not be GPCR-like); (2) non-GPCR-like core + GPCR-like building blocks. Virtual screening of a database of existing compounds using the described neural net can be applied to assemble a focused screening library. Alternatively, combinatorial libraries can be designed.

2.2.2 Privileged Structures. Privileged structures are structural types of small molecules that are able to bind with high affinity to multiple classes of receptors (50). An enrichment of libraries with privileged structures may increase the chance of finding active compounds. Examples of privileged structures include benzazepine analogs found to be effective ligands for an enzyme that cleaves the peptide angiotensin I, whereas others are effective CCK-A receptor ligands. Cyproheptadine derivatives were found to have peripheral anticholinergic, antiserotonin, antihistaminic, and orexigenic activity. Hydroxamate and benzamidine derivatives have been shown to be privileged structures for metalloproteases and serine proteases, respectively. For the class of 7-transmembrane G-protein-coupled receptors a large number of privileged structures has been found including, for example, diphenylmethane, diazepine, benzazepine, biphenyltetrazole, spiropiperidine, indole, and benzylpiperidine (51). Some ubiquitously privileged structures have recently been identified (52). They include carboxylic acids, biphenyls, diphenylmethane, and, to a lesser extent, naphthyl, phenyl, cyclohexyl, dibenzyl, benzimidazole, and quinoline.

2.3 Pharmacophore Screening

In cases where no structural information about the target protein is given, pharmacophore models can provide powerful filter tools for virtual screening (53). Even in cases where the protein structure is available, pharmacophore filters should be applied early because they are generally much faster than docking approaches (discussed below) and can, therefore, greatly reduce the number of compounds subjected to the more expensive docking applications. For example, a pharmacophore model consisting of three pharmacophore points can be tested against about 10^6 compounds in a few minutes of computer time [disregarding the time it takes to generate three-dimensional (3D) conformations of each molecule] (10). Another interesting aspect of pharmacophores in virtual screening is 3D-pharmacophore diversity. Although the diversity concept for virtual compounds in general is not applicable because of the enormity of the chemical space, diversity in pharmacophore space is a feasible concept. Virtual libraries can therefore be optimized for covering a wide pharmacophore space.

2.3.1 Introduction to Pharmacophores. In 1894 Emil Fischer proposed the "lock-and-key" hypothesis to characterize the binding of compounds to proteins (54). This can be considered the first attempt to explain binding of small molecules to a biological target. Proteins recognize substrates through specific interactions. It is a challenge for the medicinal chemist to synthesize compounds that can capture the 3D arrangement of functional groups in a small molecule that forms the pharmacophore and that is responsible for substrate binding
to the protein. The first definition of the pharmacophore formulated by Paul Ehrlich was "a molecular framework that carries (phoros) the essential features responsible for a drug's (pharmacon) biological activity" (55). This definition was slightly modified by Peter Gund to "a set of structural features in a molecule that is recognized at a receptor site and is responsible for that molecule's biological activity" (56). An example is shown in Fig. 6.11. An X-ray structure of CDK2 complexed with the adenine-derived inhibitor H717 (57-59) has been solved. Interactions that are essential to substrate and inhibitor binding to the enzyme will form the pharmacophore that should be captured by inhibitors binding the same way H717 does. As shown in Fig. 6.11, the inhibitor binds to the hinge region (Phe82 and Leu83) through two hydrogen bonds, to a hydrophobic region through the cyclopentyl group, and to Asp145 and Asn132 through hydrogen bonds. The pharmacophore that reflects these interactions has a hydrogen-bond donor and a hydrogen-bond acceptor pair that ensures binding to the hinge region, a hydrophobic group that corresponds to the cyclopentyl binding site, and a hydrogen-bond donor that ensures binding to Asp145 and/or Asn132.

Figure 6.11. Pharmacophore derived based on the interactions between human cyclin-dependent kinase 2 and the adenine-derived inhibitor H717 as observed in the X-ray structure of the complex (PDB entry 1G5S). Dashed lines highlight hydrogen-bonding interactions. HBD, hydrogen-bond donor; HBA, hydrogen-bond acceptor. The hinge region links the N- and C-terminal domains of a kinase.

Note that in addition to distances that describe the 3D relationship among pharmacophore points, angles, dihedrals, and exclusion volumes are also used. Each additional restraint can reduce the number of hits, thus making the compound selection easier for testing. Pharmacophore hypotheses for searching can be generated using structural information from active inhibitors, ligands, or from the protein active site itself (60, 61).
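The effect of adding distance restraints to a pharmacophore query can be illustrated with a minimal sketch. It assumes each database entry is a single conformation whose pharmacophore points have already been assigned; the compound names, coordinates, feature labels, and distance ranges below are invented for illustration and are not taken from the CDK2 model in Fig. 6.11. Real search programs additionally handle multiple conformers, alternative feature assignments, angles, and exclusion volumes.

```python
import math

# Hypothetical database: one conformation per compound, with the
# coordinates (in Å) of its assigned pharmacophore points.
DATABASE = {
    "cmpd_A": {"HBD": (0, 0, 0), "HBA": (3, 0, 0), "HYD": (3, 4, 0)},
    "cmpd_B": {"HBD": (0, 0, 0), "HBA": (3, 0, 0), "HYD": (0, 9, 0)},
    "cmpd_C": {"HBD": (0, 0, 0), "HBA": (6, 0, 0), "HYD": (3, 4, 0)},
}

# Hypothetical queries: allowed distance range (min, max) in Å per point pair.
ONE_RESTRAINT = {("HBD", "HBA"): (2.5, 3.5)}
TWO_RESTRAINTS = {**ONE_RESTRAINT, ("HBD", "HYD"): (4.0, 6.0)}


def satisfies(points, restraints):
    """Return True if every restrained pair distance lies within its range."""
    for (a, b), (low, high) in restraints.items():
        distance = math.dist(points[a], points[b])
        if not low <= distance <= high:
            return False
    return True


def screen(database, restraints):
    """Return the names of all compounds that satisfy the query."""
    return [name for name, pts in database.items() if satisfies(pts, restraints)]


print(screen(DATABASE, ONE_RESTRAINT))   # two hits with a single restraint
print(screen(DATABASE, TWO_RESTRAINTS))  # one hit after adding a second restraint
```

Using ranges rather than exact distances is a deliberate choice in this sketch: it leaves room for the flexibility of both ligand and protein.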
2.3.2 Databases of Organic Compounds. Virtual screening is used in general for selecting potentially active compounds from databases of compounds available either in-house or from a vendor. Because virtual screening is not accurate enough to identify only active compounds as hits, it is less risky to screen databases with existing compounds rather than synthesize a new library. Nevertheless, virtual libraries that can be synthesized through combinatorial chemistry and/or rapid analoging can easily be generated using in silico methods. These libraries are more often generated for lead optimization and synthesis prioritization (62, 63).

There is a wealth of databases that code available compounds typically in the two-dimensional standard data (2D-SD) format including connectivity from MACCS (32, 64). The most common databases are the Available Chemicals Directory (ACD) (42), Spresi (65), Chemical Abstracts Database (66), and the National Cancer Institute Database (67, 68). Many vendors of chemicals also provide searchable databases with 2D-structure and property information of their compounds. Sometimes compounds are coded in linear representations such as the SMILES (69, 70) notation. The SMILES codes obtained using CACTVS and Daylight programs for 4-benzyl pyridine and R-cocaine are shown in Fig. 6.12.

Figure 6.12. Examples of SMILES notations for two compounds obtained using CACTVS and Daylight.

The primary source of 3D experimental structures of organic molecules is the Cambridge Structural Database (71). Alternatively, 2D databases of organic compounds can be converted into 3D databases using several software programs (72). Each program starts with generating a crude structure that is subsequently optimized using a force field. CONCORD (73) applies rules derived from experimental structures and a univariate strain function for building an initial structure. CORINA (74) generates an initial structure by use of a standard set of bond lengths, angles and dihedrals, and rules for cyclic systems. RUBICON (75) invokes distance geometry techniques to generate 3D structures based on connectivity tables. This program also uses bond lengths and angle tables to build a matrix containing the upper and lower bounds
for distances between all atoms in the mole- has also been the subject of several studies. Xue
cule. OMEGA (76) uses a torsion-driven ap- et al. showed that compounds with similar activ-
proach for building conformers. It generates ity could be identified using mini-fingerprints
low energy conformers for each molecule by (87-89), physicochemical property descriptors
assembling it from fragments and searching (go), or latent semantic structure indexing (91,
through possible orientations of the subunit 92). In addition, similarity searches can be com-
added. WIZARD (77) and COBRA (78), AIMB bined with superstructure searches for limiting
(79) and MIMUMBA (80) employ artificial in- the number of compounds selected. Flexible
telligence techniques for generating a set of match searches are used for identifying com-
user-specified low energy conformations for a pounds that differ from the query structure in
compound. MOLGEO (81) uses a depth-first user-specified ways. In addition, isomer, tau-
approach for generating 3D structures based tomer, and parent molecule searches may be
on connectivity using bond length and bond done to find in a database isomers, tautomers, or
angle tables. IDEALIZE (82) is a molecular parent molecules of the query.
mechanics program that minimizes 2D structures to generate the corresponding 3D structure.
2.3.3 2D Pharmacophore Searching. Searching 2D databases is of great importance for ac[...] for synthesis or analogs of a lead compound [...] a 2D database to identify compounds of [...] a compound is present in the database [...] searches identify larger molecules [...] the user-defined query, irrespective of the environment in which the query structure occurs (83) (Fig. 6.13). Furthermore, substructure searching can identify all [...] that share the same [...] can be used for generating structure-activity relationships (SAR), before synthetic plans are made for lead [...] superstructure [...] are used to find smaller molecules that [...] embedded in the query (Fig. 6.14). One problem [...] arises from substructure searches is [...] the number of compounds identified can [...] into the thousands. A solution to this [...] is ranking the compounds based on [...] similarity to a reference compound. Similarity [...] between compounds in the database and in the query (85, 86) [...] used in similarity searches is provided by Willett et al. (86). [...]nd structural similarity, activity similarity
2.3.4 3D Pharmacophores
2.3.4.1 Ligand-Based Pharmacophore Generation. Ligand-based pharmacophores are typically used when the crystallographic, solution structure, or modeled structure of a protein cannot be obtained. When a set of active compounds is known and it is hypothesized that all compounds bind in a similar way to the protein, then common groups should interact with the same protein residues. Thus, a pharmacophore capturing these common features should be able to identify from a database novel compounds that bind to the same site of the protein as the known compounds do. The process of deriving a pharmacophore, called pharmacophore mapping, consists of three steps: (1) identifying common binding elements that are responsible for biological activity; (2) generating potential conformations that active compounds may adopt; and (3) determining the 3D relationship between pharmacophore elements in each conformation generated. To build a pharmacophore based on a set of active compounds, two methods are usually applied. One method is to generate a set of minimum energy conformations for each ligand and search for common structural features. Another method is to consider all possible conformations of each ligand to evaluate shared orientations of common functional groups. Analyzing many low energy conformers of active compounds can suggest a range of the distance between key groups that will take into account the flexibility of the ligands and of the protein. This task can be performed either manually or automatically.
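The idea of deriving a distance range between key groups from many low energy conformers can be illustrated with a short Python sketch. It is only a minimal illustration: the feature names, the coordinates, and the helper `distance_ranges` are hypothetical and do not come from any of the programs cited in this chapter.

```python
from itertools import combinations
from math import dist


def distance_ranges(conformers):
    """Return {(feature_a, feature_b): (min_distance, max_distance)}.

    Each conformer is a dict mapping a pharmacophore feature name to its
    (x, y, z) position in angstroms. The range for a feature pair spans
    the shortest and longest distance seen over all conformers.
    """
    ranges = {}
    for conformer in conformers:
        for a, b in combinations(sorted(conformer), 2):
            d = dist(conformer[a], conformer[b])
            low, high = ranges.get((a, b), (d, d))
            ranges[(a, b)] = (min(low, d), max(high, d))
    return ranges


# Hypothetical coordinates for two conformers of one ligand (illustrative only).
conformers = [
    {"amine_N": (0.0, 0.0, 0.0), "ring_centroid": (5.0, 0.0, 0.0), "carbonyl_O": (0.0, 3.0, 0.0)},
    {"amine_N": (0.0, 0.0, 0.0), "ring_centroid": (6.0, 0.0, 0.0), "carbonyl_O": (0.0, 3.0, 0.0)},
]
ranges = distance_ranges(conformers)
for pair, (low, high) in ranges.items():
    print(pair, round(low, 2), round(high, 2))
```

Pooling conformers from several active compounds into the same list would widen the ranges accordingly, which is one simple way to build flexibility into the resulting pharmacophore.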
Virtual Screening
Figure 6.13. Compounds identified from the ACD database through substructure search.
2.3.4.2 Manual Pharmacophore Generation. Manual pharmacophore generation is used when there is an easy way to identify the common features in a set of active compounds and/or there is experimental evidence that some functional groups should be present in the ligand for good activity. An example is the development of a pharmacophore model for dopamine-transporter (DAT) inhibitors (Fig. 6.16). In the first step common structural features were identified in the selected five DAT inhibitors (93-95) (Fig. 6.16, circles). Four out of five compounds were structurally rigid, whereas the 4-hydroxy piperidinol was flexible. A systematic conformational search for 4-hydroxy piperidinol identified 10 possible conformations. Measuring distances among pharmacophore elements in every inhibitor and every conformation considered led to the distance ranges among pharmacophore points shown in Fig. 6.16. Because proteins are flexible, pharmacophores should also have some flexibility built in, thus justifying the use of distance ranges.
2.3.4.3 Automatic Pharmacophore Generation. Pharmacophore generation through conformational analysis and manual alignment is a very time-consuming task, especially when the list of active ligands is large and the elements of the pharmacophore model are not obvious. There are several programs, HipHop (96), HypoGen (97), Disco (98), Gasp (99), Flo (100), APEX (101), and ROCS (102), that can automatically generate potential pharmacophores from a list of known inhibitors. The performance of these programs in automated
Figure 6.14. Compounds identified from the ACD database through superstructure search.
Figure 6.15. Compounds identified from the ACD database through similarity search using 60% similarity as threshold. The lower the specified similarity the higher the number of hits identified.
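The effect of the similarity threshold noted in the caption of Figure 6.15 can be reproduced with a small Python sketch. It assumes that similarity is measured with the Tanimoto coefficient on sets of fingerprint bits, which is a common choice but is not stated in the caption; the compound names and bit sets are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two sets of fingerprint bit positions."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def similarity_search(query, database, threshold=0.6):
    """Return (name, similarity) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query, bits)) for name, bits in database.items()]
    hits = [item for item in scored if item[1] >= threshold]
    return sorted(hits, key=lambda item: -item[1])


# Hypothetical fingerprints (sets of "on" bit positions), for illustration only.
query = {1, 2, 3, 4, 5}
database = {
    "cpd_A": {1, 2, 3, 4, 5, 6},  # 5 shared bits of 6 in the union
    "cpd_B": {1, 2, 3, 9},        # 3 shared bits of 6 in the union
    "cpd_C": {1, 2, 3, 4},        # 4 shared bits of 5 in the union
}
hits_60 = similarity_search(query, database, threshold=0.6)
hits_50 = similarity_search(query, database, threshold=0.5)
print(len(hits_60), len(hits_50))
```

Lowering the threshold from 0.6 to 0.5 admits one additional compound in this toy database, in line with the trend described in the caption.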
Figure 6.16. Manual pharmacophore mapping by measuring distances between pharmacophore points in every compound and conformation considered. Pharmacophore elements are highlighted with circles. All structures were built and minimized using QUANTA. Conformers of 4-hydroxy piperidinol were generated using the Grid Scan method from QUANTA, followed by clustering, to identify unique conformers.
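Distance ranges of the kind shown in Figure 6.16 can serve directly as a query: a compound matches if at least one of its conformers places every pair of pharmacophore points within the allowed range. The following Python sketch illustrates this under stated assumptions; the feature labels, the numerical ranges, and the helper names are hypothetical and are not taken from the DAT study.

```python
from math import dist


def matches_query(conformer, query):
    """True if every feature pair in the query lies within its distance range."""
    for (a, b), (low, high) in query.items():
        if a not in conformer or b not in conformer:
            return False  # a required pharmacophore element is missing
        if not low <= dist(conformer[a], conformer[b]) <= high:
            return False  # geometry outside the allowed range
    return True


def flexible_match(conformers, query):
    """True if any stored conformer of the compound satisfies the query."""
    return any(matches_query(conformer, query) for conformer in conformers)


# Hypothetical query (ranges in angstroms) and two conformers of one compound.
query = {("N", "ring"): (5.0, 7.0), ("N", "O"): (2.5, 3.5)}
conf_1 = {"N": (0.0, 0.0, 0.0), "ring": (8.0, 0.0, 0.0), "O": (0.0, 3.0, 0.0)}
conf_2 = {"N": (0.0, 0.0, 0.0), "ring": (6.0, 0.0, 0.0), "O": (0.0, 3.0, 0.0)}
print(matches_query(conf_1, query), flexible_match([conf_1, conf_2], query))
```

A rigid search would test only one conformer per compound and would miss this compound if `conf_1` were the stored geometry, which is why considering several conformers tends to yield more hits.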
pharmacophore generation varies depending on the training set. The use of these programs for pharmacophore generation was recently reviewed in detail (103). Here we focus on common features of these programs. All programs use algorithms that identify common pharmacophore features in the training set molecules; they use scoring functions to rank the identified pharmacophores. The following features are identified in each molecule: hydrogen-bond donors, hydrogen-bond acceptors, negative and positive charge centers, and surface accessible hydrophobic regions that can be aliphatic, aromatic, or nonspecific. Most of the programs consider ligand flexibility when generating pharmacophores because compounds might not bind to the protein in the minimum energy conformation.
2.3.4.4 Receptor-Based Pharmacophore Generation. If the 3D structure of a receptor is known, a pharmacophore model can be derived based on the receptor active site. Biochemical data can be used for identifying key residues that are important for substrate and/or inhibitor binding. This information can be used for building pharmacophores targeting the region defined by key residues or for choosing among pharmacophores generated by an automated program. This can greatly improve the chance of finding small molecules that inhibit the protein because the search is focused on a region of the binding site that is crucial for binding substrates and inhibitors. Ligands bind to proteins through nonbonded interactions such as hydrogen bonds and hydrophobic interactions. Programs such as LUDI (104-106) or POCKET (107) can use the structure of the protein to generate interaction sites or grids to characterize favorable positions that ligand atoms should occupy. Four types of interaction sites are characterized: hydrogen-bond donors, hydrogen-bond acceptors, and hydrophobic groups that can be lipophilic-aliphatic or lipophilic-aromatic. LUDI-generated interaction maps for Cerius2 Structure-Based Focusing (108) do not differentiate between aliphatic and aromatic interaction sites. This is based on the observation of Burley and Petsko (109) that, besides aromatic side chains, aliphatic and aromatic side chains also pack closely to form the hydrophobic core of proteins. Because proteins are not rigid, Carlson et al. (110) proposed using molecular dynamics simulation for generating a set of diverse protein conformations to include protein flexibility in the pharmacophore development. In this case distance ranges between pharmacophores are obtained by examining several conformations of the protein. This technique is similar to the one used for the generation of flexible pharmacophores (Fig. 6.16), based on active compounds, when several conformations of the compound and/or many compounds are considered for pharmacophore mapping.
2.3.5 Pharmacophore-Based Virtual Screening. Pharmacophore-based virtual screening is the process of matching atoms and/or functional groups and the geometric relations between them to the pharmacophore in the query. Examples of programs that perform pharmacophore-based searches are 3Dsearch (111), Aladdin (53), UNITY (112), MACCS-3D (113), Catalyst (114), and ROCS (102). There are also web-based applications (115, 116) that can perform pharmacophore searches. Usually pharmacophore-based searches are done in two steps. First, the software checks whether the compound has the atom types and/or functional groups required by the pharmacophore; then it checks whether the spatial arrangement of these elements matches the query. The fastest approach used in the matching step is considering rigid compounds. Because molecules that are not rigid might have a conformation that matches the pharmacophore, flexibility of the ligands should be considered. Flexible 3D searches identify a higher number of hits than rigid searches do (117). However, flexible searches are more time consuming than rigid ones. There are two main approaches for including conformational flexibility into the search: one is to generate a user-defined number of representative conformations for each molecule when the database is created; the other is to generate conformations during the search. By use of the first approach, any rigid search program can be used for doing a flexible search; however, generating the database takes more time and disk space. The second approach gives more flexibility to the user, given that a larger number of conformations can be generated for each
molecule during the search. In this case the database search requires more computer resources; however, this approach will not miss conformations that fit the query but were not stored in the database. Pharmacophore queries that define distance ranges between pharmacophore elements compensate for possible conformational changes in the receptor site upon ligand binding. Also, these flexible pharmacophore queries compensate for the difference between using multiconformer databases and generating conformers during the search. ROCS uses a shape-based superposition for identifying compounds that have similar shape. Grant and Pickup (118) showed that using atomic-centered Gaussians instead of a spherical function can dramatically reduce the time required for a shape alignment of two molecules. This improved routine allows the program to perform shape-based database searches at an acceptable speed (300-400 conformers/s).
There are several methods for generating conformers during in silico screening. Torsion optimization (119) is used for minimizing the root-mean-square (rms) deviation between the constraints from the pharmacophore and the corresponding distances in the compound. The "directed tweak" (120) algorithm also uses torsion optimization for minimizing the sum of the squared deviations between distances in the pharmacophore and the corresponding ones in the compound. ChemDBS-3D (121) generates low energy conformations that can match the pharmacophore using rules similar to those in WIZARD (77). The distance geometry algorithm (122) uses bond length and bond angle information for building a matrix containing upper and lower limits of distances between atoms in the organic compound. These distances can be used for building the conformation that fits the pharmacophore query. The systematic search method (123) is feasible for molecules with few rotatable bonds and thus has limited applicability.
Figure 6.17. Crystal structure (PDB entry 1a4q) of the neuraminidase inhibitor zanamivir bound in the active site (213).
2.4 Structure-Based Virtual Screening
In direct analogy to high throughput screening, docking and scoring techniques can be applied to computationally screen a database of hundreds of thousands of compounds against a specific target protein. Computational methods that predict the 3D structure of a protein-ligand complex are often referred to as molecular docking approaches (Fig. 6.17) (124). Protein structures can be employed to dock ligands into the binding site of the protein and to study their interactions (125). For virtual screening, the crucial task at hand is the fast and reliable ranking of a database of putative protein-ligand complexes according to their binding affinities. Depending on ligand and protein flexibility, sampling depth, and optimizing schemes, docking programs used today (Table 6.2) can facilitate this task within a few minutes or sometimes seconds per processor and molecule. Virtual screening as a computation task can be trivially run using parallel computing because the protein-ligand docking events are completely independent of each other. Although docking has initially been developed as a specialist modeling tool run on computer workstations, nowadays inexpensive Linux clusters or distributed computing over networked PCs can be used for virtual screening. This increases the in silico throughput into the realm of 100,000 compounds per day on a Linux cluster, thereby reaching the speed of today's high throughput screens. Energy functions that evaluate the
binding free energy between protein and ligand sometimes employ rather heuristic terms. Therefore, those functions are more broadly referred to as scoring functions.

Table 6.2 Selection of Available Protein-Ligand Docking Software for Structure-Based Virtual Screening
Docking Program | Docking/Sampling Method | Scoring Method
GLIDE (www.schrodinger.com) | Rigid protein; multiple conformation rigid docking; grid-based energy evaluation | Empirical scoring, including penalty term for unformed hydrogen bonds; force-field scoring
DOCK (www.cmpharm.ucsf.edu/kuntz/dock.html) | Rigid protein; flexible ligand docking (incremental construction) | Force-field scoring; chemical scoring, contact scoring
FlexX (cartan.gmd.de/FlexX) | Rigid protein; flexible ligand docking (incremental construction) | Empirical scoring intertwined with sampling
DockVision (www.dockvision.com) | Monte Carlo, genetic algorithm | Various force fields
DockIT (www.daylight.com/meetings/emug00/Dixon) | Ligand conformations generated inside binding-site spheres using distance geometry | PLP, PMF
FRED (www.eyesopen.com/fred.html) | Exhaustive sampling; rigid protein, multiple conformation rigid docking | Chemscore, PLP, Screenscore, and Gaussian shape scoring
LigandFit (www.accelrys.com) | Monte Carlo | LIGSCORE, PLP, PMF, LUDI
Gold (www.ccdc.cam.ac.uk/prods/gold/) | Genetic algorithm | Soft core vdW potential and hydrogen bond potentials

2.4.1 Protein Structures. A 3D protein structure of the receptor at atomic resolution is necessary to start a protein-ligand docking experiment. The exponential growth of solved crystal and solution structures in recent years provides a reliable source of protein structures. The protein database (PDB) currently holds more than 18,000 protein structures. It should be noted, however, that the chances of a successful virtual screen very much depend on the quality of the available structure. The crystal structure should be well refined; typically a resolution of 2.5 Å or better is considered to be necessary (126). Small changes in structure can drastically alter the outcome of a computational docking experiment (127). Moreover, many receptor sites are flexible; they often undergo conformational changes upon ligand binding. A good example is the Tyr248 movement of carboxypeptidases upon substrate or ligand binding, which has provided the first structural perspective of Koshland's induced-fit hypothesis (128, 129). Proteins have to be studied carefully in every individual case to decide how promising a virtual screen may be.
For many protein drug targets crystal or solution structures are not available. In such cases homology models (130, 131) and pseudoreceptor models (132) are often used. However, unless there is a very high conservation of receptor site residues the use of homology models for virtual screening is much riskier than using solved structures. On the other hand, the PDB contains a wealth of protein
structures of a wide variety of enzymes and receptors that can be used for homology modeling. Homology models can be built for a large number of protein classes coded in the human genome (Table 6.3). Because virtual screening is so inexpensive and the possible rewards, if successful, are so high, it is generally warranted to run a virtual screening experiment, even if the chances of success are very small, as is often the case when homology models are employed.

Table 6.3 Occurrence of Selected Protein Classes Currently Identified in the Human Genome Database (GDB) and the Protein Database (PDB)
Class | GDB (a) | PDB (b)
Nuclear receptors | 49 | 42
G-protein-coupled receptors | 408 | 1
Kinase | 945 | 625
Protease | 190 | 330
Peptidase | 108 | 128
Esterase | 106 | 87
Reductase | 210 | 417
Synthase | 191 | 335
Lyase | 38 | 70
Hydrolase | 131 | 110
Transferase | 500 | 467
Anhydrase | 27 | 156
Sulfatase | 26 | 6
Dehydrogenase | 347 | 338
Desaturase | 10 | 1
Phosphatase | 315 | 184
Phosphodiesterase | 63 | 1
Deacetylase | 18 | 2
Transporter | 238 | 1
Channel | 271 | 24
(a) www.gdb.org
(b) www.rcsb.org/pdb; note that the number of structures available from the PDB often includes several structures of the same protein.

2.4.2 Computational Protein-Ligand Docking Techniques. Docking ligands into a receptor site is a geometric search problem. The search has to take protein and ligand conformations as well as their relative orientations into account. The receptor conformation is typically reasonably well known. However, the bioactive conformation of the ligand is usually unknown. Nicklaus and coworkers showed that force-field energies of bioactive conformations of ligands, as represented in crystal structures of protein-ligand complexes, are typically about 25 kcal/mol higher than minimum conformations in vacuum (133). Therefore, the bioactive conformation of a ligand is hard to guess and a large number of possible ligand conformations have to be considered in docking. Most docking approaches keep the receptor rigid and the ligand flexible during the docking. Although protein flexibility is sometimes included (134-136), we will not discuss protein flexibility here, given that it is currently rarely used for virtual screening because of speed limitations. Some relevant concepts of docking approaches are briefly discussed below (for a broader review we refer the reader to Ref. 125). Scoring protein-ligand complexes will be discussed separately.
2.4.2.1 Rigid Docking. Although ligand and often also protein flexibility are crucial for protein-ligand docking, the simpler rigid ligand docking is sometimes useful. Ligand flexibility can, for example, be simulated by rigidly docking an ensemble of preassigned ligand conformations that represent the relevant conformational space of the molecule. Algorithms such as clique search techniques (137) and geometric hashing (138) are often used to search for distance-compatible matches of protein and ligand features (139). Possible features include complementary hydrogen-bonding interactions, distances, or volume segments of the receptor site of the protein or the ligand.
The program DOCK uses an algorithm for rigid-body docking based on the idea of searching for distance-compatible matches. Starting with the molecular surface of the protein (140-142), a set of spheres is created inside the receptor site. The spheres represent the volume that could be occupied by a ligand molecule (Fig. 6.18). Spheres can represent the ligand also; a direct atom representation is also possible. Early versions of DOCK relied solely on rigid ligand docking. Sets of up to four distance-compatible matches were evaluated. Each set was used for an initial fit of the ligand into the receptor site. Additional compatibility matches were used to improve the fit. The position of the ligand was then optimized and scored.
Figure 6.18. Receptor site of thyroid receptor beta filled with spheres (for sake of clarity, sphere centers are depicted; actual size of spheres is larger, so that spheres overlap) and thyronine. Crystal structure taken from the PDB (ID 1bsx).
Since its first introduction in 1982, the DOCK software has been extended in several directions. The matching spheres can be labeled with chemical properties (143) and distance bins are used to speed up the search process (144, 145). Recently, the search algorithm for distance-compatible matches was changed to the clique-detection algorithm introduced by Kuhl (139, 146). Furthermore, several scoring functions are now applied in combination with the DOCK algorithm (147-151).
2.4.2.2 Flexible Ligands. Druglike molecules are typically flexible, with usually up to eight rotatable bonds (34). Energetic differences between alternative ligand conformations are often small compared to the total binding affinity between ligand and target protein. Also, for flexible ligands it is quite common that the bioactive conformations are different from the minimum energy conformations in solution (133). Ligand flexibility is typically handled in docking approaches by combinatorial optimization protocols such as fragmentation, ensembles, genetic algorithms, or simulation techniques.
In fragmentation approaches, the ligand is dissected into pieces that are either rigid or that can be represented by small conformational ensembles. In docking approaches, typically a strategy called incremental construction is used to assemble fragments to whole molecules directly in the receptor site. Usually, the largest rigid moiety of the ligand (sometimes called anchor) is docked first in the receptor site. The remaining fragments are subsequently added in a buildup protocol. After each incremental buildup step, torsion angles are sampled and the growing molecule is minimized.
Ligand flexibility can be artificially included into docking by rigidly docking ensembles of pregenerated conformations of the ligand into the receptor site. Rigid docking is faster than flexible docking that uses a fragmentation approach. However, because computing time increases linearly with the number of conformations, computing time and coverage of conformational space have to be balanced. An example of rigid docking of conformation ensembles is given in Flexibase/FLOG (152). Distance geometry methods (153) are used to generate a small set of diverse conformations for each ligand in the database. A subset of up to 25 conformations per molecule is selected using rms dissimilarity criteria and then docked using a rigid-body-docking algorithm.
Different from the combinatorial approaches for docking mentioned above, simulation methods start with a given configuration of a ligand in the receptor site. Simulation techniques such as simulated annealing (154) are then applied to find energetically more favorable conformations of the ligand. To speed up the docking process, docking programs such as AutoDock (155) precalculate molecular affinity potentials of the protein on a grid. Molecular dynamics (MD) methods (see, e.g., Refs. 156 and 157) and Monte Carlo simulation techniques (see, e.g., Refs. 158-162) are also frequently used in protein-ligand docking applications.
A variety of other sampling methods are applied in docking programs, including genetic algorithms, distance geometry methods, random searching, hybrid methods, and generalized effective potential methods. Genetic algorithms have been employed in programs such as Gambler (163), AutoDock (155), and GOLD (126). PRO-LEADS uses an alternative search technique called "tabu search" (164). Starting from a random structure, new structures are created by random moves. A tabu list is maintained during the optimization phase and contains the best and the most recently found binding configurations. Configurations that resemble those stored in the tabu list are rejected unless they are better than the best-scoring one. The sampling performance is improved because previously sampled configurations are avoided. Finally, it should be
mentioned that multistep hybrid docking procedures have been developed that combine rapid fragment-based searching with sophisticated MC or MD simulations (165, 166).
2.4.3 Scoring of Protein-Ligand Interactions. The problem of sampling the correct binding geometry (binding mode) of a protein-ligand complex is considered to be solved in many docking programs (167). However, to identify this correct binding mode by its lowest energy or score is a different matter; this is indeed the bottleneck of docking-scoring approaches today. The most important aspect of scoring functions for virtual screening is speed. Therefore, accuracy requirements are low; most functions used do not conceptually describe binding free energies. Therefore, these functions are typically not called energy functions but scoring functions. Three main scoring strategies are typically used in docking applications for virtual screening: force field scoring, empirical scoring, and knowledge-based scoring.
2.4.3.1 Force Field (FF) Scoring. Nonbonded interaction energy terms of standard force fields are typically used in FF scoring (e.g., in vacuo electrostatic terms; sometimes modified by scaling constants that assume the protein to be an electrostatic continuum) and van der Waals (vdW) terms (168-171). DOCK and GREEN (172) use the intermolecular terms of the AMBER energy function (173, 174), with the exception of an explicit hydrogen bonding term (147):

$$E = \sum_{i}\sum_{j}\left[\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + 332\,\frac{q_i q_j}{D\,r_{ij}}\right]$$

where each term is summed up over ligand atoms $i$ and protein atoms $j$. $A_{ij}$ and $B_{ij}$ are the vdW repulsion and attraction parameters of the 6-12 potential, $r_{ij}$ is the distance between atoms $i$ and $j$, $q$ is a point charge at each of the atoms, and $D$ is the dielectric constant; the factor 332 converts the electrostatic term to kcal/mol. Intraligand interactions are added to the score. Up to a 100-fold speedup in docking can be achieved by precomputing these terms on a 3D grid that represents the protein during docking (155, 175). More recently, solvation terms have been added to FF scores. Examples include generalized Born/surface area approaches (176) or atomic solvation parameters (177-179).
2.4.3.2 Empirical Scoring. Empirical scoring functions are multivariate regression methods. They fit coefficients of physically motivated contributions to binding free energy so as to reproduce measured binding affinities of a training set of protein-ligand complexes with known 3D structure. As an example, the docking program FlexX (180) uses a scoring function similar to that of Böhm (181, 182). It calculates the sum of free-energy contributions from the number of rotatable bonds in the ligand, hydrogen bonds, ion-pair interactions, hydrophobic and pi-stacking interactions of aromatic groups, and lipophilic interactions:

$$\Delta G = \Delta G_{0} + \Delta G_{rot}\,N_{rot} + \Delta G_{hb}\sum_{\text{neutral H-bonds}} f(\Delta R, \Delta\alpha) + \Delta G_{io}\sum_{\text{ionic-int.}} f(\Delta R, \Delta\alpha) + \Delta G_{aro}\sum_{\text{aro-int.}} f(\Delta R, \Delta\alpha) + \Delta G_{lipo}\sum_{\text{lipo. cont.}} f^{*}(\Delta R)$$

where $\Delta G_{0}$, $\Delta G_{rot}$, $\Delta G_{hb}$, $\Delta G_{io}$, $\Delta G_{aro}$, and $\Delta G_{lipo}$ are adjustable parameters that are fitted; $f(\Delta R, \Delta\alpha)$ is a scaling function penalizing deviations from the ideal geometry; and $N_{rot}$ is the number of freely rotatable bonds. The interaction of aromatic groups is an addition to Böhm's original force-field design (181, 182). The lipophilic contributions are calculated as a sum of atom-pair contacts in contrast to evaluating a surface grid as in Böhm's scoring function. Böhm's scoring function and its FlexX implementation are being improved and additional terms are being tested (see, e.g., Refs. 182 and 183).
2.4.3.3 Knowledge-Based Scoring. Because the forces that govern protein-ligand interactions are so complex, an implicit approach to capture all relevant terms of protein-ligand
binding seems very attractive. Borrowing from statistical thermodynamics of liquids, mean-field approaches derived solely from structural information have been applied to protein-ligand binding. Protein-ligand atom-pair potentials can be calculated from structural data (e.g., PDB), assuming that observed crystallographic protein-ligand complexes exhibit optimal placement. As an example, a knowledge-based scoring function was derived recently using 697 protein-ligand complexes from the PDB as knowledge base. Using 16 protein and 34 ligand atom types, a total of 282 statistically significant interaction potentials were derived. The final score is the sum over all protein-ligand atom-pair interactions.

$$\text{PMF-score} = \sum_{\substack{kl \\ r < r_{\text{cutoff}}^{ij}}} A_{ij}(r) \qquad (6.3)$$

$$A_{ij}(r) = -k_{B}T \ln\left[ f_{\text{vol-corr}}^{j}(r)\,\frac{\rho_{\text{seg}}^{ij}(r)}{\rho_{\text{bulk}}^{ij}} \right]$$

where $kl$ is a ligand-protein atom pair of type $ij$; $r_{\text{cutoff}}^{ij}$ designates the distance at which atom-pair interactions are truncated (6 Å for carbon-carbon interactions and 9 Å otherwise); all $A_{ij}(r)$ are derived with a reference sphere radius of 12 Å (184); $k_{B}$ is the Boltzmann constant; $T$ is the absolute temperature; and $f_{\text{vol-corr}}^{j}(r)$ is a ligand volume correction factor that is introduced because intraligand interactions are not accounted for (185, 186). $\rho_{\text{seg}}^{ij}(r)$ designates the number density of atom pairs of type $ij$ at a certain atom-pair distance $r$. $\rho_{\text{bulk}}^{ij}$ is the number density of a ligand-protein atom pair of type $ij$ in a reference sphere with radius $R$ (184). For use in docking studies, the PMF score is combined with a vdW term to account for short-range interactions (187, 188). The PMF scoring function was implemented into the DOCK4.0 program. For faster scoring it was also implemented on a grid similar to the force-field score in DOCK. Flexible docking experiments on FK506 binding protein (187), neuraminidase (127), and stromelysin (189) showed high predictive power and robustness of the PMF score. Figure 6.19 shows the predictive power of the scoring function applied to 132 protein-ligand complexes taken from the PDB.
Figure 6.19. PMF score calculated for 132 protein-ligand complexes taken from the PDB without overlap to the set of 697 complexes. The PMF score was derived from Ref. 186. [Plot not reproduced; one axis is labeled log Ki.]
2.4.3.4 Consensus Scoring. Consensus scoring is an approach that combines several scoring functions to find common hits. Such an approach seems desirable because of the lack of robustness of current scoring functions.
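One simple way to "find common hits" is to keep only those compounds that rank near the top under every scoring function considered. The Python sketch below illustrates this intersection scheme; it is an assumption about how consensus could be formed, not the protocol of any study cited here, and the score tables and function names are hypothetical.

```python
def consensus_hits(score_tables, top_n):
    """Compounds ranked within the top_n by every scoring function.

    score_tables maps a scoring-function name to {compound: score};
    lower (more negative) scores are treated as better for all functions.
    """
    top_sets = []
    for scores in score_tables.values():
        ranked = sorted(scores, key=scores.get)  # best score first
        top_sets.append(set(ranked[:top_n]))
    return set.intersection(*top_sets)


# Hypothetical scores for four compounds under three scoring functions.
score_tables = {
    "score_A": {"cpd1": -9.0, "cpd2": -8.5, "cpd3": -4.0, "cpd4": -7.0},
    "score_B": {"cpd1": -30.0, "cpd2": -12.0, "cpd3": -28.0, "cpd4": -25.0},
    "score_C": {"cpd1": -6.1, "cpd2": -2.0, "cpd3": -5.0, "cpd4": -5.9},
}
hits = consensus_hits(score_tables, top_n=2)
print(sorted(hits))
```

Each function alone would pass two compounds, but only one compound is in the top two of all three lists, which is how the intersection removes compounds favored by a single function.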
Charifson et al. (163) provided a comprehensive consensus scoring study using DOCK and Gambler, in combination with 13 scoring functions: LUDI (104), ChemScore (190, 191), Score (192), PLP (193), Merck force field (194), DOCK energy score (146, 147), DOCK chemical score, Flog (152), strain energy, Poisson Boltzmann (195), buried lipophilic surface area (196), DOCK contact score (144), and volume overlap (197). Three enzymes were used as test proteins: p38 MAP kinase, inosine monophosphate dehydrogenase, and HIV protease. By comparing the performance of single scoring functions with consensus scoring schemes involving two or three scoring functions, the authors found that false positives (inactive compounds that have high predicted scores) were significantly reduced in the latter case. The authors estimated that a consensus scoring approach would consistently provide hit rates between 5 and 10% (5-10 out of 100 compounds tested to show low μM activity) for enzymes with reasonably buried binding sites. A comparison of the different scoring functions revealed that ChemScore, PLP, and DOCK energy score performed best as single scoring functions and also in consensus combination. Consensus scoring experiments reported by Bissantz et al. found that docking/consensus scoring performances varied widely among targets (198). Stahl and Rarey suggested that the combinations of FlexX and PLP scores are ideal for consensus scoring for a variety of targets including COX-2, ER, p38 MAP kinase, gyrase, thrombin, gelatinase A, and neuraminidase (199).
2.4.4 Docking as Virtual Screening Tool. A virtual screening protocol is schematically shown in Fig. 6.20. The necessary steps include: protein structure preparation, ligand database preparation, docking calculation, and postprocessing.
The protein has to be prepared only once for a virtual screening experiment unless different protein conformations are considered. The receptor site needs to be determined and charges have to be assigned. The protein structure and the receptor site have to be modeled as accurately as possible. Determining protein surface atoms and site points as well as the assignment of interaction data, such as
Figure 6.20. Flowchart of docking as virtual screening tool in the example of FlexX. [Flowchart not reproduced; recoverable box labels: protein structure; protonation/charge assignment; 3D structure generation; binding site determination; conformation data assignment; task scheduling; surface calculation; interaction data assignment; docking engine; scoring function; placement optimization; consensus scoring; ranking/hit list selection.]
Virtual library: 10^12 compounds
ADME/tox/druglikeness filters: 10^9 compounds
2D similarity/dissimilarity: 10^7 compounds
3D conformations (10-100 per compound): 10^7 compounds
3D pharmacophore: 10^5 compounds
Docking: 10^4 compounds
Scoring: 10^3 compounds
Visual inspection: 10^1-10^2 compounds
Compounds assayed: 10^1-10^2 compounds
Figure 6.21. Virtual screening filter cascade.
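The cascade in Figure 6.21 amounts to applying successively more expensive filters to a shrinking set of compounds. The Python sketch below shows that control flow only; the compound records, property names, and cutoff values are hypothetical placeholders, and in practice the costly properties (such as a docking score) would be computed inside the corresponding filter rather than stored in advance.

```python
def run_cascade(compounds, filters):
    """Apply filters in the given order; return survivors and per-stage counts."""
    counts = [("input", len(compounds))]
    survivors = list(compounds)
    for name, keep in filters:
        survivors = [c for c in survivors if keep(c)]
        counts.append((name, len(survivors)))
    return survivors, counts


# Hypothetical compound records; every field and cutoff is illustrative only.
library = [
    {"id": "c1", "mol_weight": 320.0, "similarity": 0.72, "dock_score": -9.1},
    {"id": "c2", "mol_weight": 710.0, "similarity": 0.80, "dock_score": -10.3},
    {"id": "c3", "mol_weight": 410.0, "similarity": 0.35, "dock_score": -8.7},
    {"id": "c4", "mol_weight": 290.0, "similarity": 0.65, "dock_score": -5.2},
]
# Filters ordered from cheapest to most expensive, as in the cascade.
filters = [
    ("druglikeness", lambda c: c["mol_weight"] <= 500.0),
    ("2D similarity", lambda c: c["similarity"] >= 0.6),
    ("docking score", lambda c: c["dock_score"] <= -8.0),
]
hits, counts = run_cascade(library, filters)
for stage, remaining in counts:
    print(stage, remaining)
```

Ordering the filters by cost means that the slowest step only sees compounds that have already passed every cheaper test.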
marking hydrogen-bond donors/acceptors, and so forth, are sometimes internally included in the docking software (e.g., in FlexX) and sometimes done separately (e.g., DOCK).

Because of the large number of molecules, manual steps in the preparation of ligand databases obviously have to be avoided. Starting typically from 2D structures, bond types have to be checked, protonation states must be determined, charges must be assigned, and solvent molecules removed. 3D coordinates can be generated using a program such as CONCORD or CORINA (74) (see Section 2.3.2). Next, site points for hydrogen-bonding interactions have to be assigned and rotational barriers must be calculated. These tasks are sometimes included in the docking program (e.g., FlexX).

The docking calculation is typically done for one ligand at a time. Depending on optimization and sampling parameters as well as on the flexibility of the compound, typically between a few seconds and a few minutes of CPU time are needed to dock a ligand. Because the individual docking events are independent of each other, they can run on parallel hardware. Task schedulers that distribute ligand docking on available CPUs are used in many docking programs.

Postprocessing steps of hits may include refinement of placement using MD techniques, specific pharmacophore-based filters that penalize certain features, such as unformed hydrogen bonds or other constraints that were not met in the primary scoring function. Because of the limitations of scoring functions, a postscoring protocol can be used to reach consensus about hits (discussed above). The recognition of known active ligands mixed within the database can be used to find an appropriate threshold for separating the top-ranking compounds from the rest of the database.

2.5 Filter Cascade

Virtual screening is the process of reducing a given database as quickly and efficiently as possible to a small number of putative lead compounds for a given drug discovery project. The techniques described above form a cascade of different filter functions that are ordered by their speed. Fast ADMET filters are followed by 2D and 3D pharmacophore filters and finally by docking and scoring methods. Figure 6.21 shows a scheme of a possible virtual screening filter cascade.

3 APPLICATIONS

3.1 Identification of Novel DAT Inhibitors through 3D Pharmacophore-Based Database Search

The dopamine transporter (DAT) is a 12-transmembrane helix protein that plays a critical role in terminating dopamine neurotransmission by taking up dopamine released into the synapse. There is no experimental structure available for DAT. However, an extensive SAR of DAT inhibitors (mostly cocaine analogs) is available. DAT is involved in several diseases such as drug addiction and attention deficit disorder (200). For example, ritalin [(±)-threo-methylphenidate], a DAT inhibitor, is marketed for treating attention deficit disorders in children (200, 201). Until recently, all efforts in synthesizing DAT inhibitors were focused on creating analogs around the tropane, piperazine, methylphenidate, and 2,3-dihydro-5-hydroxy-5H-imidazo[2,1-a]isoindole cores. It was shown that, despite structural differences, DAT inhibitors share one or more common 3D pharmacophore models (95, 202, 203). In an effort to identify new chemical cores for developing DAT inhibitors with new pharmacological profiles, a pharmacophore-based 3D database search was proposed (95). For this purpose a pharmacophore model was derived based on two known potent DAT inhibitors, R-cocaine and WIN-35065-2 (Fig. 6.22) (95). The common binding elements of these compounds are a ring N that may be substituted, a carbonyl oxygen, and an aromatic ring that can be defined by the position of its center (Fig. 6.22). Because both compounds have some flexibility, a systematic conformational search was performed to obtain all possible conformations these compounds can have when bound to DAT. To identify structurally diverse conformers, clustering of the generated conformers was done. Measuring distances among chosen pharmacophore elements in the generated conformers led to distances shown in Fig. 6.22.

Figure 6.22. Pharmacophore proposed for identifying DAT inhibitors. The pharmacophore was obtained based on two known DAT inhibitors, R-cocaine and WIN-35065-2. Distance ranges between pharmacophore points were obtained through systematic search of all possible conformations that the two compounds may adopt when bound to DAT.

Recently, analysis of several large chemical databases showed that the NCI database has by far the highest number of unique compounds (204). Thus this database provides a large number of unique synthetic compounds and natural products and is an excellent resource for drug lead discovery. Using the 3D pharmacophore from Fig. 6.22, the NCI 3D-database (67) of 206,876 "open compounds" was searched using the program Chem-X (205). The strategy used for identifying leads through virtual screening is shown in Fig. 6.23. During the search each compound was first checked as to whether it had the pharmacophore elements and second as to whether it had any acceptable conformation matching the distance requirements. Up to 3 million conformations were examined for each compound. A total of 4094 compounds, 2% of the database, were identified as "hits." This number was further reduced using filters such as molecular weight, structural novelty, simplicity, diversity, and hydrogen-bond acceptor nitrogen. Seventy compounds were selected for testing in biochemical assays. Forty-four compounds displayed more than 20% inhibition at 10 µM in the [3H]mazindol binding assay, from which three compounds were chosen for deriving an SAR (Fig. 6.24). These results suggested that the 3D pharmacophore-based database search is an efficient tool for identifying novel DAT inhibitors.

Figure 6.23. Flowchart showing steps used in lead identification using pharmacophore-based 3D database searching. [Flowchart boxes recovered from the figure: Pharmacophore; Hits; Lead compounds for optimization.]

Figure 6.24. Selected DAT inhibitors identified from the NCI database.

3.2 Discovery of Novel Matriptase Inhibitors through Structure-Based 3D Database Screening

Matriptase is a trypsinlike serine protease that was proposed to be involved in tissue remodeling, cancer invasion, and metastasis (206). Potent and selective matriptase inhibitors not only would be useful for further elucidation of the role matriptase has in biological systems but also may be used for the treatment and/or prevention of cancers. Hepatocyte growth factor activator inhibitor 1 (HAI-1) is a natural inhibitor of matriptase. Thus, by analyzing interactions in the complex of matriptase with HAI-1, crucial interactions that an inhibitor should capture can be identified. In consequence, the strategy for identifying inhibitors was to first build the matriptase-HAI-1 Kunitz domain 1 complex, identify binding regions on matriptase, screen the NCI 3D database for hits that capture binding groups of HAI-1 to matriptase, and in the end, biochemical testing (Fig. 6.25). The structure of matriptase was obtained from PDB entry 1EAW (207). Homology modeling, as implemented in MODELLER (208, 209), was chosen to build the 3D structure of the Kunitz domain 1 from KSPI. The complex of matriptase with HAI-1 Kunitz domain 1 was built using a combination of manual docking and molecular dynamics refinement with the program CHARMM (210). The obtained binding mode of HAI-1 Kunitz domain 1 to matriptase (Fig. 6.26) suggests that three regions might be important for inhibitor binding. The S1 binding site Asp185, which is characteristic of trypsinlike serine proteases, is the specificity pocket used to recognize substrates with Arg or Lys as P1 residue. The anionic site, defined by Asp96, Asp60.A, and Asp60.B, is the site at which Arg258 from HAI-1 binds. A hydrophobic region defined by Ile41 and Tyr60.G might also be important for specificity of future matriptase inhibitors.

Thus, the active site used for in silico screening with the program DOCK comprises all three binding regions. Energy scoring was used for ranking docked compounds. The top 2000 compounds were considered for selecting potential inhibitors. Given that matriptase prefers positively charged residues in the P1 position, inhibitors should also have positively charged groups to bind efficiently to Asp185 from the S1 site of matriptase (Fig. 6.27). Note that a more efficient way of doing the virtual screening presented above is to do a pharmacophore search first followed by docking. Thus, 69 compounds were selected for biochemical testing at 75 µM inhibitor and matriptase concentration. Initial screening showed that 50% of compounds tested produced more than 70% inhibition of enzymatic activity when the ratio was one inhibitor molecule for one protein molecule (Table 6.4). It should be noted that screening results at single dose and IC50 depend on the protein concentration, whereas Ki is concentration independent. From the hits in the screening step bis-benzamidines were chosen for Ki determination (Table 6.5) because this class of compounds could bind to both the S1 site and the anionic site. These results show that combining a pharmacophore hypothesis with a structure-based database search can provide an efficient way of identifying leads for a drug design project.
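The two-stage test described in Section 3.1 (does the compound contain the pharmacophore elements, and does any of its conformers satisfy the distance ranges) can be sketched as follows. This is an illustration only. The feature labels, coordinates, and distance ranges are hypothetical, because the actual ranges appear only in Figure 6.22. The code is not the Chem-X implementation, and it assumes one site per feature type for simplicity.

```python
import math
from itertools import combinations

# Hypothetical distance ranges in angstroms between feature pairs.
# N = ring nitrogen, O = carbonyl oxygen, AR = aromatic ring center.
RANGES = {
    ("AR", "N"): (5.0, 7.0),
    ("AR", "O"): (3.5, 6.5),
    ("N", "O"): (2.5, 4.5),
}


def conformer_matches(conformer, ranges=RANGES):
    """conformer maps feature label -> (x, y, z).

    Returns True only if every pairwise distance lies inside its range.
    """
    for a, b in combinations(sorted(conformer), 2):
        low, high = ranges[(a, b)]
        if not low <= math.dist(conformer[a], conformer[b]) <= high:
            return False
    return True


def is_hit(conformers, required=("AR", "N", "O")):
    """Stage 1: each conformer record must carry exactly the required features.
    Stage 2: at least one conformer must satisfy the distance ranges."""
    if not conformers or any(set(c) != set(required) for c in conformers):
        return False
    return any(conformer_matches(c) for c in conformers)
```

A compound would be passed as a list of conformers, for example `is_hit([{"N": (0, 0, 0), "O": (3, 0, 0), "AR": (0, 5.5, 0)}])`. Checking feature presence first is cheap and spares the conformer loop for compounds that cannot match.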
Figure 5.5. GRID probes on
Factor Xa site and the com-
bined resultant complementary
site points that can be used for
pharmacophore fingerprint cal-
culations (lower right).
Figure 5.6. Overview of the Gridding and Partitioning (Gap) procedure as applied to monomers, exemplified using phenylalanine as a potential primary amine. This molecule thus contains two pharmacophoric groups (the aromatic ring and the carboxylic acid). During the conformational analysis the locations of these pharmacophoric groups are tracked within a regular grid. [Reproduced from A. R. Leach and M. M. Hann, Drug Discovery Today, 5, 326-336 (2000), with permission of Elsevier Science.]
[Figure 5.13 panels: plots with axis label "BCUT metric 1". Recovered panel text: "possible products", "selected without regard for economy", "reactant-biased, product-based design".]
Figure 5.13. (a) A virtual library of 634,721 allowed combinatorial AB products (after filtering out products that failed Lipinski's Rule of 5 "druglike" criteria) shown in a BCUT chemistry space specifically chosen to best represent the diversity of the virtual library. (b) The maximally diverse 9600-compound subset of the virtual library, illustrating the results of purely product-based "library design." Although providing the maximal diversity, synthesis of these 9600 AB products would require the use of 347 A's and 1024 B's, clearly unacceptable from the perspective of synthetic economy (number of reactants and robotic control). (c) The 9600-compound library resulting from the traditional, purely reactant-based library design strategy of selecting the 80 most diverse A's and the 120 most diverse B's. Although providing user-selected synthetic economy, the diversity of these 9600 AB products is clearly quite poor. (d) The 9600-compound library resulting from the reactant-biased, product-based (RBPB) algorithm developed by Pearlman and Smith (see Refs. 31, 87c and text). The algorithm selected a different set of 80 A's and a different set of 120 B's, thus providing the same level of user-selected synthetic economy, while also providing substantially greater diversity than could be achieved using a purely reactant-based library design strategy.
Figure 5.25. The 3D subspace most receptor relevant for members of the GPCR-PA+ family of receptors. Points indicate coordinates of 187 published ligands of various GPCR-PA+ receptors. Some have been color-coded by receptor for illustrative purposes. See Refs. 32e,i and text for further details.
Figure 5.26. The same 3D subspace as in Fig. 5.25, rotated slightly to provide a better viewing perspective. Points indicate coordinates of about 2000 combinatorial products selected from 14 different libraries. Color-coding indicates affinity for the GPCR-1 receptor. See Refs. 32e,i and text for further details. [Legend as extracted: Gray: all compounds made for GPCR-1; Blue: 4 0 0 nM; Red: 4 0 nM; Green: <2 nM.]
Figure 10.2. View of (4) (2,3-DPG) binding site at the mouth of the β-cleft of deoxy hemoglobin.
Figure 10.3. Stereoview of allosteric binding site in deoxy hemoglobin. A similar compound environment is observed at the symmetry-related site, not shown here. (a) Overlap of four right-shifting allosteric effectors of hemoglobin: (6a) (RSR13, yellow), (6b) (RSR56, black), (7a) (MM30, red), and (7b) (MM25, cyan). The four effectors bind at the same site in deoxy hemoglobin. The stronger acting RSR compounds differ from the much weaker MM compounds by reversal of the amide bond located between the two phenyl rings. As a result, in both RSR13 and RSR56, the carbonyl oxygen faces and makes a key hydrogen bonding interaction with the amine of αLys99. In contrast, the carbonyl oxygen of the MM compounds is oriented away from the αLys99 amine. The αLys99 interaction with the RSR compounds appears to be critical in the allosteric differences. (b) Detailed interactions between RSR13 (6a) and hemoglobin, showing key hydrogen bonding interactions that help constrain the T-state and explain the allosteric nature of the compound and those of other related compounds.
Figure 10.4. Stereoview of superimposed binding sites for (8b) (5-FSA, yellow) and (8a) (DMHB, magenta) in deoxy hemoglobin. A similar compound environment is observed at the symmetry-related site and therefore not shown here. Both compounds form a Schiff base adduct with the α1Val1 N-terminal nitrogen. Whereas the m-carboxylate of 5-FSA forms a salt bridge with the α2Arg141 (opposite subunit), this intersubunit bond is missing in DMHB. The added constraint to the T-state by 5-FSA that ties two subunits together shifts the allosteric equilibrium to the right. On the other hand, the binding of DMHB does not add to the T-state constraint. Instead, it disrupts any T-state salt- or water-bridge interactions between the opposite α-subunits. The result is a left shift of the oxygen equilibrium curve by DMHB.

Figure 10.6. Stereoview of the binding site for […] (n = 3, TB36, yellow) in deoxy Hb. A similar compound environment is observed at the symmetry-related site, not shown here. One aldehyde is covalently attached to the N-terminal α1Val1, whereas the second aldehyde is bound to the opposite subunit, α2Lys99 ammonium ion. The carbo[…] the first aromatic ring forms a bidentate […] bond and salt bridge with the gu[…] α2Arg141 of the opposite subunit. The effector ties two subunits together and adds additional constraints to the T-state, resulting in a shift in the allosteric equilibrium to the right. The magnitu[…] constraint placed on the T-state by the […] αLys99 varies with the flexibility of the […] Shorter bridging chains form tighter cross[…] yield larger shifts in the allosteric equilibrium.
Figure 10.6. Binding site for (10) (N10-propynyl-5,8-dideazafolate), within the active site of thymidylate synthase from Escherichia coli. The surface of the inhibitor is shown in the left view. The red spheres in the left view are tightly bound water molecules.
Figure 10.9. (b) Active site with bound (31) [saquinavir (PDB code 1HXB)]. Note the asymmetry of inhibitor binding. The flap water that is shown very close to saquinavir is labeled W.
Figure 10.10. Comparison of the structures of HIV-P apoenzyme monomer (top, PDB code 3PHV) and the complex between HIV-P and (32) (U-85548; bottom, PDB code 8HVP). The inhibitor is shown as a ball and stick structure. Note the rearrangement of the flap residues; Ile50 is indicated for reference. The van der Waals surface of Asp25 is shown in both structures. The flap water (red ball) is also shown between Ile50 and U-85548. In the bottom structure, the locations of the N and C termini of HIV-P are noted.
Figure 10.11. Orthogonal views of the complex between HIV-P and (32) (U-85548). The view in panel a is rotated approximately 90° (around the long axis of the protein) from the view in panel b. Van der Waals surfaces of Asp25, Asp25', and the flap water (W) are shown. In panel b, the solvent-accessible surface of the inhibitor is shown.
Figure 10.17. Structure of rhinovirus capsid protein VP1 showing the bound conformation of antiviral isoxazole compounds (78) [disoxaril, WIN-51711: panel a, top], (79) [WIN-54954: panel b, middle], and (80) [pleconaril, WIN-63843: panel c, bottom]. The PDB codes for the X-ray structural model coordinates used to create these views are: 1PIV (for 78), 2HWE (for 79), and 1C8M (for 80). On the left side of each panel, the inhibitors are shown as van der Waals surfaces, and the protein as a ribbon diagram. On the right side, the structures of the inhibitor alone are shown, from the same view, as ball and stick representations.
[…] here). Hydrogen bonds (dotted lines) are shown between the backbone amide of Met109 and the inhibitor's pyrimidinyl nitrogen, and between the ε-amino of Lys53 and the inhibitor's imidazole N3. This figure is based on the PDB coordinate set 1A9U (187).
Figure 11.6. Three density maps at differing resolutions: a, 1.3 Å; b, 2.1 Å; c, 3.0 Å.
Figure 11.7. (b) Structure of the LuxS monomer highlighting the bound zinc ion (magenta) and methionine (green).
Figure 14.6. Examples of macromolecules studied by cryo-EM and 3D image reconstruction and the resulting 3D structures (bottom row) after cryo-EM analysis. All micrographs (top row) are displayed at above 170,000X magnification and all models at about 1,200,000X magnification. (a) A single particle without symmetry: The micrograph shows 70S E. coli ribosomes complexed with mRNA and fMet-tRNA. The surface-shaded density map, made by averaging 73,000 ribosome images from 287 micrographs, has a resolution (FSC) of 11.5 Å. The 50S and 30S subunits and the tRNA are colored blue, yellow, and green, respectively. The identity of many of the subunits is known as some RNA double helices are clearly recognizable by their major and minor grooves (e.g., helix 44 is shown in red). [Courtesy of J. Frank (SUNY, Albany), using data from Gabashvili et al. (86).] (b) A single particle with symmetry: The micrograph shows hepatitis B virus cores. The 3D reconstruction, at a resolution of 7.4 Å (DPR), was computed from 6384 particle images taken from 34 micrographs. [From Böttcher et al. (44).] (c) A helical filament: The micrograph shows actin filaments decorated with myosin S1 heads containing the essential light chain. The 3D reconstruction, at a resolution of 30-35 Å, is a composite in which the differently colored parts are derived from a series of difference maps that were superimposed on F-actin. The components include: F-actin (blue), myosin heavy chain motor domain (orange), essential light chain (purple), regulatory light chain (white), tropomyosin (green), and myosin motor domain N-terminal beta-barrel (red). [Courtesy of A. Lin, M. Whittaker, and R. Milligan (Scripps Research Institute, La Jolla, CA).] (d) A 2D crystal, light-harvesting complex LHCII at 3.4 Å resolution. The model shows the protein backbone and the arrangement of chromophores in a number of trimeric subunits in the crystal lattice. In this example, image contrast is too low to see any hint of the structure without image processing (see also Fig. 14.3). [Courtesy of W. Kühlbrandt (Max-Planck-Institute for Biophysics, Frankfurt, Germany).]
Figure 15.35. GRAB peptidomimetics in action.

Figure 6.27. Potential functional groups in inhibitors necessary to block the specificity pocket (S1 binding site) of matriptase. The choice reflects the observation that matriptase prefers substrates with Lys or Arg as P1 residue.

Table 6.4 Results Obtained from the Initial Screening of Compounds against Matriptase (206)^a

% Inhibition          Number of Compounds
Over 95%              15
90-94%                4
70-89%                15
40-69%                13
Below 39%             17
High absorbency       3
Increased activity    3

^a Testing was done at 75 µM compound and protein concentration. The ratio between compound and protein molar concentration was 1:1.

Table 6.5 Ki Values Obtained for Tested bis-Benzamidines against Matriptase
[Structure drawings were not recoverable from the extracted text; surviving fragments show O-(CH2)6-O and O-(CH2)5-O linkers and Br substituents.]
Ki (nM), in order of appearance: 924; 535; >10,000; 208

ity needs to be included in high-throughput docking. Scoring functions have to improve to make consistently correct predictions of putative protein-ligand binding affinities. Scoring functions, calibrated to reproduce experimental data, have unreliable performance outside their training set. Thus, de novo methods using terms describing the thermodynamics of binding should replace the first generation of scoring functions. In consequence, some of the speed gained from low cost parallel computing should be invested into higher accuracy scoring rather than higher throughput. One way of increasing throughput is to keep the number of compounds docked as small as possible by using every bit of knowledge one has to prefilter the database, mainly based on pharmacophore information. In some cases this is easy. For instance the necessity of having certain features like salt bridges formed on ligand binding [e.g., in influenza virus neuraminidase (211)] or other prevalent information (e.g., hinge region binding for many ATP competitive kinase inhibitors) greatly helps to reduce the number of compounds subjected to docking experiments.

The missing robustness of many structure-based docking/scoring techniques raises the question of when to apply them and when to retreat to pharmacophore-based virtual screening. In many cases it makes sense to prescreen virtual libraries using pharmacophore techniques. Particularly if one uses shape representations of the receptor site, such as volume-exclusion spheres, a pharmacophore search can be a very effective prefilter. Also, in cases where receptor-site flexibility is problematic, pharmacophore searching may be less restrictive (unless one tries to deal with protein flexibility in the docking routine, a task that is not easy, is usually not applied today, and is another future direction of development in virtual screening).

The above tools and pathways show a simple and inexpensive way of discovering novel lead chemical matter for drug discovery programs. However, there are many hurdles to overcome to make virtual screening successful. The properties of druglikeness may not be understood sufficiently, resulting in poor pharmacokinetics of the compounds; existing SARs that lead to the generation of pharmacophore models may bias the pharmacophore toward a narrow segment of compounds; structural information of the target protein is often not available; and homology models may not be precise enough. Current scoring functions are often not robust enough to separate actives from inactives. Compounds identified may not be easy to synthesize. Hits may not be selective or patentable.

On one hand, there are obviously many risks involved in virtual screening, many assumptions made, and a positive outcome not at all guaranteed in each and every case. On the other hand, however, the overall process is extremely cost effective and fast. Even if successful in only a few cases, virtual screening can produce leads that may otherwise not have surfaced and so add immense value to a drug discovery program. Especially in cases
where high throughput screening cannot identify a viable lead chemical matter, virtual screening applied to vendor databases or combinatorial libraries to be synthesized presents a cost-effective alternative. Mainly because of its speed, cost effectiveness, ease of setup, and increasing robustness, we expect virtual screening to become a mainstream approach throughout the pharmaceutical industry.

5 ACKNOWLEDGMENTS

The authors thank Dr. Matthias Rarey for valuable discussions.

REFERENCES

1. D. J. Abraham, Intra-Science Chem. Rept., 8, 1 (1974).
2. M. Perutz, Protein Structure. New Approaches to Disease and Therapy, Freeman, New York, 1992.
3. S. W. Fesik, J. Med. Chem., 34, 2937 (1991).
4. D. W. Cushman, H. S. Cheung, E. F. Sabo, and M. A. Ondetti, Biochemistry, 16, 5484 (1977).
5. L. F. Kuyper, B. Roth, D. P. Baccanari, R. Ferone, C. R. Bedell, J. N. Champness, D. K. Stammers, J. G. Dann, F. E. Norrington, D. J. Blaker, and P. J. Goodford, J. Med. Chem., 25, 1120 (1982).
6. M. A. Gallop, R. W. Barrett, W. J. Dower, S. P. A. Fodor, and E. M. Gordon, J. Med. Chem., 37, 1233 (1994).
7. E. M. Gordon, R. W. Barrett, W. J. Dower, S. P. A. Fodor, and M. A. Gallop, J. Med. Chem., 37, 1385 (1994).
8. M. W. Lutz, J. A. Menius, T. D. Choi, R. G. Laskody, P. L. Domanico, A. S. Goetz, and D. L. Saussy, Drug Discovery Today, 1, 277 (1996).
9. H. J. Bohm and G. Klebe, Angew. Chem. Int. Ed. Engl., 35, 2589 (1996).
10. W. P. Walters, M. T. Stahl, and M. A. Murcko, Drug Discovery Today, 3, 160 (1998).
11. Y. C. Martin, Perspect. Drug Discovery Des., 7/8, 159 (1997).
12. A. C. Good and J. S. Mason in K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, Vol. 7, VCH, New York, 1995, p. 67.
13. G. Klebe, Virtual Screening: An Alternative or Complement to High Throughput Screening?, Kluwer/Escom, Leiden, 2000.
14. H. J. Bohm and G. Schneider, Virtual Screening for Bioactive Molecules, Wiley-VCH, Weinheim, 2000.
15. A. C. Good, S. R. Krystek, and J. S. Mason, Drug Discovery Today, 5 (Suppl.), S61 (2000).
16. T. Langer and R. D. Hoffmann, Curr. Pharm. Design, 7, 509 (2001).
17. B. Waszkowycz, T. D. J. Perkins, R. A. Sykes, and J. Li, IBM Systems J., 40, 360 (2001).
18. A. Good, Curr. Opin. Drug Discovery Dev., 4, 301 (2001).
19. R. F. Burns, R. M. A. Simmons, J. J. Howbert, D. C. Waters, P. G. Threlkeld, and B. D. Gitter, Exploiting Molecular Diversity, Symposium Proceedings, Vol. 2, Cambridge Healthtech Institute, San Diego, 1995, p. 2.
20. W. P. Walters, Ajay, and M. A. Murcko, Curr. Opin. Chem. Biol., 3, 384 (1999).
21. W. P. Walters and M. A. Murcko, Methods Principles Med. Chem., 10, 15 (2000).
22. D. E. Clark and S. D. Pickett, Drug Discovery Today, 5, 49 (2000).
23. B. L. Podlogar, I. Muegge, and L. J. Brice, Curr. Opin. Drug Discovery Dev., 4, 102 (2001).
24. I. Muegge, Chem. Eur. J., 8, 1976 (2002).
25. Ajay, W. P. Walters, and M. A. Murcko, J. Med. Chem., 41, 3314 (1998).
26. Ajay, G. W. Bemis, and M. A. Murcko, J. Med. Chem., 42, 4942 (1999).
27. J. Sadowski and H. Kubinyi, J. Med. Chem., 41, 3325 (1998).
28. M. Wagener and V. J. vanGeerestein, J. Chem. Inf. Comput. Sci., 40, 280 (2000).
29. V. J. Gillet, P. Willett, and J. Bradshaw, J. Chem. Inf. Comput. Sci., 38, 165 (1998).
30. Comprehensive Medicinal Chemistry is available from MDL Information Systems Inc., San Leandro, CA 94577 and contains drugs already on the market.
31. World Drug Index is available from Derwent Information, London, UK. Website: www.derwent.com.
32. MACCS-II Drug Data Report is available from MDL Information Systems Inc., San Leandro, CA 94577 and contains biologically active compounds in the early stages of drug development.
33. C. A. Lipinski, F. Lombardo, B. W. Dominy, and P. J. Feeney, Adv. Drug Delivery Rev., 23, 3 (1997).
34. T. I. Oprea, J. Comput.-Aided Mol. Des., 14, 251 (2000).
35. J. Kelder, P. D. J. Grootenhuis, D. M. Bayada, L. P. C. Delbressine, and J. P. Ploemen, Pharm. Res., 16, 1514 (1999).
36. M. Hann, B. Hudson, X. Lewell, R. Lifely, L. Miller, and N. Ramsden, J. Chem. Inf. Comput. Sci., 39, 897 (1999).
37. National Toxicology Program. http://ntp-server.niehs.nih.gov.
38. RTECS C2(96-4); National Institute for Occupational Safety and Health (NIOSH), U.S. Department of Health and Human Services: Washington, DC, 1996. URL: http://www.ccohs.ca.
39. G. Klopman and H. S. Rosenkranz, Mutat. Res., 305, 33 (1994).
40. K. Enslein, V. K. Gombar, and B. W. Blake, Mutat. Res., 305, 47 (1994).
41. Lhasa Ltd., School of Chemistry, University of Leeds, Leeds, UK. URL: http://www.chem.leeds.ac.uk/LUK/derek/index.html.
42. Available Chemicals Directory is available from MDL Information Systems Inc., San Leandro, CA 94577 and contains specialty bulk chemicals from commercial sources. Website: http://www.mdli.com.
43. V. N. Viswanadhan, A. K. Ghose, G. R. Revankar, and R. K. Robins, J. Chem. Inf. Comput. Sci., 29, 163 (1989).
44. G. W. Bemis and M. A. Murcko, J. Med. Chem., 39, 2887 (1996).
45. G. W. Bemis and M. A. Murcko, J. Med. Chem., 42, 5095 (1999).
46. X. Q. Lewell, D. B. Judd, S. P. Watson, and M. M. Hann, J. Chem. Inf. Comput. Sci., 38, 511 (1998).
47. I. Muegge, D. Brittelli, and S. L. Heald, J. Med. Chem., 44, 1841 (2001).
48. B. L. Podlogar and I. Muegge, Curr. Top. Med. Chem., 1, 257 (2001).
49. R. M. Brunne, G. Hessler, and I. Muegge in K. C. Nicolaou, R. Hanko, and W. Hartwig, Eds., Handbook of Combinatorial Chemistry, Vol. 2, Wiley-VCH, Weinheim, 2002, p. 761.
50. B. E. Evans, K. E. Rittle, M. G. Bock, R. M. DiPardo, R. M. Freidinger, W. L. Whitter, G. F. Lundell, D. F. Veber, P. S. Anderson, R. S. L. Chang, V. J. Lotti, D. J. Cerino, T. B. Chen, P. J. Kling, K. A. Kunkel, J. P. Springer, and J. Hirshfield, J. Med. Chem., 31, 2235 (1988).
51. J. S. Mason, I. Morize, P. R. Menard, D. L. Cheney, C. Hulme, and R. F. Labaudiniere, J. Med. Chem., 42, 3251 (1999).
52. P. J. Hajduk, M. Bures, J. Praestgaard, and S. W. Fesik, J. Med. Chem., 43, 3443 (2000).
53. J. H. Van Drie, D. Weininger, and Y. C. Martin, J. Comput.-Aided Mol. Des., 3, 225 (1989).
54. E. Fischer, Ber. Dtsch. Chem. Ges., 27, 2985 (1894).
55. P. Ehrlich, Ber. Dtsch. Chem. Ges., 42, 17 (1909).
56. P. Gund, Prog. Mol. Subcell. Biol., 5, 117 (1977).
57. F. Bernstein, T. F. Koetzle, G. J. B. Williams, E. F. Meyer, Jr., M. D. Brice, J. R. Rodgers, O. Kennard, T. Schimanouchi, and M. J. Tasumi, J. Mol. Biol., 112, 535 (1977).
58. H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Baht, H. Weissig, I. N. Shindyalov, and P. E. Bourne, Nucleic Acids Res., 28, 235 (2000).
59. M. K. Dreyer, D. R. Borcherding, J. A. Dumont, N. P. Peet, J. T. Tsay, P. S. Wright, A. J. Bitonti, J. Shen, and S.-H. Kim, J. Med. Chem., 44, 524 (2001).
60. G. R. Marshall, C. D. Barry, H. E. Bosshard, R. A. Dammkoehler, and D. A. Dunn in E. C. Olson and R. E. Christoffersen, Eds., Computer Assisted Drug Design, American Chemical Society, Washington, 1979, p. 205.
61. J. R. Sufrin, D. A. Dunn, and G. R. Marshall, Mol. Pharmacol., 19, 307 (1981).
62. R. D. Cramer, D. E. Patterson, R. D. Clark, F. Soltanshahi, and M. S. Lawless, J. Chem. Inf. Comput. Sci., 38, 1010 (1998).
63. D. Horvath in A. K. Ghose and V. N. Viswanadhan, Eds., Combinatorial Library Design and Evaluation. Principles, Software Tools, and Applications in Drug Discovery, Marcel Dekker, New York, 2001, p. 429.
64. A. Dalby, J. G. Nourse, W. D. Hounshell, A. K. I. Gushurst, D. L. Grier, B. A. Leland, and J. Laufer, J. Chem. Inf. Comput. Sci., 32, 244 (1992).
65. Spresi Chemical Database, InfoChem GMBH, Grobenzell, Germany and Daylight Chemical Information Systems, Irvine, CA (2002).
66. The Chemical Abstracts Database, Chemical Abstracts Service, 2540 Olentangy River Road, PO Box 3012, Columbus, OH (2002).
67. G. W. A. Milne, M. C. Nicklaus, J. S. Driscoll, S. Wang, and D. W. Zaharevitz, J. Chem. Inf. Comput. Sci., 34, 1219 (1994).
68. G. W. A. Milne and J. A. Miller, J. Chem. Inf. Comput. Sci., 26, 154 (1986).
69. D. Weininger, J. Chem. Inf. Comput. Sci., 28, 31 (1988).
70. D. Weininger, A. Weininger, and J. L. Weininger, J. Chem. Inf. Comput. Sci., 29, 97 (1989).
Virtual Screening
71. F. H. Allen, S. Bellard, M. D. Brice, B. A. Cart- 91. R. D. Hull, E. M. Fluder, S. B. Singh, R. B.

wight, A. Doubleday, H. Higgs, T. Hum- Nachbar, S. K. Kearsley, and R. P. Sheridan,
melinkPeters, 0. Kenard, W. D. S. Mother- J. Med. Chem., 44, 1185 (2001).
well, J. R. Rodgers, and D. G. Watson, Acta 92. R. D. Hull, S. B. Singh, R. B. Nachbar, R. P.
Crystallogr. B, 35,2331 (1979). Sheridan, S. K. Kearsley, and E. M. Fluder,
72. R. L. DesJarlais in P. S. Charifson, Ed., Prac- J. Med. Chem., 44, 1177 (2001).
tical Application of Computer-Aided Drug De- 93. B. T. Hoffman, T. Kopajtic, J. L. Katz, and
sign, Vol. 1, Marcel Dekker, New York, 1997, p. A. H. Newman, J. Med. Chem., 43, 4151
73. (2000).
73. A. RusinkoIII, J. M. Skell, R. Balducci, C. M. 94. A. P. Kozikowski, M. K. E. Saiah, K. M. John-
McGarity, and R. S. Pearlman, CONCORD, son, and J . S. Bergmann, J. Med. Chem., 38,
University of Texas, Austin, TX and Tripos As- 3086 (1995).
sociates, St. Louis, MO (1988).
95. S. Wang, S. Sakamuri, I. J. Enyedy, A. P.
74. J. Gasteiger, C. Rudolph, and J. Sadowski, Tet- Kozikowski, 0. Deschaux, B. C. Bandyo-
rahedron Comput. Methodol., 3, 537 (1990). padhyay, S. R. Tella, W. A. Zaman, and K. M.
75. D. Weininger, Rubicon, Daylight Chemical In- Johnson, J. Med. Chem., 43,351 (2000).
formation Systems, Irvine, CA (1995). 96. D. Barnum, J. Greene, A. Smellie, and P.
76. Omega, Open Eye, Santa Fe, NM (2002). Sprague, J. Chem. Inf. Comput. Sci., 36, 563
77. D. P. Dolata, A. R. Leach, and K. Prout, (1996).
J. Cornput.-Aided Mol. Des., 1, 73 (1987). 97. P. W. Sprague in K. Muller, Ed., Perspectives in
78. A. R. Leach and K. Prout, J. Comput. Chem., Drug Discovery and Design, Vol. 1, ESCOM
11,1193 (1990). Science Publishers B.V., Leiden, 1995, p. 1.
79. W. T. Wipke and M. A. Hahn, Artif. Zntell. 98. Y. C. Martin, M. G. Bures, E. A. Danaher, J.
Appl. Chem., 306,136 (1986). DeLazzer, I. Lico, and P. A. Pavlik, J. Cornput.-
Aided Mol. Des., 7,83 (1993).
80. G. Klebe and T. Mietzner, J. Cornput.-Aided
Mol. Des., 8,583 (1994). 99. D. Beusen, Alignment of angiotensin I1 recep-
tor antagonists using GASP, Tripos Technical
81. E. V. Gordeeva, A. R. Katritzky, V. V. Shcher-
Notes 1(4),November 1996.
bukhin, and N. S. Zifirov, J. Chem. Znf. Com-
put. Sci., 33, 102 (1993). 100. Flo96, Thistlesoft, Colebrook, CT (1996).
82. S. K. Kearsley, D. J. Underwood, R. P. Sheri- 101. V. Golender, B. Vesterman, 0.Eliyahu, A. Kar-
d m , and M. D. Miller, J. Cornput.-Aided Mol. dash, M. Kletzin, M. Shokhen, and E. Vo'tpa-
Des., 8, 153 (1994). gel, Conference Proceedings Barcelona 10th
European Symposium on Structure-Activity
83. J. M. Barnard, J. Chem. Znf. Comput. Sci., 33, Relationships, Vol. 10, Prous Science, Barce-
532 (1993). lona, Spain, 1994, p. 246.
84. I. J. Enyedy, J. Wang, W. A. Zaman, K. M. 102. ROCS, Open Eye, Santa Fe, NM (2002).
Johnson, and S. Wang, Bioorg. Med. Chem.
Lett., 12, 1775 (2002). 103. 0. F. Guner, Pharmacophore Perception, De-
velopment, and Use in Drug Design, Interna-
85. G. M. Downs and P. Willett in K. B. Lipkowitz tional University Line, La Jolla, CA, 2000.
and D. B. Boyd, Eds., Reviews in Computa-
tional Chemistry, VCH, New York, 1996, p. 1. 104. H . J . Bohm, J. Cornput.-Aided Mol. Des., 6 ,
593 (1992).
86. P. Willett, J. M. Barnard, and G. M. Downs,
J. Chem. Znf Comput. Sci., 38,983 (1998). 105. H. Bohm, J. Cornput.-Aided Mol. Des., 6, 61
(1992).
87. L. Xue, J. W. Godden, and J. Bajorath,
J. Chem. Znf. Comput. Sci., 39,881 (1999). 106. H. Bohm, J. Cornput.-Aided Mol. Des., 8, 623
(1994).
88. L. Xue, J. W. Godden, and J. Bajorath,
J. Chem. Znf Comput. Sci., 40, 1227 (2000). 107. R. Wang, Y. Gao, and L. Lai, J. Mol. Model., 6,
498 (2000).
89. L. Xue, F. L. Stahura, J . W. Godden, and J.
Bajorath, J. Chem. Znf: Comput. Sci., 41, 394 108. Cerius2, Accelrys, San Diego, CA (2002).
(2001). 109. S. K. Burley and G. A. Petsko, J. Am. Chem.
90. S. K. Kearsley, S. Sallamack, E. M. Fluder, SOC.,108, 7995 (1986).
J. D. Andose, R. T. Mosley, and R. P. Sheridan, 110. H. A. Carlson, K. M.Masukawa, K. Rubins,
J. Chem. Znf Comput. Sci., 36, 118 (1996). F. D. Bushman, W. L. Jorgensen, R. D. Lins,
J. M. Briggs, and J. A. McCammon, J. Med. 132. A. Vedani, P. Zbinden, J. P. Snyder, and P. A.
Chem., 43,2100 (2000). Greenidge, J. Am. Chem. Soc., 117, 4987
R. P. Sheridan, R. Nilakantan, A. Rusinko 111, (1995).
N. Bauman, K. S. Haraki, and R. Venkat- 133. M. C. Nicklaus, S. Wang, J. S. Driscoll, and
araghavan, J. Chem. Znf. Comput. Sci., 29,255 G. W. A. Milne, Bioorg. Med. Chem., 3, 411
(1989). (1995).
UNITY version 6.6; Tripos Associates, St. 134. H. A. Carlson and J. A. McCammon, Mol.
Louis, MO. Website: http://www.tripos.com. Pharmacol., 57,213 (2000).
MDL Information Systems, Inc., San Leandro, 135. R. M. A. Knegtel, I. D. Kuntz, and C. M. Os-
CA. hiro, J. Mol. Biol., 266,424 (1997).
J. Greene, S. Kahn, H. Savoj, P. Sprague, and 136. E. A. Sudbeck, C. Mao, R. Vig, T. K. Venkat-
S. Teig, J. Chem. Znf. Comput. Sci., 34, 1297 achalam, L. Tuel-Ahlgren, and F. M. Uckun,
(1994). Antimicrob. Agents Chemother., 42, 3225
W.-D. Ihlenfeldt, J. H. Voigt, B. Bienfait, F. (1998).
Oellien, and M. C. Nicklaus, J. Chem. Znf. 137. C. Bron and J. Kerbosch, Commun. ACM, 16,
Comput. Sci., 42,46 (2002). 575 (1973).
X . Fang and S. Wang, J. Chem. Inf. Comput. 138. Y. Lamdan and H. J. Wolfson in Proceedings of
Sci., 42, 192 (2002). the IEEE International Conference on Com-
puter Vision, 1988, p. 238.
K. S. Haraki, R. P. Sheridan, R. Venkataragha-
van, D. A. Dunn, and R. McCulloch, Tetrahe- 139. F. S. Kuhl, G. M. Crippen, and D. K. Friesen,
dron Comput. Methodol., 3,565 (1990). J. Comput. Chem., 5, 24 (1984).
J. A. Grant, M. A. Gallardo, and B. T. Pickup, 140. F. M. Richards, Annu. Rev. Biophys. Bioeng.,
J. Comput. Chem., 17, 1653 (1996). 6, 151 (1977).
T. E. Mook, D. R. Henry, A. G. Ozkabak, and 141. M. L. Connolly, J. Appl. Crystallogr., 16, 548
M. Alamgir, J. Chem. Znf. Comput. Sci., 34, (1983).
184 (1994). 142. M. L. Connolly, J. Appl. Crystallogr., 18, 499
(1985).
T. Hurst, J. Chem. Znf. Comput. Sci., 34, 190
(1994). 143. B. K. Shoichet, R. M. Stroud, D. V. Santi, I. D.
Kuntz, and K. M. Perry, Science, 259, 1445
N. W. Murrall and E. K. Davies, J. Chem. Znf.
(1993).
Comput. Sci., 30,312 (1990).
144. B. K. Shoichet, D. L. Bodian, and I. D. Kuntz,
D. E. Clark, P. Willett, and P. W. Kenny, J.
J. Comput. Chem., 13,380 (1992).
Mol. Graph., 11, 146 (1993).
145. E. C. Meng, D. A. Gschwend, J. M. Blaney, and
D. E. Clark, G. Jones, and P. Willett, J. Chem. I. D. Kuntz, Proteins: Struct., Funct., Genet.,
Inf Comput. Sci., 34, 197 (1994). 17,266 (1993).
J. M. Blaney and J. S. Dixon, Perspect. Drug 146. T. J. A. Ewing and I. D. Kuntz, J. Comput.
Discovery Des., 15, 301 (1993). Chem., 18,1175 (1997).
I. Muegge and M. Rarey in D. B. Boyd and K. B. 147. E. C. Meng, B. K. Shoichet, and I. D. Kuntz,
Lipkowitz, Eds., Reviews in Computational J. Comput. Chem., 13,505 (1992).
Chemistry, Vol. 17, Wiley-VCH, New York,
148. E. C. Meng, I. D. Kuntz, D. J. Abraham, and
2001, p. 1.
G. E. Kellogg, J. Cornput.-Aided Mol. Des., 8,
G. Jones, P. Willett, R. C. Glen, and A. R. 299 (1994).
Leach, J. Mol. Biol., 267, 727 (1997). 149. D. A. Gschwend and I. D. Kuntz, J. Cornput.-
I. Muegge, Med. Chem. Res., 9,490 (1999). Aided Mol. Des., 10, 123 (1996).
D. E. Koshland, Proc. Natl. Acad. Sci. USA, 44, 150. B. K. Shoichet, A. R. Leach, and I. D. Kuntz,
98 (1958). Proteins: Struct., Funct., Genet., 34, 4 (1999).
D. W. Christianson and W. N. Lipscomb, Acc. 151. X. Q. Zou, Y. X. Sun, and I. D. Kuntz, J. Am.
Chem. Res., 22,62 (1989). Chem. Soc., 121,8033 (1999).
C. Sander and R. Schneider, Proteins: Struct., 152. M. D. Miller, S. K. Kearsley, D. J. Underwood,
Funct., Genet., 9, 56 (1991). and R. P. Sheridan, J. Cornput.-Aided Mol.
T. L. Blundell, B. L. Sibanda, M. J. E. Stern- Des., 8, 153 (1994).
berg, and J. M. Thornton, Nature, 326, 347 153. T. F. Havel, I. D. Kuntz, and G. M. Crippen,
(1987). Bull. Math. Biol., 45, 665 (1983).
Virtual Screening
154. S. Kirkpatrik, C. D. J. Gelatt, andM. P. Vecchi, 174. S. J. Weiner, P. A. Kollman, D. T. Nguyen, and
Science, 220,671 (1983). D. A. Case, J. Comput. Chem., 7,230 (1986).
155. D. S. Goodsell and A. J. Olson, Proteins: 175. P. J. Goodford, J. Med. Chem., 28,849 (1985).
Struct., Funct., Genet., 8, 195 (1990). 176. D. Qui, P. S. Shenkin, E. P. Hollinger, and
156. T. P. Lybrand in D. B. Boyd and K. B. Lipko- W. C. Still, J. Phys. Chem., 101, 3005 (1997).
witz, Eds., Reviews in Comptational Chemis- 177. D. Eisenberg and A. D. McLachlan, Nature,
try, Vol. 1, VCH, New York, 1990, p. 295. 319,199 (1986).
157. J . A. Given and M. K. Gilson, Proteins: Struct., 178. P. F. W. Stouten, C. Frommel, H. Nakamura,
Funct., Genet., 33,475 (1998). and C. Sander, Mol. Simul., 10,97 (1993).
158. T. N. Hart and R. J. Read, Proteins: Struct., 179. S. Vajda, Z. Weng, R. Rosenfeld, and C. DeLisi,
Funct., Genet., 13, 206 (1992). Biochemistry, 33, 13977 (1994).
159. C. McMartin and R. S. Bohacek, J. Cornput.- 180. M. Rarey, B. Kramer, T. Lengauer, and G.
Aided Mol. Des., 11, 333 (1997). Klebe, J. Mol. Biol., 261, 470 (1996).
160. A. Wallqvist and D. G. Covell, Proteins: Struct., 181. H.-J. Bohm, J. Cornput.-Aided Mol. Des., 8,
Funct., Genet., 25, 403 (1996). 243 (1994).
161. R. Abagyan, M. Totrov, and D. Kuznetsov, 182. H. J. Bohm, J. ComputAided Mol. Des., 12,
J. Comput. Chem., 15,488 (1994). 309 (1998).
162. J . Apostolakis, A. Pluckthun, and A. Caflisch, 183. B. Kramer, G. Metz, M. Rarey, and T. Len-
J. Comput. Chem., 19,21 (1998). gauer, Med. Chem. Res., 9, 463 (1999).
163. P. S. Charifson, J. J. Corkery, M. A. Murcko, 184. I. Muegge, Perspect. Drug Discovery Des., 20,
and W. P. Walters, J. Med. Chem., 42, 5100 99 (2000).
(1999). 185. I. Muegge andY. C. Martin, J. Med. Chem., 42,
164. C. A. Baxter, C. W. Murray, D. E. Clark, D. R. 791 (1999).
Westhead, and M. D. Eldridge, Proteins: 186. I. Muegge, J. Comput. Chem., 22,418 (2001).
Struct., Funct., Genet., 33, 367 (1998). 187. I. Muegge, Y. C. Martin, P. J. Hajduk, and
165. J. Wang, P. A. Kollman, and I. D. Kuntz, Pro- S. W. Fesik, J. Med. Chem., 42, 2498 (1999).
teins: Struct., Funct., Genet., 36, 1 (1999). 188. I. Muegge and B. Podlogar, Quant. Struct.-Act.
166. D. Hoffmann, B. Kramer, T. Washio, T. Stein- Relat., 20,215 (2001).
metzer, M. Rarey, and T. Lengauer, J. Med. 189. S. Ha, R. Andreani, A. Robbins, and I. Muegge,
Chem., 42,4422 (1999). J. Cornput.-Aided Mol. Des., 14, 435 (2009.
167. J. S. Dixon, Proteins: Struct., Funct., Genet., 190. M. D. Eldridge, C. W. Murray, T. R. Auton,
Suppl., 1, 198 (1997). G. V. Paolini, and R. P. Mee, J. Cornput.-Aided
Mol. Des., 11,425 (1997).
168. P. D. J. Grootenhuis and P. J. M. vanGalen,
Acta Crystallogr. D, 51, 560 (1995). 191. C. W. Murray, T. R. Auton, and M. D. Eldridge,
J. Cornput.-Aided Mol. Des., 12,503 (1998).
169. M. K. Holloway, J. M. Wai, T. A. Halgren,
P. M. D. Fitzgerald, J. P. Vacca, B. D. Dorsey, 192. R. X. Wang, L. Liu, L. H. Lai, andY. Q. Tang, J.
R. B. Levin, W. J. Thompson, L. J. Chen, S. J. Mol. Model., 4, 379 (1998).
deSolms, N. Gaffin, A. K. Ghosh, E. A. Giu- 193. D. K. Gehlhaar, G. M. Verkhivker, P. A. Rejto,
liani, S. L. Graham, J. P. Guare, R. W. Hun- C. J. Sherman, D. B. Fogel, L. J. Fogel, and
gate, T. A. Lyle, W. M. Sanders, T. J. Tucker, S. T. Freer, Chem. Biol., 2,317 (1995).
M. Wiggins, C. M. Wiscount, 0. W. Wolters- 194. T. A. Halgren, J. Comput. Chem., 17, 520
dorf, S. D. Young, P. L. Darke, and J. A. Zugay, (1996).
J. Med. Chem., 38,305 (1995). 195. B. Honig and A. Nicholls, Science, 268, 1144
170. M. K. Holloway, Perspect. Drug Discovery Des., (1995).
9/10/11, 63 (1998). 196. D. R. Flower, J. Mol. Graphics Modell., 15,238
171. N. S. Blom and J. Sygusch, Proteins: Struct., (1998).
Funct., Genet., 27,493 (1997). 197. T. R. Stouch and P. C. Jurs, J. Chem. Znf. Com-
172. N. Tomioka and A. Itai, J. Cornput.-Aided Mol. put. Sci., 26,4 (1986).
Des., 8,347 (1994). 198. C. Bissantz, G. Folkers, and D. Rognan,
173. S. J. Weiner, P. A. Kollman, D. A. Case, U. C. J. Med. Chem., 43, 4759 (2000).
Singh, C. Ghio, G. Alagona, S. Profeta, Jr., and 199. M. Stahl and M. Rarey, J. Med. Chem., 44,
P. Weiner, J. Am. Chem. Soc., 106,765 (1984). 1035 (2001).
rences
J. G. Millichap, Ann. N. Y.Acad. Sci., 205,321 207. R. Friedrich, P. Fuentes-Prior, E. Ong, G.

(1973). Coombs, M. Hunter, R. Oehler, D. Pierson, R.
Gonzalez, R. Huber, W. Bode, and E. L. Madi-
J . M. Swanson and M. Kinsbourne in G. H.
son, J. Biol. Chem., 277,2160 (2002).
Hale and M. Lewis, Eds., Attention and Cogni-
208. A. Sali, L. Potterton, F. Yuan, H. van Vlijmen,
tive Development, Vol. 1, Plenum, New York,
and M. Karplus, Proteins: Struct., Funct.,
1979, p. 249.
Genet., 23,318 (1995).
M. Froimowitz, K. S. Patrick, and V. Cody, 209. A. Sali, Proteins: Struct., Funct., Genet., 6,437
Pharm. Res., 12, 1430 (1995). (1995).
I. J. Enyedy, W. A. Zaman, S. Sakamuri, A. P. 210. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson,
Kozikowski, K. M. Johnson, and S. Wang, D. J. States, S. Swaminathan, andM. Karplus,
Bioorg. Med. Chem. Lett., 11, 1113 (2001). J. Comput. Chem., 4, 187 (1983).
211. M. N. Janakiraman, C. L. White, W. G. Laver,
J. H. Voigt, B. Bienfait, S. Wang, and M. C.
G. M. Air, and M. Luo, Biochemistry, 33,8172
Nicklaus, J. Chem. Inf Comput. Sci., 41, 702
(1994).
(2001).
212. G. M. Rishton, Drug Discovery Today, 2, 382
Chem-X (version 96), Oxford Molecular (1997).
Group, Inc., Hunt Valley, MD (company no 213. N. R. Taylor, A. Cleasby, 0.Singh, T. Skarzyn-
longer exists) (2001). ski, A. J. Wonacott, P. W. Smith, S. L. Sollis,
I. J. Enyedy, S.-L. Lee, A. H. Kuo, R. B. Dick- P. D. Howes, P. C. Cherry, R. Bethell, P. Col-
son, C.-Y. Lin, and S. Wang, J.Med. Chem., 44, man, and J. Varghese, J. Med. Chem., 41, 798
1349 (2001). (1998).
CHAPTER SEVEN
Docking and Scoring Functions/Virtual Screening

CHRISTOPH SOTRIFFER
GERHARD KLEBE
University of Marburg
Department of Pharmaceutical Chemistry
Marburg, Germany
MARTIN STAHL
HANS-JOACHIM BÖHM
Discovery Technologies
F. Hoffmann-La Roche AG
Basel, Switzerland
Contents
1 Introduction
2 General Concepts and Physical Background
2.1 Protein-Ligand Interactions and the Physical Basis of Biomolecular Recognition
2.2 Docking, Scoring, and Virtual Screening: The Basic Concepts
3 Docking
3.1 General Concepts to Address the Docking Problem
3.1.1 Representation of the Macromolecular Receptor
3.1.2 Ligand Handling
3.1.3 Strategies for Searching the Configuration and Conformation Space
3.1.3.1 Geometric/Combinatorial Search Strategies
3.1.3.2 Energy Driven/Stochastic Procedures
3.2 Special Aspects of Docking
3.2.1 Protein Flexibility
3.2.2 Water Molecules
3.2.3 Assessment of Docking Methods
3.2.4 Docking and QSAR
3.2.5 Docking and Homology Modeling
4 Scoring Functions
4.1 Description of Scoring Functions for Protein-Ligand Interactions
4.1.1 Force Field-Based Methods
4.1.2 Empirical Scoring Functions
4.1.3 Knowledge-Based Methods
4.2 Critical Assessment of Current Scoring Functions
4.2.1 Influence of the Training Data
4.2.2 Molecular Size
4.2.3 Penalty Terms
4.2.4 Specific Attractive Interactions
4.2.5 Water Structure and Protonation State
4.2.6 Performance in Structure Prediction and Rank Ordering of Related Ligands
5 Virtual Screening
5.1 Combinatorial Docking
5.2 Seeding Experiments to Assess Docking and Scoring in Virtual Screening
5.3 Hydrogen Bonding versus Hydrophobic Interactions
5.4 Finding Weak Inhibitors
5.5 Consensus Scoring
5.6 Successful Identification of Novel Leads through Virtual Screening
6 Outlook
7 Acknowledgments

Burger's Medicinal Chemistry and Drug Discovery, Sixth Edition, Volume 1: Drug Discovery. Edited by Donald J. Abraham. © 2003 John Wiley & Sons, Inc.

1 INTRODUCTION

The action of drug molecules and the function of protein targets are governed by principles of molecular recognition. Binding events between ligands and their receptors in biological systems form the basis of physiological activity and pharmacological effects of chemical compounds. Accordingly, the rational development of new drugs requires an understanding of molecular recognition in terms of both structure and energetics (1). With respect to practical applications, it requires tools that are based on such knowledge of mutual recognition between molecular structures. Docking and virtual screening are computational tools to investigate the binding between macromolecular targets and potential ligands. They constitute an essential part of structure-based drug design, the area of medicinal chemistry that harnesses structural information for the purpose of drug discovery.

Structure-based design has become an integral part of medicinal chemistry. Although the knowledge about molecular recognition and its foundation on structural principles is still far from complete, it has already fueled significant advances and contributed to many success stories in drug discovery (2). Convincing evidence has been accumulated for a large number of targets that the protein three-dimensional (3D) structure can be used to design small-molecule ligands that bind tightly to the protein. Several marketed compounds can indeed be attributed to successful structure-based design (3-6), as summarized in a number of recent reviews (2, 7-9).

Structure-based drug design is an iterative process (2, 10). It requires as the starting point the crystal structure or a reliable homology model of the target protein, preferentially complexed with a ligand. The first step of the process is a detailed analysis of the binding site and a compilation of all aspects possibly responsible for binding affinity and selectivity. These data are then used to generate new ideas on how to improve existing ligands or to develop alternative molecular frameworks. Computational methods and molecular modeling play an essential role in this phase of hypothesis generation. They help to exploit information about the binding site geometry by constructing new molecules de novo, by analyzing known molecules with respect to their affinity and binding geometry, or by searching compound libraries for potential hits to suggest new leads. Discovered hits that are commercially available or synthetically accessible are then experimentally tested and their binding properties examined by biochemical, crystallographic, and spectroscopic methods. The 3D structures of new complexes, together with the acquired activity data, are subsequently used to start a new cycle of ligand design to improve the hypotheses stated in the previous round.

Since the introduction of computational structure-based design techniques into the drug discovery process in the early 1980s, the impact of these methods has significantly changed. Initially, computational tools were
often applied a posteriori to rationalize and understand the binding and structure-activity relationships in a series of inhibitors and to assist in the manual design of individual compounds. Guided by the creativity of the designer, a novel putative ligand was constructed using computer graphics. Molecular mechanics calculations were performed on the resulting protein-ligand complex to assess the properties of the generated ligand in terms of a geometric and energetic analysis. A ligand was assumed to bind with high affinity if satisfactory complementarity in shape and surface properties between the protein and the ligand could be detected.

It has been realized, however, that the design of a single, synthetically accessible, active compound is a larger challenge than anticipated. Many phenomena of molecular recognition are not yet fully understood, nor are current modeling tools able to reflect and accordingly predict them with sufficient reliability. Most important, a fast and accurate computational prediction of binding affinities for new inhibitor candidates is still difficult to obtain. Although the existing tools certainly do not allow the medicinal chemist to design the one perfect ligand, they can help to enrich sets of molecules with more active ones, even though the known deficiencies of the methods can still lead to significant rates of both false positives and false negatives. A more moderate goal of current molecular design is thus to improve the hit rates of molecules suggested for biological assaying compared to a mere random compound selection. This implies that structure-based design approaches now focus on the processing of large numbers of molecules, arranged in so-called virtual libraries. These can be composed of either existing chemical substances (such as, for example, compound collections of a pharmaceutical company) or hypothetical new molecules that could be synthesized by combinatorial chemistry. The task is then to filter these large libraries by eliminating the majority of molecules that are unlikely to bind and by prioritizing the remaining ones. As recent experience shows, this strategy can be successful: several publications have reported quite impressive enrichments of active compounds (11-16).

The change of focus from single molecule to compound library design in modern structure-based drug discovery is also a consequence of major technological advances that have dramatically enhanced the data throughput in a variety of fields:

1. Progress in gene technology, protein chemistry, and structure determination techniques has resulted in a tremendous increase in protein structure information. The number of publicly available 3D protein structures continues to grow exponentially, with further acceleration expected from the current initiatives of structural genomics (17). As a consequence, more and more design projects are based on structural information, and structure-based ligand design has become routine at all major pharmaceutical companies. On the other hand, the growing amount of structural knowledge also calls for automated methods that make this new wealth of data accessible and available.
2. Automation and miniaturization have led to the development of high-throughput screening (HTS), which is now a well-established process for large-scale biological testing. Libraries of several hundred thousand compounds are routinely screened against new targets, frequently on a time scale of less than 1 month.
3. Synthesis strategies have significantly changed with the introduction of combinatorial and parallel chemistry techniques. The trend continues to move away from the synthesis of individual compounds toward the generation of compound libraries, whose members are accessible through the same type of chemical reaction but different building reagents.
4. Massive data processing and computational tasks formerly requiring expensive supercomputers have become generally feasible by advances in PC cluster computing, offering supercomputing power at unprecedented performance-to-price ratios.

To provide competitive advantage, structure-based design tools must be fast enough to process thousands of compounds per day using affordable computing resources. Several algorithms have been developed that allow virtual ligand screening based on high-throughput flexible docking (18, 19). More sophisticated methods may then be applied at a later stage of refinement, where a smaller set of compounds, containing the most promising hits, is subjected to a more detailed analysis. Essential elements of all these docking tools are scoring functions that translate computationally generated protein-ligand binding geometries into estimates of affinity.

Docking as a computational tool of structure-based drug design to predict protein-ligand interaction geometries and binding affinities is the subject of this chapter. Fundamental aspects of the docking process, the scoring of protein-ligand interactions, and the application of docking to virtual ligand screening are discussed. At first, a discussion of the underlying physical principles determining protein-ligand recognition is given (Section 2.1), followed by a description of the general concepts of docking, scoring, and virtual screening (Section 2.2). Subsequently, the current approaches to the docking problem are presented (Section 3.1), focusing on the search methods (Section 3.1.3) and the approaches used to represent protein and ligand structures in an efficient way (Sections 3.1.1-3.1.2). In addition, a number of special aspects are discussed (Section 3.2), including, for example, the issues of protein flexibility (Section 3.2.1) or the consideration of water molecules in the context of docking (Section 3.2.2). This is followed by a section on scoring functions used for docking. Three major classes of scoring functions are presented (Section 4.1) and subjected to critical assessment (Section 4.2). A final section is dedicated to virtual screening, illustrating general strategies (Section 5), special problems (Sections 5.1-5.5), and representative applications (Section 5.6).

Although the goal of this chapter is to highlight the most important aspects of docking in the context of structure-based drug design, a comprehensive discussion of all aspects of current docking methodologies would be beyond its scope. Protein-protein or protein-DNA docking (20-24), as well as the docking of small ligands to DNA or RNA as targets (25), are not explicitly covered. The focus will be on the docking of small molecules to protein binding sites, with emphasis on automated procedures. Interactive docking tools, as provided by some modeling software packages, or fragment-based de novo design methods are not discussed. Information about the latter is available through other reviews (26, 27) (see also Ref. 28 for a list of de novo design algorithms). Virtual screening is discussed only in the context of docking; database screening techniques based on molecular similarity or pharmacophore models are not considered (29, 30). As a final introductory remark, it should be emphasized that docking methods are actually tools for "ligand" (rather than "drug") design. The identification of a tight-binding ligand is a necessary but not sufficient criterion toward a promising novel lead structure and its development into a drug. Aspects of synthetic accessibility, bioavailability, or toxicity are not the primary subject of docking, but because it is important to consider these factors at early stages of a design project, filters may be applied to compound libraries before docking or to hits obtained from virtual screening at an early stage. This aspect of pre- or postprocessing is discussed only briefly.

2 GENERAL CONCEPTS AND PHYSICAL BACKGROUND

2.1 Protein-Ligand Interactions and the Physical Basis of Biomolecular Recognition

The selective binding of a small-molecule ligand to a specific protein is determined by structural and energetic factors. For ligands of pharmaceutical interest, protein-ligand binding usually occurs through noncovalent interactions. The physical basis of noncovalent interactions is generally well established through the theories of electromagnetic forces or, on a more fundamental level, of quantum mechanics. For macromolecules, liquid systems, or solutions, however, direct application
2 General Concepts and Physical Background
of these first principles is significantly compli- to l/r6 results. This l/r6 dependency is also
cated by the size and complexity of the sys- encountered in interactions that arise be-
tems, in which a large number of fluctuating tween induced electric moments, such as the
particles simultaneously interact and influ- dispersion interaction based on London
ence each other. Principles from classical me- forces. The attractive interactions between
chanics and heuristic models are therefore fre- (induced) electric multipoles are generally
quently used as an approximation to describe summarized in the term van der Waals inter-
protein-ligand interactions in aqueous solu- actions. Accordingly, van der Waals forces are
weak, attractive, short-range forces that decay
The primary forces acting between a pro- with l/r6. These are normally described by in-
tein and a ligand are all of electrostatic nature. termolecular interaction potentials such as
It is the interaction between explicit charges,
the Lennard-Jones potential:
dipoles, induced dipoles, and higher electric
multipoles that leads to phenomena that are
commonly referred to as salt bridges, hydro-
gen bonds, or van der Waals interactions. In
simplified classifications, it is only the charge- where A and B are parameters depending on
charge interaction that is called electrostatic. the type of the interacting atoms. The r12
This interaction between two charges is of term reflects the short-range repulsive
long range and considerable strength. In vac- forces attributed to unfavorable spatial
uum or uniform media it can be described by overlap of electron clouds a t short distances.
Coulomb's law. In aqueous solution of biomol- An interaction deserving special attention
ecules, however, its application is complicated is that of hydrogen bonds (33,341.In principle,
because of the presence of a large number of their origin is of the same nature as the inter-
water molecules. Unless a sufficiently large actions mentioned above. A hydrogen bond is
number of water molecules is explicitly in- defined as the interaction of an electronega-
cluded in the calculations [as usually only tive atom (the hydrogen-bond acceptor) with a
tractable in computationally expensive molec- hydrogen atom covalently bonded to an elec-
ular dynamics simulations (31)], the correct treatment of electrostatic interactions in solution requires solving the Poisson-Boltzmann equation, where the solvent is considered as a continuous medium of high dielectric constant surrounding a low-dielectric solute (32).

Electrostatic interactions, however, do not occur only between charge monopoles. In a comprehensive treatment of electrostatics one has to consider a full power series, and therefore interactions between higher electric moments, such as dipoles and quadrupoles, also play an essential role. Their interaction energies are orientation dependent and become shorter in range with increasing electric moment. For example, in contrast to the 1/r dependency in Coulomb's law, the energy of the interaction between a charge and a dipole decreases with 1/r^2, the interaction between two dipoles with 1/r^3. This, however, is valid only for a fixed orientation of the dipoles. If they are mobile, as in isotropic media (liquids), the dipole-dipole interaction is thermally averaged and an average interaction proportional

tronegative atom (the hydrogen-bond donor). The major component of a hydrogen bond is the electrostatic interaction of the donor-hydrogen dipole with the negative partial charge of the acceptor. The special characteristics originate from the fact that the hydrogen atom is very small and can bear a considerable positive partial charge, such that the acceptor can contact the hydrogen atom at a shorter distance than expected from the van der Waals radii. Hydrogen bonds are directed interactions showing a high angular dependency. This directionality arises from the anisotropic charge distribution around the acceptor atom (lone pairs) and the fact that the electron shells of donor and acceptor atom start to overlap at these short distances unless the ideal geometry is maintained. Hydrogen bonds are attributed an important role with respect to specificity of the protein-ligand interaction. This is based on their directionality and the fact that they require a well-defined complementarity in the complex (mutual arrangement of hydrogen-bond donors and ac-
ceptors). However, the importance of hydrogen bonds should not be overemphasized because it is the balance between hydrogen bonds and other forces in protein-ligand complexes that must be appropriately considered (35).

Weakly polar interactions in proteins and protein-ligand complexes are frequently phenomenologically analyzed and classified in terms of the interacting partners (36). This especially includes interactions with π-systems, such as the NH-π, OH-π, or CH-π interaction (37, 38), aromatic-aromatic interactions (parallel π-π stacking versus edge-to-face interaction), and the cation-π interaction (39). All of these can mostly be rationalized in terms of electrostatic interactions outlined above; that is, they involve interactions between monopoles, dipoles, and quadrupoles (permanent and induced). A more distinct character can be attributed to metal complexation, which can play a significant role in individual cases of protein-ligand interactions, as for example in metalloenzymes (2, 40, 41).

Finally, so-called hydrophobic or lipophilic interactions are often mentioned as an additional contribution to protein-ligand interactions. These terms are used to describe the preferential association of nonpolar groups in aqueous solution. It should be emphasized, however, that in contrast to what the name suggests, there is no special hydrophobic force. Instead, one should speak of a hydrophobic effect. As further mentioned below, according to the generally accepted view, it arises primarily from the entropically favorable replacement and release of water molecules (42, 43). The association between the nonpolar surfaces itself is simply based on weak London forces (36).

Thermodynamically, the strength of the interaction between a protein and a ligand is described by the binding affinity or (Gibbs) free energy of binding. Assuming a simple equilibrium reaction of the form

$$P + L \rightleftharpoons PL$$

between a protein P and ligand L to give the complex PL, the dissociation constant Kd (or binding constant Ki) is generally used to describe the stability of complex formation:

$$K_d = \frac{[P][L]}{[PL]}$$

From the experimentally measured equilibrium constant the binding affinity can be calculated as

$$\Delta G = RT \ln K_d$$

where R is the gas constant (8.314 J/(mol K)) and T is the temperature [the equilibrium constant would actually have to be related to a standard concentration to become a dimensionless quantity, but in general this is not explicitly considered (44, 45)]. Experimentally determined binding constants Ki (Kd) are typically in the range of 10^-2 to 10^-12 M, corresponding to a Gibbs free energy of binding of roughly -10 to -70 kJ/mol (1, 2).

According to the Gibbs-Helmholtz equation, the free energy of binding consists of an enthalpic and an entropic contribution:

$$\Delta G = \Delta H - T \Delta S$$

The enthalpy and entropy of binding can be determined experimentally, as, for example, by isothermal titration calorimetry (46, 47). These data, however, are still sparse and not always easy to interpret (48, 49). Substantial compensation between enthalpic and entropic contributions is observed (50-52); this phenomenon and its interpretations have recently been critically reexamined (53). Interestingly, the data also show that binding can be either enthalpy-driven (e.g., streptavidin-biotin, ΔG = -76.5 kJ/mol, ΔH = -134 kJ/mol) or entropy-driven (e.g., streptavidin-HABA, ΔG = -22.0 kJ/mol, ΔH = +7.1 kJ/mol) (54). However, because of strong temperature dependencies, even this partitioning is a question of the temperature used for measuring.

What are the major contributions to the enthalpy and entropy of binding? Direct interactions between the protein and the ligand are obviously very important for the enthalpy of binding. Besides that, an essential factor is that protein-ligand interactions occur in
Figure 7.1. Overview of the receptor-ligand binding process. All species involved are solvated by
water (symbolized by gray spheres). The binding free energy difference between the bound and
unbound state is a sum of enthalpic components (breaking and formation of hydrogen bonds, forma-
tion of specific hydrophobic contacts) and entropic components (release of water from hydrophobic
surfaces to solvent, loss of conformational mobility of receptor and ligand).
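As a numerical illustration of the thermodynamic quantities summarized in Figure 7.1, the following minimal sketch converts a dissociation constant into a Gibbs free energy of binding using ΔG = RT ln Kd, and obtains the entropic term from ΔG = ΔH - TΔS. The function names and the temperature of 298.15 K are assumptions chosen for illustration; the streptavidin values are those quoted in the text (54).

```python
import math

R = 8.314  # gas constant in J/(mol K)


def binding_free_energy(kd_molar, temperature=298.15):
    """Return Delta G in kJ/mol for a dissociation constant given in mol/L.

    Assumption: Kd is referred to a 1 M standard concentration, so that it
    can be treated as a dimensionless number inside the logarithm.
    """
    return R * temperature * math.log(kd_molar) / 1000.0


def entropic_term(delta_g, delta_h):
    """Return T*Delta S in kJ/mol, rearranged from Delta G = Delta H - T*Delta S."""
    return delta_h - delta_g


# Limits of the typical experimental range of binding constants
for kd in (1e-2, 1e-12):
    print(f"Kd = {kd:.0e} M -> Delta G = {binding_free_energy(kd):.1f} kJ/mol")

# Enthalpy-driven versus entropy-driven binding (values in kJ/mol)
print("streptavidin-biotin T*dS:", round(entropic_term(-76.5, -134.0), 1))
print("streptavidin-HABA   T*dS:", round(entropic_term(-22.0, 7.1), 1))
```

At 298.15 K the two limits evaluate to about -11 and -68 kJ/mol, in line with the range of roughly -10 to -70 kJ/mol given in the text. For streptavidin-biotin the entropic term TΔS is negative (unfavorable), whereas for streptavidin-HABA it is positive and larger than the unfavorable enthalpy.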
aqueous solution (cf. Fig. 7.1). The unbound reaction partners are solvated and partial desolvation is required before complex formation can occur. This is important for the enthalpy balance because the net energy gain upon complexation can only be the difference between the direct protein-ligand interaction enthalpy and the desolvation enthalpies of the two molecules. In this context, the hydrophobic effect has to be considered again. Upon the formation of lipophilic contacts between apolar parts of the protein and the ligand, unfavorably ordered water molecules are replaced and released. This leads to an entropy gain that is attributed to the fact that the water molecules are no longer positionally confined. In addition, there is an enthalpic contribution: water molecules occupying lipophilic binding sites are unable to form hydrogen bonds with the protein, but after release they can form strong hydrogen bonds with bulk water. Because the removal of hydrophobic surfaces from contact with water leads to negative changes in the heat capacity (ΔCp), the buried hydrophobic surface area has frequently been correlated with ΔCp values measured upon ligand binding. This, however, may be an oversimplification, neglecting other potential contributions to ΔCp (55). As further noted by Tame, enthalpy-entropy compensation and the temperature dependency of ΔH and TΔS (which are both directly related to ΔCp) make it ultimately impossible to consider polar or apolar contributions as purely enthalpic or entropic, respectively (56).

Entropically unfavorable contributions arise from the loss of translational and rotational degrees of freedom upon complexation, whereas a small gain in entropy can result from low-frequency concerted vibrations in the complex. A more important factor to consider in an actual design process is conformational flexibility. Upon binding, internal degrees of freedom are frozen, the ligand loses a considerable amount of its flexibility, and usually binds in one single orientation. This is also the explanation why rigid analogs of flexible ligands show higher affinity, as, for example, observed for cyclic derivatives of ligands that adopt the same binding mode as the open-chain derivative (57, 58). Accordingly, higher affinity also results if the protein-bound ligand conformation is already preorganized in solution.

From a variety of experiments, quantitative estimates for some of the mentioned energetic contributions to protein-ligand binding could be derived. Based on data from protein mutants, the contribution of individual hydrogen bonds to the binding affinity has been estimated to be 5 ± 2.5 kJ/mol (59-62). This is similar to what has been obtained for the contribution of an intramolecular hydrogen bond to protein stability (63, 64). The consistency of values derived from different proteins suggests some degree of additivity in the hydrogen-bonding interactions. The accurate description of the interplay with water molecules remains, however, a most challenging task. The contribution of hydrogen bonds to the overall affinity strongly depends on local solvation and desolvation effects and can sometimes be very small or even adverse to binding, as illustrated by the comparison of ligand pairs differing by just one hydrogen bond (65). Charge-assisted hydrogen bonds are stronger than neutral ones, but also associated with a higher desolvation penalty. Thus, the electrostatic interaction of an exposed salt bridge contributes as much as a neutral hydrogen bond (5 ± 1 kJ/mol according to Ref. 66), but the same interaction in the interior of a protein can be significantly stronger (67). Because of the complicated interplay with water, a detailed analysis of the thermodynamics of hydrogen bond formation can sometimes yield surprising results. For a particular hydrogen bond in complexes of the FK506-binding protein, it has been found that its formation is enthalpically unfavorable but entropically favorable (60). The entropy gain appears to be attributable mainly to the replacement of two water molecules (68).

Contributions from hydrophobic interactions have frequently been found to be proportional to the lipophilic surface area buried from solvent, with values in the range of 80-200 J/(mol Å²) (69-71). The entropic penalty for freezing a single rotatable bond has been estimated to be 1.6-3.6 kJ/mol at 300 K (72, 73); recent estimates derived from NMR shift titrations are much lower (0.5 kJ/mol) (74), but in the systems studied the conformational restriction may not have been as high as in a protein binding site. Finally, the unfavorable entropy contribution from the loss of translational and orientational degrees of freedom has been estimated to be around 10 kJ/mol (75, 76).

Despite many inconsistencies and difficulties in interpretation, most of the experimental data suggest that simple additive models of protein-ligand interactions might be a reasonable starting point for the development of methods to predict binding affinities, that is, for the derivation of empirical scoring functions. Still, it has to be kept in mind that the assumption of additivity in biochemical phenomena is not strictly valid (77). On the other hand, the large body of experimental data on 3D structures of protein-ligand complexes and binding affinities allows one to derive some general characteristics about protein-ligand interactions. Several features are commonly found in complexes of tightly binding ligands:

1. A high steric complementarity between the protein and the ligand, an observation often described as the lock-and-key paradigm (78,
79). This complementarity, however, is frequently not the result of a match between rigid bodies, but rather achieved through significant conformational changes of both binding partners, a phenomenon generally referred to as induced fit. Additionally, electrostatic complementarity can also be induced, for example, by strong pKa shifts upon ligand binding that result in the release or uptake of protons of different functional groups either of the protein or the ligand.
2. A high complementarity of the surface properties. Lipophilic parts of the ligands are generally in contact with lipophilic parts of the protein, whereas polar groups are usually paired with suitable polar protein groups to form hydrogen bonds or ionic interactions.
3. An energetically favorable conformation of the bound ligand. Significant conformational strain is usually not observed in ligands binding with high affinity.

In addition to insights taken from high-affinity complexes, experimental information about weakly bound complexes could be equally instructive. Such information has indeed been recognized to be vital for the development of scoring functions (80). Structural data on unfavorable protein-ligand interactions, however, are sparse, partly because structures of weakly binding ligands are more difficult to obtain and are usually considered less interesting by many structural biologists. What can be concluded from the available data is that an imperfect steric fit at the lipophilic part of the protein-ligand interface leads to reduced binding affinity and that unpaired buried polar groups at the protein-ligand interface are strongly adverse to binding. Few buried CO and NH groups in folded proteins fail to form hydrogen bonds (81). Therefore, in the ligand design process an important prerequisite to be regarded is that polar functional groups, either of the protein or the ligand, will find suitable counterparts if they become buried on ligand binding.

2.2 Docking, Scoring, and Virtual Screening: Some Basic Concepts

The subject of docking is the formation of noncovalent protein-ligand complexes. Given the structures of a ligand and a protein, the task is to predict the structure of the resulting complex. This is the so-called docking problem. Because the native geometry of the complex can generally be assumed to reflect the global minimum of the binding free energy, docking is actually an energy-optimization problem (82), concerned with the search of the lowest free energy binding mode of a ligand within a protein binding site. The macromolecular nature of the protein and the fact that binding occurs in aqueous solution complicate the problem significantly because of the high dimensionality of the configuration space and considerable complexity of the energetics governing the interaction. Accordingly, heuristic approximations are frequently required to render the problem tractable within a reasonable time frame. The development of docking methods is therefore also concerned with making the right assumptions and finding acceptable simplifications that still provide a sufficiently accurate and predictive model for protein-ligand interactions.

Regardless of the nature of the interacting partners, computational docking always requires two components, which may briefly be characterized as "searching" and "scoring" (83). "Searching" refers to the fact that any docking method has to explore the configuration space accessible for the interaction between the two molecules. The goal of this exploration is to find the orientation and conformation of the interacting molecules corresponding to the global minimum of the free energy of binding. Unless the degrees of freedom are restricted to translation and rotation by treating both molecules as rigid bodies, a full systematic search of all "dockings" is normally not feasible because of the huge number of potential solutions and the large amount of computational resources needed to evaluate them. Different strategies are therefore required, which should be accurate and efficient: accurate in the sense that the optimization procedure should not miss any valuable solution (near-global minima), and efficient in terms of computing time and with respect to the fact that the algorithm should not spend unnecessary time by exploring irrelevant regions or by rediscovering previously detected local minima. As will be elaborated in the next
section, there are two opposing approaches to simplify the docking problem: either by reformulating it to a discrete problem that can be solved with combinatorial algorithms or by using stochastic search algorithms.

"Scoring" refers to the fact that any docking procedure must evaluate and rank the configurations generated by the search process. The scoring scheme most closely related to experiment, the ab initio calculation of the free energy of binding, is not easily accessible to computation. Hence, approximate scoring functions must be used that model the binding free energy with sufficient accuracy and correlate well with experimental binding affinities. In particular, the scoring function should be able to discriminate between native and nonnative binding modes.

Scoring is actually composed of three different aspects relevant to docking and design:

1. Ranking of the configurations generated by the docking search for one ligand interacting with a given protein; this aspect is essential to detect the binding mode best approximating the experimentally observed situation.
2. Ranking different ligands with respect to the binding to one protein, that is, prioritizing ligands according to their affinity; this aspect is essential in virtual screening.
3. Ranking one or different ligands with respect to their binding affinity to different proteins; this aspect is essential for the consideration of selectivity and specificity.

If one were able to accurately calculate the free energy of binding, all three aspects would be satisfied simultaneously. Current scoring functions used in docking programs, however, can usually resolve satisfactorily only the first aspect. They provide only a rough estimate with respect to the comparison across different ligand or protein systems. This is the case whenever the scoring scheme neglects certain factors that are virtually constant for different binding modes with respect to one protein, but that matter for comparisons with other proteins.

Following the general paradigm shift in structure-based design from single compounds to compound libraries, state-of-the-art docking and scoring methods have to be sufficiently fast to be applied for virtual screening. The general strategy of a virtual screening process based on the 3D structure of a target typically involves the following steps:

1. Analysis of the 3D protein structure.
2. Selection of key interactions that need to be satisfied by all candidate molecules.
3. Computational search in chemical databases for compounds that potentially satisfy the key interactions, fit into the binding site, and form additional interactions with the protein; this is done by means of docking and/or structure-based pharmacophore searches.
4. Postprocessing by analyzing the retrieved hits and removing undesirable compounds.
5. Synthesis or ordering of the selected compounds.
6. Biological testing, possibly followed by crystallographic confirmation.

All these steps will be discussed in some more detail in a section below. Of primary interest in the context of this chapter is step 3. It requires high-throughput docking with efficient search algorithms, and scoring functions that are able to provide a good separation between potentially "binding" and "nonbinding" ligands. The database or library that is screened should consist of a sufficiently large and diverse set of relevant compounds. Thus, library design is increasingly applied to ensure that only reasonably preselected compounds are docked (29, 84, 85).

3 DOCKING

In this section, approaches to the docking problem are presented with respect to the docking algorithm and the search aspect. Scoring is discussed separately in Section 4. It should be noted in this context that although a specific docking method is frequently associated with a certain scoring procedure, many docking methods could in principle be combined with a variety of different scoring functions, either for postprocessing of the results
or as objective function during the optimization. Actually, such strategies are followed by considering multiple scoring schemes to achieve "consensus scoring" (86) or "multidimensional scoring" (87). The emphasis in this section is on general characteristics and principles, rather than individual methods, although occasionally specific docking programs have been selected as representative examples for a more detailed illustration of a general concept. The interested reader is referred to Table 7.1 for an overview of currently used docking programs described in the literature. In addition, a valuable source of information is the corpus of regularly published reviews in the field of docking (18, 19, 26, 27, 83, 88-95).

3.1 General Concepts to Address the Docking Problem

Essential for any docking method is a search algorithm that samples the configuration space of two interacting molecules. These molecules need to be represented in a way that is suitable for efficient handling by the search algorithm. Docking methods may therefore roughly be classified by the way the macromolecular receptor is represented (Section 3.1.1), by the handling of the ligand (Section 3.1.2), and, most important, by the search algorithm itself (Section 3.1.3).

3.1.1 Representation of the Macromolecular Receptor. The most straightforward approach for representing the macromolecular structure in a docking application would be by atomic coordinates of the entire protein. A full atomic representation, however, is generally impractical because of the size and complexity of protein structures. The structural information therefore needs to be reduced to a manageable yet representative size and form. A first step into this direction is to limit the search area to the region surrounding the putative binding site. This is general practice in protein-ligand docking (whereas in protein-protein docking often the entire surfaces are searched for appropriate matches). Scanning the entire surface for potential binding regions of a small-molecule ligand would hardly be feasible with most docking methods. Furthermore, it would be rather unreasonable to ignore information already available from biochemical experiments or structural data of related complexes. If no such information is available [a situation that we may increasingly be facing as a consequence of the effects of the structural genomics initiatives (17, 96)], methods to identify binding sites are required before the actual docking process can start. Examples are programs for geometric cavity detection, such as LIGSITE (97) or PASS (98), tools to infer protein function from structural homologies (99, 100), or more sophisticated approaches based on a physicochemical and geometrical characterization of binding sites (101). Some docking programs incorporate routines for binding site identification as preprocessing steps (102).

Despite a reduction to only a specified part of the protein surface, a simple representation in terms of atomic coordinates is not practical for most docking procedures. Instead, the space available for ligand binding is frequently characterized by other means that permit more efficient searches. A first alternative is given by geometric shape descriptors, sometimes combined with a physicochemical description. Approaches of this class include molecular surface cubes (103), surface normals at sparse critical points (104), and modified Lee-Richards dotted surfaces, with each dot coded by chemical property and accessibility (105). A further prominent example is the sphere images of the binding site used in DOCK (106, 107). These spheres are complementary to the molecular surface and represent a space-filling negative image of the binding site. Another important concept that goes beyond a pure geometric description and represents interaction properties of physicochemical relevance is the usage of interaction sites or points, as introduced by the program LUDI (108, 109). These interaction sites are discrete positions and vectors in space serving as dummy representations for atoms capable of forming hydrogen bonds or filling hydrophobic pockets. The docking tool FlexX is based on this concept (110). Also, the program SLIDE (111) and the new approach by Diller and Merz (112) use interaction points for fast docking.

A popular alternative to geometric or physicochemical descriptors is the grid representa-
Table 7.1 Overview of Currently Used Programs for Protein-Ligand Docking

Class of Docking Method (a)            Name of Program     Year Published   Original References   Selected References to Further Developments and Applications

Geometric/combinatorial
  Shape/descriptor matching            DOCK
                                       FLOG
                                       ADAM
                                       LIGIN
                                       SANDOCK
                                       QSDOCK
                                       SLIDE
                                       FRED
                                       (Diller & Merz)
  Incremental construction             FlexX
                                       Hammerhead
                                       DOCK4.0
  Systematic search (transl. + rot.)   EUDOC
Energy-driven/stochastic
  Monte Carlo simulated annealing      AutoDock
                                       RESEARCH
                                       MCDOCK
  Monte Carlo minimization             ICM
                                       (Caflisch et al.)
                                       Qxp
                                       PRODOCK
  Molecular dynamics (MD)              MDD
                                       (Luty et al.)
                                       (Vieth et al.)
                                       q-jumping MD
  Genetic algorithm                    GOLD
                                       AutoDock3.0
                                       GAMBLER
                                       DARWIN
  Tabu search                          PRO_LEADS
  Tabu search + genetic algorithm      SFDock
  Eigenvector following                Low Mode Search
  Mining Minima algorithm              Mining Minima

(a) The classification provided in the first column can only be approximate for programs that offer a variety of different functionalities or follow multistep strategies.
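The grid representation of the receptor mentioned above, in which a probe energy is precalculated at every point of a regular grid and later used as a look-up table during docking, can be sketched as follows. This is only an illustrative toy under stated assumptions: the 12-6 probe potential, its parameters `eps` and `sigma`, the nearest-grid-point lookup, and the function names `build_grid` and `score_pose` are all hypothetical and do not reproduce the energy terms or interpolation schemes of any particular docking program.

```python
import math


def build_grid(receptor, origin, spacing, shape, eps=0.4, sigma=3.5):
    """Precompute a toy 12-6 probe energy at every point of a regular grid.

    receptor: list of (x, y, z) atom positions; origin: grid corner;
    spacing: grid spacing; shape: number of points along x, y, z.
    """
    grid = {}
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                point = (origin[0] + i * spacing,
                         origin[1] + j * spacing,
                         origin[2] + k * spacing)
                energy = 0.0
                for atom in receptor:
                    # Clamp very short distances to avoid division by zero
                    d = max(math.dist(point, atom), 0.5)
                    sr6 = (sigma / d) ** 6
                    energy += 4.0 * eps * (sr6 * sr6 - sr6)
                grid[(i, j, k)] = energy
    return grid


def score_pose(grid, ligand, origin, spacing, shape):
    """Score a ligand pose by summing the grid values nearest to each atom."""
    total = 0.0
    for atom in ligand:
        index = tuple(
            min(max(round((atom[a] - origin[a]) / spacing), 0), shape[a] - 1)
            for a in range(3)
        )
        total += grid[index]  # table look-up instead of a pairwise sum
    return total


receptor = [(0.0, 0.0, 0.0)]
origin, spacing, shape = (-5.0, -5.0, -5.0), 1.0, (11, 11, 11)
grid = build_grid(receptor, origin, spacing, shape)
print(score_pose(grid, [(4.0, 0.0, 0.0)], origin, spacing, shape))  # favorable contact
print(score_pose(grid, [(0.2, 0.0, 0.0)], origin, spacing, shape))  # steric clash
```

The expensive sum over receptor atoms is carried out once per grid point, so that scoring a pose during the search costs only one look-up per ligand atom.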
tion of protein structures. The general principle of this approach is that the protein is represented by a set of affinity grids or maps that cover the entire search region. These regularly spaced, orthogonal grids are calculated before the actual docking process. At every grid point, some sort of scoring value or interaction energy of a probe atom with the entire protein is calculated, providing a map of pseudo-affinities for each atom type or interaction type possibly present in the ligands to be docked. These maps then serve as look-up tables for the calculation of the interaction energy or scoring value during the docking process. Examples of docking programs using this approach are AutoDock (113-115), ICM (82, 116, 117), or ProDock (118, 119).

It should be noted that most of the mentioned representations of protein structure imply that the protein remains rigid during the docking process. As a matter of fact, docking under the assumption of a rigid protein is still common practice in standard applications. Although an acceptable simplification under certain circumstances, it can represent a serious limitation if only unbound protein structures are available. As a consequence, the inclusion of protein flexibility in the docking process is an active area of research, and a separate section is dedicated to this issue (cf. Section 3.2.1).

3.1.2 Ligand Handling. For the ligand, a complete representation in atomic coordinates is perfectly feasible. Ligand atoms may be used directly for matching with binding site descriptors or in the calculation of interaction energies in the case of energy-driven procedures. The central problem is conformational flexibility. Predicting the binding conformation of a ligand is in fact a major component of the docking problem, given that this conformation can significantly differ from that adopted in other environments.

Two general strategies for ligand handling may be distinguished: whole-molecule approaches and fragment-based methods. In the first case, the ligand is docked as an entire molecule. This is rather straightforward if the ligand is treated as a rigid body and only translational and rotational degrees of freedom are considered. Such rigid docking was common practice in early docking algorithms (106, 120). A straightforward extension to account for flexibility is to separately dock precalculated conformers of a given molecule (variant 1 in Fig. 7.2). Explicit docking of multiple conformers has, for example, been implemented in the FLOG program (121). FLOG deals with conformational flexibility by generating different conformers using distance geometry and docking each conformer in a rigid-body fashion. A similar approach has also been implemented in the DOCK program (122). To avoid redundancy in the docking, a common rigid fragment is identified, which is docked only once for the entire set of pregenerated conformers. The flexible portions of the molecule that determine the different conformations are subsequently scored based on the preplacement of the rigid fragment. Yet other examples for rigid docking of multiple conformers are provided by the programs FRED from OpenEye Scientific Software, which performs a fast exhaustive search over all possible orientations (123), and SYSDOC (124) or EUDOC (125), which use fast affine transformation to perform systematic searches over the translational and rotational degrees of freedom of the ligand.

Although this multi-conformer docking can be efficient and accurate for molecules with a limited number of discrete, low-energy conformations, it is less suited for larger and highly flexible molecules, simply because the number of possible conformations increases dramatically. Another way of partially accounting for conformational flexibility in whole-molecule rigid-body docking is to subject the initial matches to some kind of optimization that allows for conformational relaxation. This could be done with some standard energy minimization technique (126, 127) or other procedures that resolve clashes of the initial placement by rotation about single bonds, as done, for example, in the docking tool SLIDE (111).

A more rigorous treatment of ligand flexibility in whole-molecule docking is performed by sampling ligand conformation space during docking (variant 2 in Fig. 7.2). It normally requires ligand conformational energies to be evaluated besides intermolecular interaction energy. Molecular mechanics force fields are frequently applied for this purpose. Although a more exhaustive sampling of accessible conformations within the binding site is definitely achieved, an obvious disadvantage is the higher computational demand and possibly a reduced efficiency of the algorithm because of lengthy exploration of local minima.

An interesting variant of whole-molecule representations is the use of internal coordi-
Figure 7.2. Strategies for flexible ligand docking. Panel labels (alternative strategies for flexible ligand docking): (1) separate conformer generation and rigid-body docking; (2) simultaneous optimization of orientation and conformation (simulated annealing, GA); (3) placement of anchor fragment followed by incremental construction.
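Variant 1 of Figure 7.2, the rigid-body docking of precalculated conformers, can be sketched as a simple exhaustive loop over conformers and placements. The sketch below is a toy under stated assumptions: placements are restricted to translations (a real rigid-body search also samples rotations), `toy_score` is a hypothetical distance-based stand-in for a scoring function, and the estimate of three torsional states per rotatable bond in `n_conformers` is an illustrative assumption rather than a value taken from the text.

```python
import itertools
import math


def translate(coords, shift):
    """Return the coordinates rigidly shifted by the given vector."""
    return [tuple(c + s for c, s in zip(atom, shift)) for atom in coords]


def toy_score(coords, site_points):
    """Toy score (lower is better): summed distance of each ligand atom
    to its nearest binding-site point."""
    return sum(min(math.dist(atom, p) for p in site_points) for atom in coords)


def dock_conformers(conformers, shifts, site_points):
    """Place every precalculated conformer rigidly at every trial position
    and return (best score, conformer index, shift)."""
    best = None
    for (c_idx, conf), shift in itertools.product(enumerate(conformers), shifts):
        score = toy_score(translate(conf, shift), site_points)
        if best is None or score < best[0]:
            best = (score, c_idx, shift)
    return best


def n_conformers(n_rotatable_bonds, states_per_bond=3):
    """Rough count of discrete conformers for a given number of rotatable bonds."""
    return states_per_bond ** n_rotatable_bonds


site = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
conformers = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],  # conformer matching the site
    [(0.0, 0.0, 0.0), (0.0, 1.5, 0.0)],  # alternative conformer
]
shifts = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(dock_conformers(conformers, shifts, site))
print([n_conformers(n) for n in (2, 5, 10)])
```

The number of score evaluations grows as the product of conformers and placements, which illustrates why this strategy is practical only for ligands with a limited number of low-energy conformations.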
nates instead of Cartesian coordinates (82). Internal coordinates help to reduce the number of variables defining the conformation of the molecular system. In Cartesian space, three functionally equivalent variables per atom are required. Internal coordinates, instead, consist of bond lengths, bond angles, and torsion angles. Because bond lengths and angles can be considered rigid to a good approximation, only the torsion angles matter as variables to map conformation space. An efficient implementation of docking algorithms operating on internal coordinates has been obtained, for example, with the ICM method (82, 116, 117).

Fragment-based techniques are an alternative to whole-molecule docking (variant 3 in Fig. 7.2). Here, the molecule is dissected into fragments that can be docked individually in a rigid fashion. The fragments can either be docked separately and then reconnected, or the ligand is built up incrementally following a certain fragmentation scheme. The first variant is very common to programs dedicated to de novo design rather than pure docking. Methods of this class have been reviewed extensively (26, 27). However, the approach has also been applied for docking (128) and compared to the whole-molecule docking approach (129).

The other variant of fragment-based ligand docking is used in incremental construction algorithms (110, 130), sometimes also referred to as "anchor and grow" (131). These search strategies are further described below. They dissect the ligand into modular portions and rebuild it incrementally within the binding site starting from the docking position of a suitable base fragment. The advantage is that many potential combinations are eliminated early in the construction, but success critically depends on the selection and placement of the base fragment.

3.1.3 Strategies for Searching the Configuration and Conformation Space. Search strategies of automated docking procedures may roughly be classified as geometric or combinatorial on the one hand and energy driven or stochastic on the other, although ultimately
all methods try to optimize a function that models to some extent the free energy of binding.
3.1.3.1 Geometric/Combinatorial Search Strategies. Most of the early docking methods were entirely based on the concept of shape complementarity. To this day, this is the fundamental idea in most protein-protein docking programs. The observation that protein-ligand complexes frequently show a remarkable shape fit of both binding partners has stimulated the conception of surface or descriptor matching as a docking search technique. The molecules are represented by geometric and/or physicochemical descriptors, and various alignment procedures are applied to match complementary parts of ligand and protein. An example is the original DOCK method, where the ligand is superimposed onto a negative sphere image of the binding pocket, using a distance matching algorithm followed by least-squares fitting (106, 132). Other examples are the least-squares fitting procedure described by Bacon and Moult to achieve matches between complementary surface patterns (133), or the hierarchical search of geometrically compatible triplets of surface normals on the molecules to be docked, as proposed by Wallqvist and Covell (134). The program ADAM performs a complete combinatorial search over all possible matches between hydrogen bond patterns (135). Recently, a new matching algorithm based on so-called quadratic shape descriptors has been described (QSDock); along with the presentation of their method, the authors also provide an extensive discussion of shape-based docking algorithms.
Another recent example of descriptor matching is SLIDE, developed as a tool for ligand database screening by docking (111). The binding site is represented by a template of favorable interaction points onto which ligand atoms are matched during the search. Instead of serving as a purely geometric description, these points address four different types of interactions (hydrogen-bond donor, acceptor, donor/acceptor, or hydrophobic interaction center). The search is then performed such that all triangles of appropriate atoms in the ligand are exhaustively mapped onto triangles of template points with compatible geometry and chemistry. This mapping is used to generate initial placements of molecules in the binding site and is followed by a series of steps that refine the initial position, resolve collisions, and consider flexibility of both the ligand and the protein side chains (cf. note on hybrid approaches below). Similarly, the rapid docking approach for library prioritization developed by Diller and Merz (112) is based on rigid-body triplet matching of ligand atoms onto precalculated hot spots; subsequently, pruning is performed to remove any positions with significant steric clash, and the remaining matches are subjected to energy minimization.
Pure descriptor matching is efficient for rigid-body docking only. Flexible docking, in fact, is always faced with the additional problem of a combinatorial explosion of possible conformers depending on the number of rotatable bonds. Systematic searches or explicit consideration of each possible conformation would therefore require enormous computing resources. A popular way to address this problem within the class of geometric/combinatorial docking methods is incremental construction (110, 130, 131, 137). The ligand is dissected into fragments and incrementally reconstructed in the binding site starting from a suitably docked base fragment. To avoid dead-end solutions during construction, multiple placements of the base fragment have to be considered. In addition, it can be useful to perform different fragmentations and hence to use different base fragments as starting points, especially for long and highly flexible molecules. The docking itself, that is, the placement of the base fragment and the attachment of remaining portions, is guided by some descriptor matching procedure.
An example of an incremental construction method is the program FlexX (110, 130, 138, 139). Conformational flexibility is considered using a discrete set of preferred torsion angles about acyclic single bonds, together with multiple conformations for ring systems. These torsion angle preferences are taken from a library compiled from torsional fragments extracted from the Cambridge Structural Database (140). The model of molecular interactions is based on similar rules as implemented in LUDI, originating from a composite crystal-
field analysis (141). For each group capable of forming an interaction, a special contact geometry is defined: the group is placed to a center about which an interaction surface is defined, usually as part of a sphere. Two groups form an interaction if the interaction center of one group coincides with the interaction surface of a counter group. To start with the actual docking process, the ligand is fragmented into components by dissecting at all single bonds that are not part of a cycle. Out of these components suitable base fragments are selected. The base fragment is the first portion of the ligand to be placed into the binding site. This is done by superimposing either triples or pairs of interaction centers constructed around the base fragment with triples or pairs of compatible interaction points generated in the binding region. Normally, a large number of initial placements is generated, which is then reduced either by clustering similar solutions or because of clashes with the protein. Next, the incremental construction of the entire ligand is initiated. Starting with the different base placements, the ligand is built up by stepwise linking of the components in compliance with the torsional database. After hooking up additional fragments, new interactions are searched and a scoring function is used to select the best partial solutions, which are expanded in the following step. This is done until the last fragment has been added and placed to result in the complete ligand. The generated ligand positions are finally stored and ranked according to the predicted binding affinity.
An anchor-and-grow algorithm has recently also been incorporated into DOCK (131). Here, after identification of rotatable bonds, the ligand is fragmented into rigid segments, the largest segment is identified as the anchor, and the remaining segments are organized as layers surrounding the anchor. Then the anchor is docked using geometrical matching. Based on the obtained anchor positions, the conformational search is initiated by adding segments from the innermost layer and proceeding outward. This addition is done according to the accessible torsion angle values along the newly added bond. The default is to use two alternative settings for bonds between two sp2 hybridized atoms, three between two sp3 atoms and six between sp2 and sp3 atoms. The partial constructs are then locally optimized to minimize the sum of intra- and intermolecular energies and pruned back to an approximately constant size of configurations. Pruning is necessary to cope with combinatorial explosion. It is performed on the basis of the score and the orientation, such that both the best scoring and most deviating orientations are retained from each expansion cycle. Finally, after complete reconstruction of the ligand, the pruned set of binding configurations is again subjected to local energy minimization.
This anchor-and-grow implementation in DOCK represents a combination of a geometric and energy-based approach to docking, due to the intermediate steps of energy minimization. As already encountered for SLIDE (111), such multistep or hybrid approaches are commonly found in current docking protocols. DOCK in general is a prototype of such a program, originally based solely on rigid geometric descriptor matching, later enhanced with a variety of additional features. For example, some degree of flexibility has been introduced into the rigid docking procedure by dissecting the ligand into a small set of rigid fragments that are docked separately and then reconnected (128). The concept of geometric shape complementarity has been extended to consider physicochemical complementarity by assigning properties to binding-site spheres and allowing them to match only those ligand atoms that are of complementary character, an approach referred to as "sphere coloring" (142, 143). Rigid-body minimization has been introduced as refinement after the initial descriptor-matching step (126) or in the variant of on-the-fly optimization using force-field energies precomputed on a grid (127). In summary, the combination of different approaches and algorithms to overcome the limitations of every single approach has provided us with steadily improving solutions to the docking problem.
3.1.3.2 Energy Driven/Stochastic Procedures. As mentioned above, docking is essentially an energy optimization problem because the native binding mode of a ligand can in general be expected to correspond to the global minimum of the binding free energy (82). Ac-
cordingly, finding this binding mode by docking corresponds to the identification of the global minimum of the free-energy function. Because the actual free energy of binding is not accessible to computation, approximate energy evaluations or scoring functions are used to guide the search. These functions are required to model the free-energy surface in an appropriate way: although the absolute values are not of relevance for the structural aspect of docking, it is essential that the global minimum of a relative free-energy function models accurately enough the position of the global minimum on the real free energy surface. (It is worth mentioning in this context that in purely geometrical or descriptor-based docking procedures, the central assumption is that the degree of surface complementarity or matching between descriptors is proportional to the interaction energy.)
With a suitable energy function available, docking can be performed by global minimization of the energy with respect to the position, orientation, and conformation of the ligand. However, this apparently straightforward approach bears two fundamental problems, inherently related to characteristics of the energy landscape of protein-ligand interactions: the high dimensionality, which precludes a systematic, exhaustive search; and the ruggedness of the surface, reflected by a large number of local minima. Because of this last aspect, standard energy minimization techniques alone are not useful for docking applications because they can guide the search only to the next local minimum. They are used, however, in combination with other techniques and play a valuable role at certain stages of the docking process, primarily to refine docked positions and conformations by exploring the local energy landscape in the vicinity of this position.
To address the docking problem, techniques for a more global exploration of the energy landscape are required. A variety of methods is available, frequently used in the context of other modeling applications and optimization problems as well. Three major classes may be distinguished: Monte Carlo techniques, molecular dynamics simulations, and genetic algorithms. Many different variants exist for all of them and frequently, in docking procedures, they are applied in combination with other techniques.
Monte Carlo methods consist of two essential components that are repetitively applied: a random walk of the ligand through the receptor-near space (i.e., the random displacement along translational, rotational, and/or torsional degrees of freedom), and the evaluation of the new configuration based on the Metropolis criterion (144). This criterion decides whether a new position is accepted and hence on the configuration from where the search will proceed. If the energy of the new docked position ($E_{\mathrm{new}}$) is more favorable (lower) than the energy of the previous position ($E_{\mathrm{old}}$), the new position is accepted. If it is less favorable, the probability P for its acceptance is given by
\[ P = \exp\left(-\frac{E_{\mathrm{new}} - E_{\mathrm{old}}}{kT}\right) \]
where k is the Boltzmann constant and T is the effective temperature. To turn this sampling technique into an efficient optimization method applicable to docking, it has to be combined either with a temperature lowering protocol or with some local minimization steps. The former approach is known as Monte Carlo simulated annealing, the latter as Monte Carlo minimization.
In simulated annealing, the effective temperature T is initially set to a high value and gradually lowered, after a predefined number of Monte Carlo steps has been performed at a given temperature. At high temperatures, a broad region of configuration space is sampled: energy barriers can be surmounted because of the high acceptance probability for less favorable placements. As the temperature is lowered, this becomes less probable and the configuration is optimized more locally. Given the stochastic nature of the process, multiple independent runs are required to assess convergence (this equally applies to many of the methods further described below). Examples of docking programs using Monte Carlo simulated annealing as a search strategy are AutoDock (113-115), RESEARCH (145, 146), and MCDOCK (147).
In Monte Carlo minimization, an additional step is inserted after the random walk before Metropolis evaluation. This step is a
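As a minimal sketch of the Metropolis test and the temperature-lowering protocol described in this section, the following Python fragment anneals a single coordinate on a made-up one-dimensional energy function. The names `toy_energy`, `acceptance_probability`, and `simulated_annealing`, the geometric cooling factor, and all parameter values are illustrative assumptions; they are not taken from AutoDock, MCDOCK, or any other program cited here, where the random walk acts on the translational, rotational, and torsional degrees of freedom of the ligand.

```python
import math
import random


def acceptance_probability(e_old, e_new, kT):
    """Metropolis rule: downhill moves always pass, uphill moves pass with exp(-dE/kT)."""
    if e_new <= e_old:
        return 1.0
    return math.exp(-(e_new - e_old) / kT)


def toy_energy(x):
    """Hypothetical rugged 1-D energy landscape with several local minima."""
    return 0.1 * (x - 2.0) ** 2 + math.sin(5.0 * x)


def simulated_annealing(energy, x0, kT_start=2.0, kT_end=0.01, cooling=0.9,
                        steps_per_temperature=200, max_step=0.5, seed=0):
    """Random walk plus Metropolis test; kT is lowered after a fixed number of steps."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    kT = kT_start
    while kT > kT_end:
        for _ in range(steps_per_temperature):
            x_new = x + rng.uniform(-max_step, max_step)  # random displacement
            e_new = energy(x_new)
            if rng.random() < acceptance_probability(e, e_new, kT):
                x, e = x_new, e_new                       # move accepted
                if e < best_e:
                    best_x, best_e = x, e
        kT *= cooling                                     # temperature lowering (assumed geometric)
    return best_x, best_e
```

Calling `simulated_annealing(toy_energy, x0=-4.0)` returns the lowest-energy coordinate visited and its energy. Because the procedure is stochastic, several runs with different `seed` values would be compared in practice to judge convergence, as noted in the text.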
local energy minimization, using techniques such as steepest descent or conjugate gradient. Full local minimization after each random-walk step has been reported to improve the efficiency of the procedure (148, 149). A docking procedure that uses global Monte Carlo minimization is the ICM program of Totrov and Abagyan (82, 116, 117). ICM describes both the relative positions of two molecules and their conformations by a uniform set of internal variables and uses precalculated grids of the interaction energies to speed up calculations. Trosset and Scheraga use Monte Carlo minimization in their ProDock program; computational efficiency is enhanced by a grid-based energy evaluation using Bezier splines, which enables one to evaluate gradients and hence to perform minimization on a 3D grid (118, 119). Further Monte Carlo minimization docking procedures have been reported by Caflisch et al. (150, 151). Also, the QXP program of McMartin and Bohacek relies on Monte Carlo techniques combined with energy-minimization procedures (152).
Molecular dynamics (MD) simulations represent another technique to sample configuration space (31, 153-157). Based on Newton's equation of motion and principles of statistical thermodynamics, the standard application of this technique is to analyze flexibility and dynamic properties of molecular systems and to calculate free energies in a theoretically rigorous manner (158-163). With respect to protein-ligand docking, MD simulations could in principle be used to simulate the actual binding process, thus providing a "realistic" view of how the docking process proceeds, although this is computationally still out of reach. In fact, standard MD requires massive computational resources, which limits its application to a small number of selected systems. In the context of docking, the problem is that standard MD is slow in exploring global features (crossing of large barriers and exploration of multiple binding sites); accordingly, MD is essentially limited to the simulation and refinement of already bound complexes. Di Nola et al. have addressed this problem in their MDD (MD docking) algorithm (164, 165). This method separates the ligand's center of mass motion from its internal motions. A separate coupling to different thermal baths for both types of motion of the ligand and the receptor is performed. Because the temperature and the time constants of coupling to the baths can be varied arbitrarily, it is possible to increase the kinetic energy of the center of mass of the ligand without increasing the temperature of the internal motions of receptor and ligand. This allows for complete control of the search rate. The technique was applied to the docking of phosphocholine to antibody McPC603, starting from distinct positions well separated from the actual binding site. After appropriate sampling, the average structure of the complex in the binding region was found to closely resemble the crystal structure. Still, the method remains computationally expensive, and thus it is not yet suited for a large-scale application to practical drug design docking problems.
Other docking applications of MD have been reported as well. In a comparison of a CHARMM-based MD docking algorithm with a Monte Carlo and a genetic algorithm, Vieth et al. have observed a comparatively good performance of the MD search for the five analyzed test cases (166). Pak et al. have recently presented a docking approach based on so-called q-jumping MD (167, 168); its basic idea is to apply a smoothed generalized effective potential to enhance conformational sampling by MD. Luty et al. have combined a grid representation for the bulk portion of the receptor with MD simulations of the ligand in the flexible binding site (169). Multiple-copy simultaneous search methods (MCSS) can help to speed up energy-based searches. They use numerous ligand copies that are transparent to each other, but subject to the full force of the protein (170, 171). Finally, short MD simulations are occasionally used at some stage of a docking procedure, primarily with the purpose of local refinement, as for example in the multistep docking strategy of Wang et al., where the last step is an MD-based simulated annealing (129).
The third major class of search methods are genetic algorithms (GAs), which are widely used for docking purposes. GAs are stochastic optimization methods inspired by the concepts of evolution (172-174). The optimization problem is generally formulated in the lan-
guage of genetics. Initially, a random population is generated in which each member corresponds to a potential solution of the problem. A member of the population is represented by its chromosome, in which the variables to be optimized are encoded. This means that each chromosome contains a number of genes, where the genes correspond to the value of a certain variable or set of variables. In the case of docking, the variables for translation and rotation, as well as the torsion angles of the ligand, are encoded in the chromosome. Genetic operators are then applied to the initial population to generate a new population. In general, these operators are "crossover," by which genes from two distinct chromosomes are interchanged to generate two new individuals, and "mutation," by which a given gene is randomly modified. For each newly generated individual the chromosome is decoded (genotype → phenotype) and the fitness of the individual is evaluated. In the context of docking, the fitness is the interaction energy or docking score. Individuals with better scores receive a higher chance for being selected as members of the new population, and thus a higher chance of survival and reproduction into the next generation. Accordingly, the average fitness increases from generation to generation, until, at some point, the process is terminated (by reaching either a fixed number of generations or a constant fitness of the population). The best individual of this final population represents the solution.
Many different variants and implementations of GAs for docking exist, but the general features are always similar. The application of GAs in drug design and docking has been reviewed by Clark et al. (175). A prominent example of a docking program based on a GA is GOLD (176, 177). A special characteristic of GOLD is the direct encoding of hydrogen-bonding motifs in the chromosome representation. Upon chromosome decoding, a least-squares fit is used to optimize the overlap of complementary pairs of hydrogen-bonding sites present in the ligand and the receptor. The newest version of AutoDock contains an interesting variant of a GA, a so-called Lamarckian GA (115). This is the combination of a traditional GA with a local search method to perform energy minimization. At each generation, a user-defined fraction of the population is subjected to such a local minimization. This hybrid algorithm was found to be more efficient than a traditional GA, also implemented in AutoDock. A conceptually similar strategy has recently been implemented into the docking program DARWIN (178). Here, a standard GA is combined with a gradient minimization search strategy through an interface to the CHARMM molecular mechanics program (179). Further GA-based docking methods can be found in the literature (86, 180-183).
Another class of evolutionary algorithms that has occasionally found application in the context of docking is known as evolutionary programming (184). Its main difference with respect to GAs is that there is no recombination (crossover) operator, such that evolution is wholly dependent on mutation. Gehlhaar et al. (185) and Westhead et al. (186) have demonstrated the applicability of evolutionary programming to the docking problem, although in a comparative study other algorithms were found to be more effective (186). A new variant called "family competition evolutionary algorithm" has recently been proposed for docking (187).
Besides the three major classes of energy-driven searches (MD, MC, GA), some further heuristic algorithms and search strategies have been developed or adapted for the docking problem. "Tabu search" was found to perform well in comparison with other algorithms (186) and has thus become the main search strategy of the PRO-LEADS docking program (188, 189). Briefly, the tabu search operates on randomly generated positions that are examined on the basis of a tabu list. This list contains a number of previously generated solutions and serves to impose restrictions on the search process: a random move of the ligand is considered "tabu" if it generates a solution that is not sufficiently different from the stored solutions, unless its energy is more favorable than the energy of the best solution so far. Using these restrictions, the search is prevented from revisiting regions of the search space and the exploration of new areas is encouraged. Ideas from tabu search are also used in the recently described adaptation of the Mining Minima algorithm for protein-
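The tabu rule described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PRO-LEADS implementation: the two-dimensional `toy_energy` surface, the Euclidean distance test in `is_tabu`, the fixed-length tabu list, and all parameter values are hypothetical stand-ins for a real docking score and a real measure of similarity between ligand placements.

```python
import math
import random
from collections import deque


def toy_energy(pos):
    """Hypothetical rugged 2-D score surface standing in for a docking score."""
    x, y = pos
    return 0.1 * (x * x + y * y) + math.sin(3.0 * x) * math.cos(3.0 * y)


def is_tabu(candidate, tabu_list, min_distance):
    """A move is tabu if it lands too close to any stored solution."""
    return any(math.dist(candidate, old) < min_distance for old in tabu_list)


def tabu_search(energy, start, n_iterations=200, n_moves=20, step=0.5,
                tabu_size=25, min_distance=0.3, seed=0):
    rng = random.Random(seed)
    current = start
    best, best_e = start, energy(start)
    tabu_list = deque([start], maxlen=tabu_size)  # oldest entries drop out
    for _ in range(n_iterations):
        # Random moves around the current solution, best score first.
        moves = [(current[0] + rng.uniform(-step, step),
                  current[1] + rng.uniform(-step, step))
                 for _ in range(n_moves)]
        moves.sort(key=energy)
        for cand in moves:
            e = energy(cand)
            # A tabu move is rejected unless it beats the best solution so far.
            if e < best_e or not is_tabu(cand, tabu_list, min_distance):
                current = cand
                tabu_list.append(cand)
                if e < best_e:
                    best, best_e = cand, e
                break
        # If every move was tabu, the current solution is kept for this round.
    return best, best_e
```

The exception for moves that improve on the best solution is what keeps the tabu list from blocking genuine progress; without it, a better minimum lying close to an already stored solution could never be accepted.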
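The genetic-algorithm cycle described in this section (random initial population, selection, crossover, mutation, termination after a fixed number of generations) can be sketched as follows. Everything specific here is an assumption made for illustration: the quadratic `toy_score` replaces a docking score, the six genes stand in for translation, rotation, and torsion variables, and tournament selection, one-point crossover, Gaussian mutation, and the elitism step are common textbook choices rather than the operators of GOLD, AutoDock, or DARWIN.

```python
import random


def toy_score(chromosome):
    """Hypothetical docking score: lower is better, minimum 0 when all genes are 1."""
    return sum((g - 1.0) ** 2 for g in chromosome)


def crossover(parent_a, parent_b, rng):
    """One-point crossover: swap gene tails to obtain two new individuals."""
    point = rng.randrange(1, len(parent_a))
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


def mutate(chromosome, rng, rate=0.2, sigma=0.3):
    """Randomly modify individual genes with probability `rate`."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g
            for g in chromosome]


def select(population, score, rng, k=3):
    """Tournament selection: better-scoring individuals are chosen more often."""
    return min(rng.sample(population, k), key=score)


def genetic_algorithm(score, n_genes=6, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-5.0, 5.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism (an assumption of this sketch): carry the best individual over.
        new_population = [min(population, key=score)]
        while len(new_population) < pop_size:
            child_a, child_b = crossover(select(population, score, rng),
                                         select(population, score, rng), rng)
            new_population.append(mutate(child_a, rng))
            if len(new_population) < pop_size:
                new_population.append(mutate(child_b, rng))
        population = new_population
    # The best individual of the final population is reported as the solution.
    return min(population, key=score)
```

A Lamarckian variant of the kind mentioned for AutoDock would additionally apply a local minimizer to a fraction of each generation and write the minimized values back into the genes; that step is deliberately left out here.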
ligand docking (190). Here, an exclusion zone is placed around each energy minimum as it is discovered, to avoid rediscovering it in future docking iterations. Mining Minima itself is based on a variety of optimization techniques to gradually focus a large region of random search to areas around the lowest energy minima.
3.2 Special Aspects of Docking
Besides the general characteristics outlined above, there are a number of special issues associated with the docking methodology that deserve explicit consideration: protein flexibility, water molecules, and objective assessment. In addition, the interplay of docking with QSAR methods and homology modeling is of further interest to highlight the possibilities opened by combined application of standard methods in structure-based drug design.
3.2.1 Protein Flexibility. Proteins are inherently dynamic systems (153, 191). A single, fixed conformation, even the average provided by a crystal structure, may not be an adequate representation of the protein, unless the system is very rigid (192). Instead, even under standard equilibrium conditions, the native folded state of a protein is best characterized by a collection or ensemble of energetically nearly equivalent conformations. If the conditions are changed, the local minima and the population of these states may shift, eventually resulting in an observable change of the average structure. Also, the introduction of a ligand corresponds to a change of the environment that may lead to similar effects. Accordingly, the binding conformation of the receptor may already be present in the ensemble of protein conformations (193, 194) and the ligand does not actively deform a fixed state of the protein, as generally inferred from the "induced fit" model.
Whatever the actual mechanism might be, the comparison of experimental protein structures in the ligand-free and in the complexed state frequently shows protein conformational changes induced by or associated with ligand binding (195). The spectrum of phenomena ranges from side-chain rotations to loop rearrangements and the movement of entire domains. Accordingly, in the context of docking it is frequently not justified to neglect protein flexibility (35). If no alternative for docking into the rigid protein is available, at least a protein conformation (possibly from a complex structure) should be used that is compatible with suitable binding modes. Obviously, a preferable docking tool would consider full protein flexibility, but appropriate realization of this goal remains a challenge because of the high dimensionality of protein conformation space. Consideration of protein flexibility also complicates the problem of scoring and selecting the best ligand placement, given the difficulty in accurately evaluating protein conformational free energies in addition to ligand-binding free energies.
Current approaches to the problem of flexible protein docking have recently been reviewed by Carlson and McCammon (196), and more briefly by Abagyan and Totrov (18) and Claussen et al. (197). The methods differ by the degree of flexibility they can cover. The least complex methods are those that model small adjustments of contact residues and side chains in an implicit way using soft docking. The protein itself remains fixed, but either through an adapted geometric representation or using a tolerant scoring function a certain amount of overlap between the protein and the ligand is allowed, emulating some "plasticity" of the receptor. The docking program by Jiang and Kim based on the matching of molecular surface cubes is explicitly based on this soft docking idea (103). Other more recent docking approaches have implemented a soft scoring function (198). The advantage of these simple approaches is that they do not increase the demands on computing time.
The next level is represented by methods that allow for explicit side-chain flexibility. GOLD's genetic algorithm can handle the rotation of a few terminal hydrogen-bond donor and acceptor groups to optimize the hydrogen-bonding network (176, 177). A technique to handle larger side-chain movements is the use of side-chain rotamer libraries, as first demonstrated by Leach. In this approach, heuristic algorithms such as dead-end elimination are used to search the large combinatorial space (199). Schaffer and Verkhivker instead use a rotamer library to first generate likely side-chain conformations, which are then sub-
jected to energy minimization together with the docked ligand (200). Another approach making use of minimization has been described by Apostolakis et al.: after "seeding" the receptor with randomly generated ligand positions that may overlap with the protein, the complex is subjected to minimization, during which nonbonded interactions are gradually switched on, to gently relieve steric overlap by minor conformational changes of the ligand and receptor. The best-ranked solutions are then subjected to further refinement by Monte Carlo minimization (151). Furthermore, the Monte Carlo minimization technique in internal coordinates of the ICM program can sample and optimize side-chain torsions during ligand docking (117, 201). Finally, the docking tool SLIDE allows for some side-chain flexibility at the optimization stage of initial placements. In SLIDE, collisions are resolved by rotations about single bonds in the ligand and the protein side-chains to reduce a maximal number of collisions by minimal conformational changes of both binding partners (111, 202).
An alternative to account, in principle, for an arbitrary degree of protein flexibility is the use of protein structure ensembles. The ensembles could be assembled from multiple crystal structures of a given protein, from NMR structure determination, or from trajectories of molecular dynamics simulations. In addition, a rotamer library can be used to create a minimal set of new conformations (203). Whatever the origin of the individual members of the ensemble, each represents a distinct conformational state of the protein, and may eventually correspond to the preferred ligand-binding state. Three different ways to use protein ensembles for docking can be distinguished: in its most straightforward form, docking is carried out sequentially with each member of the ensemble using rigid-receptor docking (124, 204-206). Another way is to use a weighted-average representation of the ensemble. Knegtel et al. followed this approach by generating composite grids that were used for scoring within the DOCK program (207). Recently, it has also been tested with AutoDock (208). Broughton has developed another method by combining statistical analysis of a conformational ensemble from short MD simulations with grid-based docking protocols (209). The third and most sophisticated approach to handle protein ensembles is implemented into FlexE, a variant of the FlexX program (197). FlexE is based on a united protein description generated from the superimposed structures of the ensemble. For the parts that differ among the protein structures, discrete alternative conformations are explicitly taken into account on the fly during the incremental construction of the ligand in the binding site. As an important feature, these geometric alternatives are optimally joined to create new valid protein structures in a combinatorial fashion. Thus, conformations of the protein are not limited to those explicitly present in the ensemble, nor are the interactions blurred by averaging over distinct alternative instances, which may correspond to unrealistic protein conformations.
The so-called Low Mode Search (LMOD), originally established as a method for conformational analysis (210), has recently been demonstrated to be applicable also to the problem of docking flexible ligands into flexible protein binding sites (211). To explore the potential energy surface of molecules, LMOD is based on eigenvector following, where eigenvectors correspond to the (low-frequency) "normal modes" of vibration. For the purpose of docking, LMOD has been combined with a limited torsional Monte Carlo movement, as well as random translation and rotation of the ligand.
Generally, however, full consideration of flexibility, either of the binding site or the entire protein, remains the domain of MD simulations. The disadvantage is their high computational demand required to achieve significant sampling. Simplified MD restricted to the binding site has been used by Luty et al., where the bulk of the protein receptor is represented as a grid, whereas a full atomic description is used only for the proximity of the binding site to include flexibility in the docking process (169). The approach of Mangoni et al. mentioned above provides a method for enhanced sampling. It has been used to dock a ligand into a receptor that is treated fully flexible and solvated with explicit water molecules (165). Alternatively, shorter MD runs may be used at intermediate or final stages of a dock-
ing procedure to refine complexes generated by rigid-body docking methods. In this case, however, flexibility is not considered simultaneously with the docking process. It thus only refines solutions from rigid-receptor docking and does not enhance the scope of the search for possible binding modes.

3.2.2 Water Molecules. Water plays a crucial role in molecular interactions (212, 213). At the interface of a protein-ligand complex, water molecules can have a significant impact on complex formation, either by mediating the interaction or by improving its specificity and affinity. They promote adaptability, thus allowing for promiscuous binding (214). Individual conserved ("structural") water molecules can therefore be crucial for the successful design of new inhibitors. A prominent example is the structural water molecule observed in nearly all HIV protease complexes with substrate-like inhibitors. Attempts at replacing it have guided the design of new tight-binding inhibitors [e.g., (215)]. Instead of the usual implicit modeling of solvation effects, explicit consideration of structural water molecules and water-mediated interactions would therefore be a highly desirable feature in docking methods. Ideally, simultaneously with ligand placement, the docking program should be able to predict whether water molecules mediating protein-ligand interactions preferably reside at a particular site or whether the displacement of these water molecules by appropriate ligand functional groups would be more favorable. No docking tool is yet available to accomplish this task. Obviously, not only the placement of water molecules is demanding, but especially their energy scoring, because of the complicated thermodynamics associated with water interactions.
In principle, MD simulations provide the most natural route to the explicit consideration of water molecules. In the MD docking approach described by Mangoni et al., explicit water molecules are indeed used (165). It was found, though, that the presence of explicit water molecules shields the interactions between the ligand and the receptor. Consequently, different weights were applied to the ligand-receptor and ligand-solvent interactions, respectively, to cope with this complication. Because of the high computational costs, the approach seems affordable only in special cases where the presence of explicit solvent appears important.
An approach to explicitly place water molecules during fast docking has been introduced into FlexX (216). In a preprocessing phase, possible favorable water sites in the binding pocket are calculated and stored. During the incremental construction phase of FlexX, water molecules are switched on at these sites if they provide additional hydrogen bonds to the ligand. Steric constraints produced by these water molecules and the quality of the achieved hydrogen bond geometry are then used to optimize the ligand orientation during the construction process. In several cases, water molecules between protein and ligand could be correctly predicted; however, the overall improvement on the FlexX docking results for a test set of 200 complexes was nearly negligible.
The program SLIDE can consider tightly bound waters while docking potential ligands (111). To select which water molecules to retain and which to remove from the binding pocket before docking, the knowledge-based approach Consolv (217) is applied to determine those waters that are likely to be conserved upon ligand binding and to adjust a penalty for their displacement. Once these waters have been selected to be initially retained upon docking, SLIDE either translates or discards a water molecule to remove overlap with ligand atoms after the ligand has been docked to the binding site. Displacement of a water molecule is performed only if collisions cannot be resolved by iterative translations. Any displacements by nonpolar ligand atoms are penalized upon scoring. In database screening runs on three different target proteins, this procedure was found to produce reasonable results with respect to water-mediated interactions, but no systematic test has been reported so far.
As long as a simultaneous docking of water molecules and ligands is an unsolved problem, it remains common practice to consider essential water molecules as a fixed part of the binding site. Preplaced water molecules may either correspond to recurrently observed waters found in multiple crystal structures of the tar-
get protein, or to predicted positions based on estimated water affinity potentials suggested by programs such as GRID (218-220). The latter strategy has been applied by Minke et al. using AutoDock (221), showing that successful docking of carbohydrate derivatives to the heat-labile enterotoxin critically depends on the inclusion of water molecules. Examples for the consideration of experimentally observed water molecules as part of the target during docking are the studies of Rao et al. (docking to factor Xa using AutoDock) (222) and Pospisil (docking to thymidine kinase using AutoDock and FlexX) (223). The influence of explicit water molecules in docking was also investigated in the validation study of the new program DARWIN (178). Inclusion of explicit water molecules was essential in some cases, unless interaction energies were calculated with a Poisson-Boltzmann-based implicit solvent model. Yet another example is a search for metallo-β-lactamase inhibitors (14) with the docking program FLOG. Docking was performed with three different configurations of bound water in the active site. The top-scoring compounds showed an enrichment in biphenyl tetrazoles. A crystal structure of one tetrazole not only confirmed the predicted binding mode but also displayed the water configuration that had, retrospectively, been the most predictive one of the three models. Further examples from virtual screening studies are available that show that the inclusion of conserved water molecules in the docking process can dramatically improve the hit rate (15, 16).

3.2.3 Assessment of Docking Methods. Docking methods are usually assessed by their ability to reproduce the binding mode of experimentally resolved protein-ligand complexes: the ligand is removed from the complex, a search area is defined around the actual binding site, the ligand is redocked into the protein, and the achieved binding mode is compared with the experimental position, usually in terms of a root-mean-square deviation (rmsd). If the rmsd is below 2 Å, it is generally considered a successful prediction. The obvious goal is that such a "near-native" solution is ranked best among the set of ligand poses generated. Virtually any introduction of a new docking method has been accompanied by such a test. The number of complexes used has varied as much as the reported success rates, which are between 10% (224) and 100% (152). Clearly, success rates of 100% are rather a consequence of the limited test set size than a reflection of the mere quality of the docking method.
Numerous critical issues have to be addressed in this context. Validations carried out on very few complexes (120) do not adequately assess the scope of the method, particularly if no attempt was made to select a representative set of structures that appropriately covers a broad range of binding features important to protein-ligand complexes. Up to now, only a few docking methods have been assessed on a broad range of complexes [e.g., FlexX (200 complexes) (139), ScoreDock and DOCK (200 complexes) (225), EUDOC (154 complexes) (125), DOCK, FlexX, and DrugScore (100-150 complexes) (226), GOLD (100 complexes) (177), the method of Diller and Merz (using the GOLD test set) (112), and PRO-LEADS (70 complexes) (189)]. In the case of GOLD it has been explicitly mentioned that the test set was selected by a researcher not involved in the development of the algorithm (177). The definition of an objective and relevant reference test set that could serve as standard benchmark for every new docking method would be highly desirable for both user and developer (18). First efforts in developing a database that could be of use in this context have been reported (227). Suitable test sets should cover a sufficient number of highly diverse protein-ligand complexes, including cases that provide some challenge to docking methods (e.g., water-mediated interactions, interactions with metal ions). To test performance with respect to potential induced fit, the structure of the unligated protein or alternative complexes with different bound ligands should be available as well. The test set should comprise fully resolved crystal structures with a resolution of ≤2.5 Å. Complexes with ligands significantly involved in crystal packing contacts should be avoided. Such cases will likely fail in reproducing the experimental binding mode because of missing contacts present only in the packing environment (228). Finally, the importance to study low-affinity or "non-binding" ligands must be ad-
dressed; accordingly, experimental information about the binding geometry and affinity of some weak-binding ligands should also be available.
In addition to the tests usually reported by the authors of a program, comparative studies have been reported on the assessment of different docking and scoring approaches. In part they also address some of the aspects raised above. Westhead et al. have presented a comparison of four heuristic search algorithms (simulated annealing, genetic algorithm, evolutionary programming, and tabu search) (186). In an attempt to provide an unbiased comparison, all algorithms were implemented into the PRO-LEADS program and a single scoring function was used. Other recent examples are the studies of Ha et al., who compared DOCK (using two different scoring functions) and FlexX (229), and, in the context of virtual screening, the work of Bissantz et al., who compared DOCK, FlexX, and GOLD together with seven different scoring functions (230) (cf. also Section 5.2 below).
An unbiased test scenario is guaranteed if researchers are provided with a set of protein-ligand complexes of experimentally resolved, but yet unpublished structure. Two such blind trial competitions have been carried out so far (231, 232). A series of interesting issues regarding docking tests and problems with true predictions has been amply discussed by Dixon (231) and participants in the CASP2 docking competition (117, 145, 233, 234). Unfortunately, the number of targets subjected to such blind tests has so far been rather small. A major limitation to such blind comparisons is the availability of experimental data before publication.

3.2.4 Docking and QSAR. As long as the problem of accurate binding free energy prediction on the basis of a given complex geometry has not been resolved (cf. section on scoring functions), computational methods establishing quantitative structure-activity relationships (QSARs) to estimate relative binding affinity differences within a set of ligands remain a pragmatic alternative. Both classical and 3D QSAR methods have been developed as ligand-based approaches (235-237). They rely exclusively on ligand information and try to correlate experimental binding data with features described by a set of relevant descriptors. In 3D QSAR, such as CoMFA (Comparative Molecular Field Analysis), these descriptors are essentially virtual interaction energies (van der Waals and coulombic), calculated using an appropriate probe atom placed at the intersections of a regularly spaced grid surrounding the molecules. The model derived from differences in the various interaction fields provides a quantitative spatial description of those molecular properties that matter for binding. They can be interpreted as a surrogate representation of the binding site. Essential for the success of all 3D QSAR approaches is an appropriate alignment of the ligands: their relative spatial superposition must reflect the differences in binding geometry also experienced at the binding site of the structurally unknown protein. Various strategies have been developed to achieve this goal (235, 236). Increasingly, however, these methods are also applied if the receptor structure is known. This results in "receptor-based 3D QSAR," a combination of a ligand-based QSAR approach with information extracted from receptor structures (238). This additional information is used to generate a ligand alignment based on the experimental or predicted binding mode of the ligands in the binding site. The standard 3D QSAR techniques are subsequently used to derive a correlation model and to ultimately predict the binding affinity of new, appropriately aligned ligands (239). As a practical advantage, receptor-based 3D QSAR provides important information as to which of the protein-ligand interactions are responsible for the variance in biological activity among the given set of ligands.
Obviously, in the case of known receptor structure, the ligand alignment can be obtained by docking. This strategy has indeed been followed in a variety of studies: it has been used to set up CoMFA models [e.g., (240)] or extended to the Comparative Binding Energy (COMBINE) analysis (241-244), which explicitly exploits receptor information to generate the QSAR descriptors. Furthermore, in a GRID/GOLPE (245) analysis, the model generated with the docking alignment has been compared to the traditional CoMFA model
based on ligand alignment (238, 246); the alignment generated by docking could be shown to exhibit higher relevance.
Another concept to combine docking with QSAR has recently been proposed by Vieth and Cummins in their DoMCoSAR approach (247). DoMCoSAR is used to statistically determine the docking mode that is consistent with a structure-activity relationship, based on the explicit assumption that all molecules exhibit the same binding mode. In a first step, all molecules of a chemical series with common substructure are docked in an unbiased way to the protein binding site and the results are clustered to establish the most favorable docking modes for the common substructure. Subsequently, constrained docking is performed by forcing all molecules to align with the common substructure in the major docking modes. In a final stage, interaction-energy-based descriptors are calculated for all major docking modes. QSAR models are then derived to determine the statistically significant and most predictive set of descriptors and thus the docking mode that is most consistent with a given structure-activity relationship.
As noted by the authors, the appeal of this method is that an objective statistical justification for the selection of a binding mode is obtained. This may especially prove useful in cases where the primary docking scores yield nearly degenerate multiple binding modes and a selection of the most representative result is difficult. However, because one alignment is rendered prominent among others for the sake of best agreement with the derived QSAR model, the danger exists that unconsidered or ill-defined descriptors in the QSAR could possibly distort the final or accepted alignment.

3.2.5 Docking and Homology Modeling. In the absence of an experimental protein structure, a homology model may be used for docking and structure-based design. Such a model can be generated by comparative modeling based on homologous proteins of known structure. Obviously, it is most reliable in the regions of highest homology between the templates and the target protein. Although an overall skeleton of the target protein can frequently be obtained with sufficient accuracy, the structural details of the binding site are often beyond the scope of the method. In fact, members of a homologous protein family may show considerable differences in the binding region. Accordingly, homology models may not be sufficiently accurate to apply standard docking tools, and special methods addressing the docking of ligands to low-resolution structures have been presented (248).
Clearly, flexible-receptor docking could help to alleviate the problem. A frequently followed alternative is to refine the initial complex between the protein model and the ligand, most commonly by relaxation with MD simulations (249-251). This may also be combined with free energy calculations to determine the binding mode most consistent with experimental affinity data (252). However, refinement does not overcome the problem that the initial conformation of the model may preclude the binding of certain ligands. This has, for example, been demonstrated by Schapira et al. in a virtual screening for retinoic acid receptor (RAR) antagonists based on an RAR homology model (201). The automatic selection procedure based on flexible ligand docking was followed by optimization of the selected candidates with flexible protein side chains using the ICM program (82, 116, 117). Nevertheless, some known ligands were repeatedly missed by the screening algorithm because of incompatible binding site conformations. Consideration of side-chain flexibility already in the initial docking simulation was required to accommodate these ligands.
An approach developed especially for the purpose of docking ligands into approximate protein models generated by homology modeling is the DragHome method (253). The binding site is analyzed in terms of putative ligand interaction sites and translated using Gaussian functions into a functional binding-site description represented by physicochemical properties. Similarly, ligands are translated into a description based on Gaussian functions and the docking is computed by optimizing the overlap between the two functional descriptions. The use of "soft" Gaussian functions to describe protein-ligand interactions is one possibility to take into account the limited accuracy of modeled structures for the purpose of docking. The method for generating and op-
timizing ligand orientations relative to the binding-site representation was adapted from the ligand alignment program SEAL (254-256). For a set of different ligands, the generated solutions are analyzed with respect to the mutual ligand alignment. This alignment is then used to generate 3D QSAR models, which in turn can be interpreted with respect to the surrounding protein model. This can highlight inconsistencies and deficiencies present in the model, and thus information which in future developments of the methods is planned to be fed back into a subsequent modeling step to improve the protein model. The idea behind this is that the cycle of docking and alignment, ligand data analysis (3D QSAR), and protein structure modeling should be repeated until self-consistency is achieved. This would provide a protein homology model optimized with respect to the binding site and suitable to obtain consistent docking results.

4 SCORING FUNCTIONS

This section is dedicated to the scoring aspect of the docking problem. Various approaches are discussed that try to capture the essential elements of protein-ligand interactions in computationally efficient scoring functions. The discussion focuses on general approaches rather than individual functions. The reader is referred to Table 7.2 for original references to the most important scoring functions.

4.1 Description of Scoring Functions for Protein-Ligand Interactions

Reversible protein-ligand binding is an equilibrium between the bound state and the unbound state of the binding partners. The rigorous theoretical description requires full consideration of all species involved: the separate solvated protein, the separate solvated ligand, and the solvated complex, in which the binding partners are partially desolvated and form interactions with each other. The quantity of interest to characterize this equilibrium is the free energy of binding. Its most accurate calculations are based on the evaluation of ensemble averages according to principles of statistical mechanics (45). To obtain reasonably accurate values of binding free energies, extensive Monte Carlo or MD simulations are necessary, which require large computational resources. Clearly, this is impractical for standard docking applications. Furthermore, even the most advanced techniques are reliable only for calculating binding free energy differences between closely related ligands (162, 163, 257, 258). However, some less rigorous, but faster and, as experience shows, often not less accurate methods have been developed that are suitable to handle larger numbers of ligands. For example, continuum solvation models are used to replace explicit solvent molecules at least in the final energy evaluation of the simulation trajectory (259), or linear response theory is applied (260-262), sometimes augmented by a surface term (263).
Scoring functions that can be evaluated fast enough to be applied in docking and virtual screening can only estimate the free energy of binding. They usually take into account only one possible configuration of the receptor-ligand complex and disregard ensemble averaging and explicit properties of the unbound states of the binding partners. Furthermore, all methods share the assumption that the free energy can be decomposed into a sum of terms (additivity). In a strict physical sense, this is not allowed, given that the free energy of binding is a state function, whereas its components are not (77, 264). In addition, simple additive models cannot describe subtle cooperativity effects (265). Nevertheless, it is often useful to interpret receptor-ligand binding in an additive fashion (266-268), and estimates of binding free energy based on the additivity assumption are often accessible at very low computational cost.
Three main classes of fast scoring functions can be distinguished: force field-based methods, empirical scoring functions, and knowledge-based methods. The following sections are dedicated to a separate discussion of each method.

4.1.1 Force Field-Based Methods. An obvious idea to circumvent parameterization efforts for scoring is to use nonbonded energies of existing, well-established molecular mechanics force fields for the estimation of bind-
Table 7.2 Overview of Currently Used Scoring Functions

Type of Function                     Name of Function             Year Published   Original References   Selected References to Applications
Force field                          Charmm                       1998             (274)
Force field + desolvation            (Schapira, Abagyan et al.)   1999             (280)
                                     AMBER + desolvation          1999             (276)
                                     Charmm + PB                  1999             (393)
                                     AMBER + desolvation          1999             (278)
                                     MM PB/SA                     1999             (343)                 (344, 346)
Linear response                      LIE                          1994             (260)                 (261, 263, 394)
Simplified free-energy perturbation  OWFEG Grid                   2001             (284)                 (395)
Empirical                            (Wade, Goodford et al.)      1989, 1993       (220, 298)            GRID (218)
                                     SCORE1                       1994             (294)                 LUDI (108, 109); (300, 396)
                                     (Miller, Sheridan et al.)    1994             (121)                 FLOG (121)
                                     GOLD score                   1995             (176, 177)            GOLD (176, 177)
                                     PLP                          1995, 2000       (185, 367)
                                     FlexX score                  1996             (110)                 FlexX (110, 130, 138, 397)
                                     VALIDATE                     1996             (307)
                                     (Jain)                       1996             (297)                 Hammerhead (388)
                                     ChemScore                    1997             (80)                  (295)
                                     SCORE2                       1998             (296)
                                     (Takamatsu, Itai)            1998             (398)
                                     SCORE                        1998             (293)                 (225)
                                     AutoDock3                    1998             (115)                 (391)
                                     Fresno                       1999             (399)                 (230)
                                     Screenscore                  2001             (287)
Desolvation terms                    HINT                         1991             (400)                 (308)
                                     (Zhang, DeLisi et al.)       1997             (305)
Knowledge-based                      SMOG                         1996             (313)                 SMOG (314)
                                     BLEEP                        1999             (315, 316)            (340)
                                     PMF                          1999             (317)                 (299, 319, 320, 339, 369)
                                     Drugscore                    2000             (226)                 (15, 318)
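As a rough illustration of the force field-based idea introduced in Section 4.1.1, the nonbonded ligand-receptor energy can be sketched as a pairwise sum of a Lennard-Jones 12-6 term and a Coulomb term. This is a minimal sketch and not the implementation of any program listed in Table 7.2: the `Atom` type, the per-atom `epsilon` and `radius` parameters, the combination rules, and the distance-dependent dielectric ε(r) = 4r are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Sequence

COULOMB_KCAL = 332.06  # converts e^2/Å to kcal/mol


@dataclass
class Atom:
    x: float
    y: float
    z: float
    charge: float   # partial charge in units of e
    epsilon: float  # Lennard-Jones well depth in kcal/mol (illustrative)
    radius: float   # Lennard-Jones radius in Å (illustrative)


def pair_energy(a: Atom, b: Atom, dielectric_slope: float = 4.0) -> float:
    """Lennard-Jones 12-6 plus Coulomb energy for one atom pair.

    The dielectric is assumed to grow linearly with distance,
    eps(r) = dielectric_slope * r, so the Coulomb term falls off as 1/r^2.
    """
    r = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
    r = max(r, 0.1)  # guard against division by zero for overlapping atoms
    well_depth = math.sqrt(a.epsilon * b.epsilon)  # geometric mean
    r_min = a.radius + b.radius                    # sum of radii
    ratio6 = (r_min / r) ** 6
    van_der_waals = well_depth * (ratio6 * ratio6 - 2.0 * ratio6)
    coulomb = COULOMB_KCAL * a.charge * b.charge / (dielectric_slope * r * r)
    return van_der_waals + coulomb


def interaction_score(ligand: Sequence[Atom], receptor: Sequence[Atom]) -> float:
    """Sum the pair energies over all ligand-receptor atom pairs."""
    return sum(pair_energy(l, p) for l in ligand for p in receptor)
```

Because the Coulomb term scales with the product of the charges, a ligand carrying a high formal charge dominates such a score, which is one reason solvation corrections are often added on top of it.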
ing affinity. In doing so, one replaces estimates of the free energy of binding in solution with an estimate of the gas phase enthalpy of binding. Even this crude approximation can lead to satisfying results. A good correlation was obtained between nonbonded interaction energies calculated with a modified MM2 force field and IC50 values of 33 HIV-1 protease inhibitors (269). Similar results were reported in a study of 32 thrombin-inhibitor complexes with the CHARMM force field (270). In both studies, however, experimental data represented rather narrow activity ranges and covered little structural variation.
The AMBER (271, 272) and CHARMM (179) nonbonded terms are used as scoring function in several docking programs. As mentioned above (Section 3.1.1), protein terms are usually precalculated on a rectangular grid to speed up the energy calculation compared to traditional atom-by-atom evaluations (273). Distance-dependent dielectric constants are
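The grid precalculation mentioned above can be sketched as follows: a receptor potential is tabulated once on a regular grid, and each ligand atom is then scored by trilinear interpolation between the eight surrounding grid points. The `Grid` type, the function names, and the idea of passing the potential as a plain callable are hypothetical choices for this sketch, not the data layout of any particular docking program.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class Grid:
    origin: Point                    # coordinates of grid point (0, 0, 0), in Å
    spacing: float                   # distance between neighboring points, in Å
    values: List[List[List[float]]]  # values[i][j][k] = potential at point (i, j, k)


def build_grid(potential: Callable[[float, float, float], float],
               origin: Point, spacing: float,
               shape: Tuple[int, int, int]) -> Grid:
    """Evaluate the (expensive) receptor potential once per grid point."""
    if min(shape) < 2:
        raise ValueError("each grid dimension needs at least two points")
    ox, oy, oz = origin
    nx, ny, nz = shape
    values = [[[potential(ox + i * spacing, oy + j * spacing, oz + k * spacing)
                for k in range(nz)] for j in range(ny)] for i in range(nx)]
    return Grid(origin, spacing, values)


def interpolate(grid: Grid, x: float, y: float, z: float) -> float:
    """Trilinear interpolation of the tabulated potential at (x, y, z)."""
    dims = (len(grid.values), len(grid.values[0]), len(grid.values[0][0]))
    cell, frac = [], []
    for coord, start, n in zip((x, y, z), grid.origin, dims):
        u = (coord - start) / grid.spacing  # fractional grid coordinate
        if not 0.0 <= u <= n - 1:
            raise ValueError("point lies outside the grid")
        i = min(int(u), n - 2)              # lower corner of the enclosing cell
        cell.append(i)
        frac.append(u - i)
    total = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = ((frac[0] if di else 1.0 - frac[0])
                          * (frac[1] if dj else 1.0 - frac[1])
                          * (frac[2] if dk else 1.0 - frac[2]))
                total += weight * grid.values[cell[0] + di][cell[1] + dj][cell[2] + dk]
    return total


def score_pose(grid: Grid, atom_positions: Sequence[Point]) -> float:
    """Score a ligand pose as the sum of interpolated values at its atoms."""
    return sum(interpolate(grid, *p) for p in atom_positions)
```

The cost of `build_grid` is paid once per receptor, whereas `score_pose` is cheap enough to call for every pose generated during a search; this is the trade-off that makes grids attractive.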
usually employed to approximate the long-range shielding of electrostatic interactions by water (274). However, compounds with high formal charges still obtain unreasonably high scores as a result of overestimated ionic interactions. For this reason, a common practice in virtual screening is to separate databases of compounds into subgroups according to their total charges and rank these groups separately. When electrostatic interactions are complemented by a solvation term calculated by the Poisson-Boltzmann equation (32) or faster continuum solvation models (e.g., Ref. 275), effects of high formal charges are usually leveled out. In a validation study on three protein targets, Shoichet and coworkers observed significantly improved ranking of known inhibitors upon correction for ligand solvation (276). The current version of the docking program DOCK calculates solvation corrections based on the generalized Born (277) solvation model (278). The method has been tested in a study where several peptide libraries were docked into various serine protease active sites (279).
In the context of scoring, the van der Waals term of force fields is mainly responsible for penalizing docking solutions with respect to overlap between receptor and ligand atoms. It is often omitted when only the binding of experimentally determined complex structures is analyzed (280-282).
Very recently, a new contribution to the list of force-field-based scoring methods has been developed by Charifson and Pearlman. This so-called OWFEG (one window free energy grid) method (283) is an approximation to the expensive first-principles method of free energy perturbation (FEP). For the purpose of scoring, an MD simulation is carried out with the ligand-free, solvated receptor site. During the simulation, the energetic effects of probe atoms on a regular grid are collected and averaged. Three simulations are run with three different probes: a neutral methyl-type atom, a negatively charged atom, and a positively charged atom. The resulting three grids contain information on the score contributions of neutral, positively, and negatively charged probe atoms located in various positions of the receptor site. They are used for scoring a ligand position by linear interpolation based on the partial charges of the ligand atoms. This approach seems to be successful for Ki prediction as well as virtual screening applications (284). Its conceptual advantage is the implicit consideration of entropic and solvent effects and some protein flexibility.
The calculation of ligand strain energy traditionally also lies in the realm of molecular mechanics force fields. Although effects of strain energy have rarely been determined experimentally (3), it is generally accepted that high-affinity ligands bind in low-energy conformations (285, 286). If a compound must adopt a strained conformation to fit into a receptor pocket, this should lead to a less negative binding free energy. Strain energy can be estimated by calculating the energy difference between the global energy minimum of the unbound ligand and the current conformation of the ligand in the complex. However, force field estimates of energy differences between individual conformations are not reliable for all systems. In practice, better correlation with experimental binding data is observed when strain energy is used as a filter to weed out unlikely binding geometries rather than including it in the final score. Estimation of ligand strain energy based on force fields can be time-consuming and therefore alternatives are often employed, such as empirical rules derived from small-molecule crystal data (140). Conformations generated by such programs are, however, often not strain-free because only one torsional angle is regarded at a time. Some strained conformations can be excluded when two consecutive dihedral angles are taken into account simultaneously (287).

4.1.2 Empirical Scoring Functions. The underlying idea of empirical scoring functions is that the binding free energy of a noncovalent receptor-ligand complex can be decomposed into a sum of localized, chemically intuitive interactions. Such decompositions can be a useful tool to gain some insight into binding phenomena, even without analyzing 3D structures of receptor-ligand complexes. Andrews and colleagues derived average functional group contributions to the binding free energy by analyzing a set of 200 compounds for which the affinity to a receptor had been experimentally determined (266). Such average functional
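The use of ligand strain energy as a filter rather than as a term of the final score, as described in Section 4.1.1, can be sketched as follows. Conformer energies are assumed to come from some external force field calculation; the `Pose` type, its field names, and the 5 kcal/mol threshold are hypothetical placeholders and are not values taken from the text.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Pose:
    name: str
    score: float             # docking score; more negative means better
    conformer_energy: float  # internal energy of the bound conformation, kcal/mol


def strain_energy(pose: Pose, global_min_energy: float) -> float:
    """Energy of the bound conformation relative to the global minimum
    of the unbound ligand (both from the same force field)."""
    return pose.conformer_energy - global_min_energy


def filter_and_rank(poses: Sequence[Pose], global_min_energy: float,
                    max_strain: float = 5.0) -> List[Pose]:
    """Discard poses whose strain exceeds the threshold, then rank the
    survivors by docking score alone; strain is not added to the score."""
    kept = [p for p in poses if strain_energy(p, global_min_energy) <= max_strain]
    return sorted(kept, key=lambda p: p.score)
```

Keeping the strain out of the ranking key reflects the observation that force field energy differences between conformations are too noisy to be mixed into the score, yet still useful for rejecting clearly implausible geometries.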
group contributions can then be used to estimate the mean overall binding affinity of a compound independent of a particular binding site. This value can be compared to the experimental binding free energy: if the experimental affinity is similar to or even more favorable than the computed one, the ligand obviously shows a good fit with the receptor and its functional groups are supposedly all involved in interactions with the protein; on the other hand, if it is significantly less favorable, the compound apparently does not fully exploit its potential to form optimal interactions. Similarly, experimental binding affinities have been analyzed on a per-atom basis in quest of the maximal binding affinity of noncovalent ligands (288). It was concluded that in the strongest binding ligands each non-hydrogen atom on average contributes 6.3 kJ/mol to the binding energy.
The analysis of binding phenomena can be performed with much more detail if the 3D structures of receptor-ligand complexes are available. Based on the assumption of additivity, the binding affinity $\Delta G_{\mathrm{bind}}$ can be estimated as a sum of interactions multiplied by weighting factors:

$$\Delta G_{\mathrm{bind}} = \sum_i \Delta G_i \, f_i$$

Here, each $f_i$ corresponds to an interaction term that depends on structural features of the complex and each $\Delta G_i$ represents a weighting coefficient, which is determined on the basis of a training set of experimental affinities for crystallographically known protein-ligand complexes. Scoring schemes that use this concept are called empirical scoring functions. Several reviews summarize details of individual parameterizations (26, 44, 56, 289-292).
The individual terms in empirical scoring functions are usually chosen such that they intuitively cover important contributions of the total binding free energy. Most empirical scoring functions are derived by evaluating the functions $f_i$ on a set of protein-ligand complexes and fitting the coefficients $\Delta G_i$ to experimental binding affinities of these complexes by multiple linear regression or supervised learning. The relative weight of the individual contributions depends on the training set. Usually, between 50 and 100 complexes are used to derive the weighting factors. In a recent study it has been shown that many more than 100 complexes were necessary to achieve convergence (293). The reason for this finding is probably the fact that the publicly available protein-ligand complexes fall in a few rather strongly populated classes.
Empirical scoring functions usually contain individual terms for hydrogen bonds, ionic interactions, hydrophobic interactions, and binding entropy. Hydrogen bonds are often scored by simply counting the number of donor-acceptor pairs that fall into a given distance and angle range favorable for hydrogen bonding, weighted by penalty functions for deviations from ideal standard values (80, 294-296). The amount of error tolerance in these penalty functions is critical. If large deviations from the ideal are tolerated, the scoring function cannot discriminate sufficiently between different placements of a ligand, whereas too stringent tolerances artificially score similar complexes rather differently. Attempts have been described to reduce the strong distance dependency of such interactions by assigning soft modulating functions on an atom-pair basis (297). Other concepts try to avoid penalty functions and introduce distinct regression coefficients for strong, medium, and weak hydrogen bonds (293). The Agouron group has used a simple four-parameter potential that is a piecewise linear approximation of a potential neglecting angular terms ("PLP scoring function") (185). Most functions consider all types of hydrogen bonds equivalently. Some attempts have been made to distinguish between different donor-acceptor functional group pairs. Hydrogen bond scoring in GOLD (176, 177) is based on a list of hydrogen bond energies, derived from ab initio calculations, for any combination of 12 donor and 6 acceptor atom types. A similar differentiation of donor and acceptor groups is attempted in the program GRID (218) for the characterization of binding sites (219, 220, 298). The consideration of such lookup tables in scoring functions might help to avoid false predictions originating from an oversimplification of some individual interactions.
Reducing the weight of hydrogen bonds localized at the solvent-exposed rim of a binding
site is a useful concept to avoid false positives in virtual screening. This is achieved by reducing charges of surface-exposed residues in cases where explicit electrostatic terms are used (274) or by multiplying the hydrogen bond contribution with a factor that depends on the accessibility of the involved protein counter group (299).
Ionic interactions are handled in a way similar to hydrogen bonds. Long-distance charge-charge interactions are usually neglected, and it is thus more appropriate to refer to salt bridges or charge-assisted hydrogen bonds. The scoring function by Boehm implemented in LUDI (294) assigns a stronger weight to salt bridges than to neutral hydrogen bonds. This differentiation generally proved successful in scoring series of thrombin inhibitors (295, 300). However, comparable to force field scoring, the danger exists that highly charged molecules receive overestimated scores. Experience with FlexX containing a variant of Boehm's scoring function has shown that more reliable predictions are obtained if charged and uncharged hydrogen bonds are handled equally in a virtual screening application. Similar experience has also been collected using the ChemScore function (80).
Hydrophobic interactions are usually calibrated to the size of the contact surface buried upon receptor-ligand complex formation. Often, a reasonable correlation with experimental binding energies can be achieved considering only a surface term [see, for example, (1, 301, 302) and the discussion in Section 2.1.1]. Various approximations for such surface terms have been described, for example, as grid-based (294) or volume-based approaches (cf. the discussion in Ref. 115). Many functions are based on a distance-dependent summation over neighboring receptor-ligand atom pairs. Distance-dependent cutoffs have been introduced in various ways, either short (110) or longer to include atom pairs that are not involved in direct van der Waals contacts (80, 185). The weighting factor $\Delta G_i$ of the hydrophobic term depends strongly on the training set. Supposedly, this fact has been underestimated in the development of many empirical scoring functions (35) because in most training sets ligands composed of numerous donor and acceptor groups are overrepresented (many peptide and carbohydrate fragments).
In most empirical scoring functions, a hydrophobic character is attributed to several atom types, with equivalent weight for all hydrophobic contributions. In a more sophisticated approach, the propensity of particular atom types to be solvent-exposed or embedded in the interior of a protein can be assessed by so-called atomic solvation parameters. These have been derived, for example, from experimental octanol/water partition coefficients (303, 304) or from protein crystal structures (305, 306). Atomic solvation parameters are used in the VALIDATE scoring function (307) and have been tested in DOCK (308).
Entropy terms account for the restriction of conformational degrees of freedom of the ligand upon complex formation. A crude but useful estimate of this entropy contribution is the number of rotatable bonds of a ligand (294, 296). This measure has the advantage of being a function of the ligand only. More sophisticated estimates try to take into account the nature of the ligand portion on either side of a flexible bond, particularly with respect to the interactions formed with the receptor (80, 307). This concept is based on the assumption that purely hydrophobic contacts allow for more residual motion in the ligand fragments.

4.1.3 Knowledge-Based Methods. Empirical scoring functions regard only those interactions that are explicitly part of the model. Less frequent interactions are usually neglected, even though they can be strong and specific, for example, NH-π hydrogen bonds. To generate a comprehensive and consistent description of all these interactions in the framework of empirical scoring functions would be a difficult task. However, the exponentially growing body of structural data on receptor-ligand complexes can be exploited to discover favorable binding geometries. "Knowledge-based" scoring functions try to capture the knowledge about protein-ligand binding that is implicitly stored in the Protein Data Bank by means of statistical analysis of structural data, without referring to often inconsistent experimentally determined binding affinities (309). They are based on the concept of the inverse formulation of the Boltzmann law,
\[ E_{ijk} = -kT \ln(p_{ijk}) - kT \ln(Z) \]
where the energy function $E_{ijk}$ is called a potential of mean force for a state defined by the variables i, j, and k; $p_{ijk}$ is the corresponding probability density; and Z is the partition function. The second term of the sum is constant at constant temperature T and does not have to be regarded, given that Z = 1 can be selected by defining a suitable reference state, which leads to normalized probability densities $p_{ijk}$. The inverse Boltzmann approach has been applied to assemble potentials from databases of protein structures to score protein models in the context of protein structure prediction (310). To establish a function to score protein-ligand complexes, the variables i, j, and k are assigned to address protein and ligand atom types, and their interatomic distances. The occurrence frequency of individual contacts is a measure of their energetic contribution to binding. If a specific contact occurs more frequently than expected by random or seen in an average distribution, it is assumed to be favorable. On the other hand, if it occurs less frequently, repulsive or unfavorable interaction between two atom types is anticipated. The frequencies are thus converted into sets of atom-pair potentials ready for further evaluation.
First applications in drug research (134, 311, 312) were restricted to small data sets of HIV protease-inhibitor complexes and did not result in generally applicable scoring functions. Recent publications (226, 313-318), however, have shown the usefulness of these approaches. The first general-purpose function using such potentials was implemented in the de novo design program SMOG (313, 314). The PMF function by Muegge and Martin (317) consists of a set of distance-dependent atom-pair potentials $E_{ij}(r)$ that are expressed as
\[ E_{ij}(r) = -kT \ln\left[ f_j(r) \, \rho_{ij}(r) / \rho_{ij} \right] \]
Here, r is the atom pair distance, and $\rho_{ij}(r)$ is the number density of pairs ij in a certain radius interval about r. This density is calculated by the following procedure. First, a maximum search radius is defined. This radius describes a reference sphere around each ligand atom j. Receptor atoms of type i are searched within this sphere. Subsequently, the sphere is subdivided into shells of a predefined thickness. The number of receptor atoms i matching each spherical shell is divided by the volume of this shell and averaged over all occurrences of ligand atoms j in the evaluated data set of protein-ligand complexes. The term $\rho_{ij}$ in the denominator is the average density of receptor atoms i falling into the whole reference volume. It is argued that the spherical reference volume around each ligand atom needs to be corrected by eliminating the occupied volume of the ligand itself, given that ligand-ligand interactions are not regarded in this area. This is done by a volume correction factor $f_j(r)$ that is a function of the ligand atom only and gives a rough estimate of the preference of an atom of type j to be exposed rather than buried in the ligand. Muegge showed that the volume-correction factor contributes significantly to the predictive power of the PMF function (319). Also, reference radii between 7 and 12 Å are applied to implicitly include solvation effects, especially the propensity of individual atom types to be located inside a protein cavity or in contact with solvent (320). To rank docking solutions, the PMF function is evaluated in a grid-based fashion and combined with a repulsive van der Waals potential at short distances.
The DrugScore function by Gohlke et al. (226) is based on roughly the same formalism, albeit with several differences in the derivation that lead to different potential forms. Most notably, the statistical distance distributions $\rho_{ij}(r)/\rho_{ij}$ for the individual atom pairs ij are divided by a common reference state that is taken as the average over the distance distributions of all atom pairs, $\rho(r) = \sum_i \sum_j \rho_{ij}(r) / (i \cdot j)$. To consider only direct ligand-protein contacts, the upper sample radius has been set to 6 Å. At this distance, no further atoms (e.g., of a water molecule) can mediate a protein-ligand interaction. The individual potentials have the form

These pair potentials are used in combination with potentials depending on single (protein or ligand) atom types that express the propensity of an atom type to be buried within a particular
protein environment on complex formation. Contributions of these surface potentials and the pair potentials are weighted equally in the final scoring function. This scoring function was initially developed with the primary goal to differentiate between correctly docked (near native) ligand poses versus decoy binding modes for the same protein-ligand pair. However, through appropriate scaling also quantitative estimates across different protein-ligand complexes are possible (318).
Mitchell and coworkers choose a different type of reference state for their BLEEP potential (315). The pair interaction energy is written as

Here, the number density $\rho_{ij}(r)$ is defined as above, but it is normalized by the occurrence frequency of all atom pairs at this same distance instead of by the number of pairs in the whole reference volume. The variable $m_{ij}$ is the number of pairs ij found in the evaluated data set, and $\sigma$ is an empirical factor that defines the weight of each observation. This potential is combined with a van der Waals potential as a reference state to compensate for the lack of sampling at short distances and for certain underrepresented atom pairs.
Besides differences in the functional form and reference state, from a more practical point of view, the knowledge-based potentials differ also with respect to scope of atom type definitions and the amount of structural data used for their derivation. The number of different atom types ranges from 17 in DrugScore to 40 nonmetal atom types in BLEEP. In all cases, the Protein Data Bank (321) was the source of the solved crystal structures. For BLEEP, 351 selected complexes were used, whereas the PMF function was extracted from 697 complexes, and DrugScore was derived using 1376 complexes. In the latter case, the data have been extracted from Relibase (322, 323).

4.2 Critical Assessment of Current Scoring Functions

4.2.1 Influence of the Training Data. All fast scoring functions share a number of deficiencies that one should be aware of in any application. First, most scoring functions are in some way fitted to or derived from experimental data. The functions necessarily reflect the accuracy of the data that were used for their derivation. For instance, a general problem with empirical scoring functions is the fact that the experimental binding energies usually originate from many different sources and therefore constitute a rather heterogeneous data set affected by all kinds of experimental errors. Furthermore, scoring functions mirror not only the quality but also the scope of experimental data used for their development. Virtually all scoring functions are still derived from data mostly based on high-affinity receptor-ligand complexes. Many of these are still of peptidic nature, whereas interesting leads in pharmaceutical research are non-peptidic. This is reflected in the relatively high contributions of hydrogen bonds in the total score. The balance between hydrogen bonding and hydrophobic interactions is a very critical issue in scoring, and its consequences are especially obvious in virtual screening applications, as illustrated in Section 5.3.

4.2.2 Molecular Size. The simple additive nature of most fast scoring functions often leads to gradually increasing scores for molecules of larger size. Although it is true that small molecules with a molecular weight below 200-250 rarely show very high affinity, there is no physical reason why larger compounds should automatically possess higher activity. Comparing the scores of two compounds of significant size difference therefore calls for a term that compensates for the size dependency. In some applications, a constant "penalty" term has been added to the score for each heavy atom (324) or a term proportional to the molecular weight has been considered (325). The empirical scoring function implemented in the docking program FLOG has been normalized to remove the linear dependency of the crude score on the number of ligand atoms (121). Originally introduced to improve the correlation between experimental and calculated affinities, entropy terms reflecting the change in conformational mobility upon ligand binding also help to reduce an excessive score for overly large and flexible mol-
ecules (80, 294). The size of the solvent-accessible surface of the ligand in its bound state can also be used as penalty term to discard large ligands not fully buried in the binding site. It should be noted, however, that all these approaches are very pragmatic in nature and do not solve the problem of size dependency, which is closely related to a proper understanding of cooperativity effects (265).

4.2.3 Penalty Terms. In general, scoring functions reward favorable interactions such
d energetically unfavorable
d within the binding site
not observed and can hardly be accounted
based scoring function.
Knowledge-based scoring functions try to cap-
referring to a reference
state that corresponds to a mean situation. At
first glance, the neglect of angular terms in the
knowledge-based scoring func-
d pair potentials that
not discriminate sufficiently between different binding geometries. However, some de-
dency is considered,
n that pair potentials for different atom
are always evaluated in combination
each other (226). Obvious deficiencies in
functions, such as
electrostatic repulsions and steric clashes, can
avoided by defining reasonable penalty
em from molecular
mechanics force fields. This has been realized
the "chemical scoring" function imple-
program DOCK (106,
which is a modified
der Waals potential being attractive or re-
ons cannot be
avoided by simple "clash" terms, but require a
. Among these are incomplete steric filling
the binding cavity by a ligand within the
d interface. Possible approaches to re-
these shortcomings are empirical filters to detect such unsatisfactory solutions and remove them according to user-specified thresholds (329). A promising approach to properly reflect such cases is the inclusion of artificially generated, erroneous, decoy solutions in the optimization of scoring functions as reported for the scoring function of a flexible ligand superposition algorithm (330, 331).

4.2.4 Specific Attractive Interactions. Another general deficiency of scoring functions is the simplified description of attractive interactions. Molecular recognition is not entirely based on hydrogen bonding and hydrophobic contacts. Especially in host-guest chemistry, other specific types of interactions are frequently used to characterize the observed phenomena. For example, hydrogen bonds are formed between acidic protons and π-systems (332). These bonds can substitute for conventional hydrogen bonds in strength and specificity, as has been noted in protein-DNA recognition (333). Another type of less frequently observed interaction is the cation-π interaction, which is especially important at the surface of proteins (39, 334). Current empirical scoring functions usually neglect these interactions. Similarly, the directionality of interactions between aromatic rings is hardly considered (335, 336). Because of the regression-type adjustment, some energy contributions originating from these interactions are already implicitly incorporated into the conventional interaction terms. This might be one explanation why hydrogen bond contributions are traditionally overestimated in regression-based scoring functions. Knowledge-based approaches automatically incorporate these interactions in a scoring function, provided they occur with reasonable frequency in the data set used to develop the potentials.

4.2.5 Water Structure and Protonation State. Uncertainties about protonation states and the involvement of water in ligand binding further complicate scoring. These considerations are relevant for the development as well as the application of scoring functions. As mentioned above, the entropic and enthalpic contributions involving the reorganization of water molecules upon ligand binding are very difficult to predict (see, for example, Ref. 337). Currently, the most pragmatic approach to handle the water problem is the elucidation of "conserved water molecules" and to consider them as part of the receptor. A knowledge-based tool to estimate the "conservation" of water molecules upon ligand binding has been developed (217) and incorporated into a docking procedure (111) (cf. Section 3.2.2). It is based on crystallographic information and tries to extract rules about water sites by analyzing whether they are recurrently occupied by water molecules in series of related protein-ligand complexes.
Scoring functions require predefined atom types for each protein and ligand atom. This also implies the fixed assignment of a protonation state to each acidic and basic group. Knowledge-based functions, which do not consider hydrogen atoms, are equally affected by the problem because the atom type definitions normally imply a certain protonation state. Presently, such estimates might be reliable enough for the situation in aqueous solution; however, significant pKa shifts are possible upon ligand binding (338) as a result of strong changes of the local dielectric conditions. They give rise to protonation reactions in parallel to the binding process. With respect to scoring, switching from a donor to an acceptor functionality because of altered protonation states has important consequences (279). Accordingly, improved docking and scoring algorithms must incorporate a more detailed and flexible description of protonation states.

4.2.6 Performance in Structure Prediction and Rank Ordering of Related Ligands. Similar to the broad range of available docking tools (cf. Section 3.2.3), the multitude of different scoring schemes calls for an objective assessment to evaluate their scope and limitations. This depends in part on the anticipated application; that is, whether protein-ligand complexes should be predicted (using the scoring scheme as objective function in docking), whether a set of ligands should be ranked with respect to one target protein ($K_i$ prediction), or whether the scoring function is used to select possible hits out of a large database of candidate molecules (virtual screening).
An objective assessment of the available scoring functions is difficult because only very few functions have been tested on the same data sets or with the same docking tool. For structure prediction, several studies have shown that knowledge-based scoring functions are at least equivalent to regression-based functions. The PMF function has been successfully applied to structure prediction of inhibitors of neuraminidase (339) and MMP3 (229) in combination with the program DOCK, yielding superior results to the DOCK force field and chemical scoring. The DrugScore function was tested on a large set of PDB complexes and gave significantly better results than those of the original FlexX scoring function using solutions generated by FlexX as the docking engine. DrugScore performed similarly to the force field score in DOCK, but outperformed the chemical scoring (226). Moreover, with respect to the correlation between experimental and calculated binding energies, very promising results have been obtained with DrugScore (318) and PMF (229, 317, 319, 339). BLEEP has recently been tested for scoring docked protein-ligand complexes (340). It was found to be slightly better than the DOCK energy function in discriminating decoy situations from near-native binding modes.
Although in many docking programs the same function is applied as an objective function for structure generation and for energy evaluation, better results can sometimes be obtained if different functions are applied. In particular, the docking objective function can be adapted to the docking algorithm used. In a parameter study, Vieth et al. found that using a soft-core van der Waals potential made their MD-based docking algorithm more efficient (274). Using FlexX as the docking engine, we observed that the original FlexX scoring function emphasizes directional interactions (mostly hydrogen bonds) in the docking phase. Subsequently, the ranking of individual library entries can be done successfully with a simple PLP potential that lacks directional terms, but considers general steric fit of receptor and ligand. Results are significantly worse if PLP is used already in the incremental build-up procedure of the docked ligand.
It is even more difficult to draw valid conclusions about the relative performance of scoring functions to rank sets of inhibitors with respect to their binding affinity for the same target. First, there is hardly any pub-
lished study in which different functions have been applied to the same data sets. Second, experimental data are often not measured under the same conditions but collected from various literature references. This retrieval from various sources usually implies larger uncertainties within the experimental data
The task of ranking sets of 10-100 related ligands with respect to one target can also be handled by computationally more demanding methods. The most general approaches are probably force field scores complemented by electrostatic desolvation and surface area terms. An example is the MM-PBSA method that combines Poisson-Boltzmann electrostatics with AMBER molecular mechanics calculations and MD simulations (341, 342). This method has recently been applied to an increasing number of examples, showing quite promising results (343-346). Poisson-Boltzmann calculations have been performed on a variety of targets with many related computational protocols (280-282, 347-350). Alternatively, extended linear response protocols (263) can be used. The OWFEG grid method by Pearlman has also shown very promising

VIRTUAL SCREENING

outlined in section, virtual screening is a
s. Although, in principle, the whole process can be fully automated, it is highly advisable to allow for manual interven-
ual inspection and selection
starts with a detailed
sis of the available 3D protein struc-
ly homologous struc-
s will also be analyzed, either to generate additional ideas about possible ligand struc-
me insight on how to achieve selectivity against other proteins of the same class. A superposition of different
provides some ideas
epeatedly found in
t-binding protein-ligand complexes. Such an overlay will also highlight flexible parts of the protein or recurring water molecules in the binding site that could be included in the docking process. Tools such as Relibase (322, 323) may be used to perform these comparative analyses of protein-ligand complexes in an efficient way. Subsequently, programs like GRID (218), LUDI (108, 109), SuperStar (351, 352), or DrugScore (318) are used to visualize potential binding sites ("hot spots") in the active site; in principle, any scoring function could be used for this purpose.
An important result of the 3D structure analysis is usually the identification of one or more key interactions that all ligands should satisfy. In aspartic proteases, for example, inhibitors should form at least one hydrogen bond to the catalytic Asp side-chains, whereas in metalloproteinases a coordination to the metal seems mandatory. Sometimes, a known ligand portion is used as initial scaffold based on which virtual screening techniques search for optimal side-chains. In principle, this step is not required, and instead one could fully rely on the docking and scoring step. However, following a pragmatic approach, it is important to use any well-founded information that is available about the system under consideration because more valuable results can usually be expected this way.
Once a reasonable hypothesis about the binding-site requirements has been generated, the next level of virtual screening is approached. Whether databases of commercially available compounds or "virtual" libraries of designed compounds are screened, it is advisable not to dock every possible compound, but only those that pass a series of hierarchical filters (cf. also Fig. 7.3). Simple preliminary filters remove
• compounds with reactive groups such as -SO2Cl or -CHO because they are expected to cause problems in some biological assays as a result of unspecific covalent binding to the protein.
• compounds with molecular weights below 150 or above 500. Small molecules such as benzene are known to bind to proteins rather unspecifically at several sites. Large molecules such as polypeptides are difficult to optimize subsequently, given that good
[Figure 7.3 flowchart; recoverable step labels: selection based on known Zn-binding groups (structure drawings not recoverable); 3D pharmacophore based on binding site "hot spots"; visual inspection.]
Figure 7.3. Hierarchical filtering process in virtual screening for carbonic anhydrase inhibitors.
bioavailability is usually limited to compounds with molecular weights below 500.
• compounds termed as "non-drug-like" according to criteria extracted from known drugs (353, 354).

After this general preselection, it can be advantageous to apply further steps of hierarchical filtering. As mentioned above, this could involve the selection of functional groups inevitably required to anchor a ligand to the most prominent interaction sites. Subsequently, the information of the "hot spot" analysis, translated into a pharmacophore hypothesis, can be used as matching criterion for a fast database screen. Such tools either involve fast tweak searching (355) or scan over precalculated conformers of the candidate molecules (356). The list of prospective hits can then be submitted to a similarity search using information about already known active compounds, which could either be ligands already structurally characterized by crystallography or hits from a complementary HTS study. This optional step of the analysis tries to incorporate all available information about known hits and produces a reranking of the candidate molecules to be submitted to docking. As tools for the spatial similarity analysis, Feature Trees (357, 358), FlexS (330), and SEAL (254-256) have all been successfully applied.
All remaining ligands are docked into the target protein and a list of some hundred to several thousand small molecules, each with a computed score, is produced. These have to be further analyzed to discard undesirable structures. Selection criteria could be any of the following:
0 Lipophilicity (if not addressed before). 5.1 Combinatorial Docking

Highly lipophilic molecules are difficult to Docking of large compound collections re-
test because of their low solubility. quires fast algorithms. If the collection is an
0 Structural class. If 50% of the docked struc-
unstructured library of more or less unrelated
tures belong to one single chemical class, it compounds, each individual molecule must be
is probably not necessary to test all of them docked independently ("sequential docking"),
(359). and only the fastest docking methods are ap-
r Unreasonable docked binding mode. Fast plicable in this context, unless massive com-
-
docking tools cannot produce reliable solu- puter resources are used, as in the Dock-
tions for all compounds; often, even some Crunch project based on the PRO-LEADS
well-scoring compounds are simply docked program (360).Examples for such fast docking
to the outer surface of the protein or adopt tools are SLIDE (111)or the docking method
rather strained conformations to achieve by Diller and Merz (112). Both have been de-
good surface complementarity within the veloped for database screening and library pri-
binding pocket. Computational filters help oritization. Before docking, it is generally ad-
to detect such situations (329). visable to eliminate compounds that would
provide only redundant information (similar-
ity filters) or are very unlikely to yield high scores. Clearly, the filter routines need to be faster than the docking and scoring procedure, but this is normally the case.

Finally, the selected compounds are ordered or synthesized and then tested. If the goal is to identify even weakly binding ligands as first leads, sufficient sensitivity of the biological assay has to be ensured [cf., for example, Ref. 16]. In this context it has also to be considered that limited solubility of the hits in water or water/DMSO mixtures often hampers affinity determinations at high concentrations.

Successful virtual screening has to produce a set of compounds significantly enriched with active compounds compared to random selection. A key parameter to assess the performance of docking and scoring in virtual screening is therefore, at least in theoretical case studies, the so-called enrichment factor. It is simply the number of active compounds in the subset selected by docking divided by the number of active compounds found in a randomly selected subset of equal size. To record such enrichment factors also for controlling performance at the various filter steps, a set of known active compounds is mixed with the set of candidate molecules. This strategy, however, requires a set of reasonable size (e.g., 30-50 ligands), which is not always given in a real-life virtual screening study. Furthermore, enrichment factors are far from being ideal indicators, particularly at later filter steps where a (hopefully) increasing amount of active compounds detected among the entries of the database competes with the set of known active ligands and artificially lowers the enrichment factor.

Complementary to initial filtering, a preorganization of compounds into families exhibiting some kind of similarity has been demonstrated to improve the results of database screening. In the strategy shown by Su et al. (359), all molecules of any family are docked and scored, but only the best-scoring member of a high-ranking family is allowed to remain in the final hit list, whereas the scores of related molecules are recorded as annotations to this representative family member. This increases the diversity of the hit list and helps to identify a higher number of different classes of potential ligands.

An alternative to sequential docking can be followed if combinatorial libraries are evaluated. Quite a few programs have been specifically designed for speed-up by so-called combinatorial docking. They profit from the structured, incremental nature of combinatorial libraries and the fact that molecules of a combinatorial library consist of a common core. This core is assumed to form common specific interactions with the receptor (possibly supported by experimental evidence) and can thus be prepositioned in the binding pocket in one or a few similar orientations. It then serves as skeleton for the addition of substituents. Obviously, this step is ideally suited for incremental construction algorithms (361)
and significantly reduces the complexity of the docking problem, limiting the required computation time per ligand. Earlier examples of this combinatorial docking approach are PRO_SELECT (362) and CombiDOCK (324). The latter is based on the DOCK program and has recently been enhanced by a vector-based orientation filter, to ensure productive scaffold poses, and by a free-energy-based scoring procedure (279). Another recent combinatorial docking procedure has been implemented as the FlexXc extension in FlexX (363). It follows a recursive scheme to traverse the combinatorial library space efficiently. The algorithm is based on a tree data structure that allows the efficient reuse of previously calculated docking results. FlexXc follows the library search tree in a depth-first manner, whereas CombiDOCK uses a breadth-first approach to evaluate fragments attached to a scaffold. A general advantage of breadth-first searches is that they allow for an efficient pruning of the search tree based on the scoring values.

De novo design tools have also been adapted to the problem of combinatorial docking and combinatorial library design. The program LUDI, for example, has been enhanced by the ability to connect building blocks in a chemically and structurally adequate manner; it can thus be used for combinatorial docking by fitting building blocks onto the interaction sites and simultaneous linking to previously docked core fragments (300). It has been successfully applied in the design of new thrombin inhibitors accessible through a single reaction. Another example is a variant of the Builder program (364) that was used to select substituents for a library of cathepsin D inhibitors (12). Yet another approach is DREAM++, a suite of programs for the design of virtual combinatorial libraries (365). Here, the DOCK algorithm is used for the molecular placement. Variable fragments are joined consecutively in compliance with predefined types of well-characterized organic reactions. Speed-up is achieved by preserving ("inheriting") information about common partial structures across different reactions, such that only the conformations of newly added fragments are searched.

Generally speaking, combinatorial docking approaches work best in cases where a core fragment plays a dominant role in the binding process and can be placed with high confidence in a well-characterized specificity pocket, such as the S1 pocket in thrombin. A further issue to consider is mutual fragment dependencies, that is, when multiple fragments are hooked up to a scaffold in a sequential manner; the results can depend on the sequence by which they are added (see, for example, Ref. 363). Thus, in unfavorable cases, different orders of attachment have to be followed to circumvent this possible limitation.

5.2 Seeding Experiments to Assess Docking and Scoring in Virtual Screening

True enrichment factors can be calculated only if experimental data are available for the full library, although such situations are unusual. Accordingly, studies using enrichment factors as a figure-of-merit to assess the performance of a virtual screening can serve for theoretical validation purposes only. Several authors have tested the predictive ability of docking and scoring tools by compiling an arbitrary set of diverse, drug-like compounds complemented by a number of known active compounds. This "seeded" library is then subjected to the virtual screening, and for the purpose of assessment it is assumed that the added active compounds are the only true actives in the library. Clearly, this is a rather questionable assumption.

Several seeding experiments have been published. An example has been performed at Merck using FLOG (121). A library consisting of 10,000 compounds including inhibitors of various types of proteases and HIV protease was docked into the active site of HIV protease. This resulted in excellent enrichment of the HIV protease inhibitors: all inhibitors but one were among the top 500 library members. However, inhibitors of other proteases were also considerably enriched (366).

Seeding experiments also allow for comparisons of different docking and scoring procedures, as shown, for example, by Charifson et al. (86), Bissantz et al. (230), and Stahl and Rarey (287). Charifson et al. compiled sets of several hundred active molecules for three different targets, p38 MAP kinase, inosine monophosphate dehydrogenase, and HIV protease. These were docked into the corresponding active sites together with 10,000 randomly selected, but drug-like, commercially available compounds using DOCK (327) and the Vertex in-house tool Gambler. ChemScore (80, 188), the DOCK AMBER force field score, and PLP (185) performed consistently well in enriching active compounds. This result was partially attributed to the fact that a rigid-body optimization could be carried out with these functions because they include repulsive terms in contrast to many other tested functions. Stahl and Rarey compared DrugScore (226), PMF (317), PLP (185), and the original FlexX score using FlexX for docking (110, 130, 138). Interestingly, the two knowledge-based scoring functions performed differently. DrugScore achieved better ranking for the tight-binding ligands in narrow lipophilic cavities of COX-2 and the thrombin S1 pocket. In contrast, PMF obtained better enrichment for the case of the very polar binding site of neuraminidase. Obviously, a general strength of PMF is the description of complexes showing multiple hydrogen bonds. This has also been noted in the study by Bissantz et al., in which PMF was found to perform well for the polar target thymidine kinase and less well for the estrogen receptor.

5.3 Hydrogen Bonding versus Hydrophobic Interactions

A balanced description of the contribution of hydrogen bonding and hydrophobic interactions to the total score is of general importance, to avoid a bias toward either highly polar or completely hydrophobic molecules. The actual parameterization of a scoring function depends on the compilation of the data set used to develop the function. Empirical scoring functions are more likely affected by the data set composition used for parameterization, but can be quickly reparameterized. In the case of knowledge-based functions such a readjustment is more difficult to perform; however, because of the much larger data sets used for their development, they are supposed to be less dependent on special data sets.

The PLP function, for example, addresses general steric complementarity and hydrophobic interactions based on rather long-range pair potentials, whereas the FlexX score emphasizes hydrogen-bond complementarity. This is clearly reflected in results of database-ranking experiments. To combine the virtues of both scoring functions and to construct a more robust general function, a combination of PLP and FlexX called ScreenScore has recently been published (287). It was derived by a systematic optimization of library ranking results over seven targets and covers a wide range of active sites with respect to form, size, and polarity. ScreenScore obtains good enrichments for COX-2 (highly lipophilic binding site) and neuraminidase (highly polar site), whereas the individual functions fail in one of the two cases. The authors of PLP have recently enhanced their scoring function by including directed hydrogen bonding terms (367). Similar to ScreenScore, this could also lead to a more robust scoring function.

5.4 Finding Weak Inhibitors

Seeding experiments are often carried out with a small number of active compounds that are already optimized for binding to the studied target. Enrichment factors based on the retrieval of these compounds are not very conclusive because the recovery of potent inhibitors from a large set of candidate molecules is significantly easier than the discovery of new, but usually rather weak inhibitors from a large majority of nonbinders. In general, as in HTS, one can only expect hits from virtual screening that bind in the low micromolar range.

Nevertheless, a recent study showed that library screening can also successfully detect very weak ligands. Approximately 4000 commercially available compounds had been screened for FKBP-binding by means of the SAR-by-NMR technique (368) and 31 compounds with activity in the low millimolar range were detected. This set of compounds was flexibly docked into the FKBP binding site using DOCK 4.0 with the PMF scoring function (369). Interestingly, significant enrichment factors of 2 to 3 were achieved, whereas scoring with the standard AMBER score of DOCK did not really provide an enrichment.

5.5 Consensus Scoring

Different scoring schemes focus on different aspects as most important contributions to
binding. However, these differences do not necessarily become obvious when calculating binding affinities of known active compounds. In contrast, the scoring of non-active compounds could unravel such differences. Vertex has reported good experience with so-called consensus scoring. Here, docking results are scored by several distinct functions and only those hits are considered that are rendered prominent by several of the functions. A significant decrease in false positives has been described (86), but inevitably a number of true positives is lost (see, for example, Ref. 230). When consensus scoring is applied, one should thus keep in mind that, although the number of false positives can be reduced, the danger exists to discard some active compounds highlighted by only one of the scoring functions. This would, for example, apply to the above-mentioned PLP and FlexX scoring functions, which emphasize different aspects of ligand binding. Here, consensus scoring could be counterproductive. Therefore, along with consensus scoring, the individual scoring results should be consulted. Generally, however, it appears that one can expect more robust results from consensus scoring.

5.6 Successful Identification of Novel Leads through Virtual Screening

A considerable number of publications have proved that virtual screening can be efficiently used to discover novel leads (11, 13, 142, 370-375). Some of the most recent examples are briefly presented in the following.

The program ICM has been used to identify novel antagonists for a nuclear hormone receptor (201) and, together with DOCK, to find inhibitors for the RNA transactivation response element (TAR) of HIV-1 (25). The virtual screening protocol started with 153,000 compounds from the Available Chemicals Directory (ACD) (376) and involved increasingly elaborate docking and scoring schemes as the screening proceeded toward smaller selections of compounds. In the HIV-1 TAR study, the ACD library was first rigidly docked into the binding site using the DOCK program along with a simple contact scoring scheme. Then, 20% of the best-scoring compounds were subjected to flexible docking with ICM and an empirical scoring function specifically tailored to RNA targets, providing a selection of approximately 5000 compounds. This was followed by two additional steps involving longer sampling of conformational space to retrieve 350 most promising candidates. Of these, a very small fraction was tested experimentally and two compounds were found to significantly reduce the binding of the Tat protein to HIV-1 TAR (CD50 ≈ 1 µM).

Recently, Grueneberg et al. discovered subnanomolar inhibitors of carbonic anhydrase II by virtual screening (15). The study was performed following a protocol of several consecutive steps of hierarchical filtering (Fig. 7.3). Carbonic anhydrase II is a metalloenzyme used as prominent target for the treatment of glaucoma. Its binding site is a rather rigid, funnel-shaped pocket. Known inhibitors such as dorzolamide bind to the catalytic zinc ion by a sulfonamide group. In a recent crystallographic study it could be demonstrated that only the sulfonamide group represents an ideal anchor for zinc coordination (377). An initial data set of 90,000 entries from the Maybridge (378) and LeadQuest (379) libraries was converted to 3D structures with Corina (380). In a first filtering step, compounds were requested to possess a known zinc-binding group. These compounds were then processed through UNITY (355) using a protein-derived pharmacophore query. The pharmacophore hypothesis had been constructed from a "hot spot" analysis of the available X-ray structures of the enzyme. This yielded a set of 3314 compounds. In a subsequent filtering step, the known inhibitor dorzolamide was used as a template onto which all potential candidates were flexibly superimposed by means of the program FlexS (330). The top-ranking compounds from this step were docked into the binding site with FlexX (110, 130, 138), taking into account four conserved water molecules in the active site. After visual inspection, 13 top-ranking hits were selected for experimental testing. Nine of these compounds showed activities below 1 µM, and three had Ki values below 1 nM. Two of the hits were also examined crystallographically. The docking solution predicted as best by DrugScore was found to be closer to the experimental structure than the one predicted by the FlexX score.
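The trade-off described in Section 5.5 (fewer false positives, but some true positives lost) can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions: the function names, the vote-counting rule (a compound must appear in the top list of at least `min_votes` scoring functions), the convention that lower scores are better, and all toy scores are invented here and are not taken from the cited studies. The enrichment factor is computed in the sense used in this chapter, as actives found in the selection divided by the number expected in a random subset of equal size.

```python
from typing import Dict, List, Set


def top_ranked(scores: Dict[str, float], top_n: int) -> Set[str]:
    """Return the ids of the top_n best-scoring compounds (lower score = better)."""
    return set(sorted(scores, key=scores.get)[:top_n])


def consensus_hits(tables: List[Dict[str, float]], top_n: int, min_votes: int) -> Set[str]:
    """Keep compounds that at least min_votes scoring functions rank in their top_n."""
    votes: Dict[str, int] = {}
    for table in tables:
        for cid in top_ranked(table, top_n):
            votes[cid] = votes.get(cid, 0) + 1
    return {cid for cid, n in votes.items() if n >= min_votes}


def enrichment_factor(selected: Set[str], actives: Set[str], library_size: int) -> float:
    """Actives found in the selection divided by the number expected at random."""
    if not selected or not actives:
        raise ValueError("selection and set of actives must both be non-empty")
    expected = len(actives) * len(selected) / library_size
    return len(selected & actives) / expected


# Invented toy data: 10 compounds, 2 seeded actives, two hypothetical scoring functions.
actives = {"c0", "c1"}
score_a = {"c0": -9.0, "c1": -8.5, "c5": -8.0, "c2": -5.0, "c3": -4.0,
           "c4": -3.5, "c6": -3.0, "c7": -2.5, "c8": -2.0, "c9": -1.0}
score_b = {"c0": -7.0, "c7": -6.5, "c2": -6.0, "c1": -3.0, "c3": -2.5,
           "c4": -2.0, "c5": -1.5, "c6": -1.0, "c8": -0.5, "c9": -0.2}

single = top_ranked(score_a, top_n=3)                      # one function alone
consensus = consensus_hits([score_a, score_b], top_n=3, min_votes=2)
ef_single = enrichment_factor(single, actives, library_size=10)
ef_consensus = enrichment_factor(consensus, actives, library_size=10)
```

In this toy case the consensus list contains only `c0`: the false positive `c5` is removed and the enrichment factor rises, but the active `c1`, which only one function ranks highly, is discarded as well. This is the reason the individual scoring results should be consulted alongside the consensus.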
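The hierarchical filtering used in the carbonic anhydrase II study can be pictured as a cascade in which cheap filters run first and the expensive docking step sees only the survivors. The following Python sketch shows that control flow only. It does not call UNITY, FlexS, or FlexX; the compound attributes (`has_anchor`, `pharmacophore_fit`, `similarity`, `dock_score`) are hypothetical, precomputed stand-ins for what such tools would deliver, and the toy library and cut-offs are invented.

```python
from typing import Callable, Dict, List, Tuple

Compound = Dict[str, object]
Stage = Callable[[List[Compound]], List[Compound]]


def keep_if(predicate: Callable[[Compound], bool]) -> Stage:
    """Build a stage that keeps only compounds passing a yes/no criterion."""
    def stage(compounds: List[Compound]) -> List[Compound]:
        return [c for c in compounds if predicate(c)]
    return stage


def keep_best(key: Callable[[Compound], float], n: int) -> Stage:
    """Build a stage that keeps the n compounds with the smallest key value."""
    def stage(compounds: List[Compound]) -> List[Compound]:
        return sorted(compounds, key=key)[:n]
    return stage


def run_cascade(library: List[Compound],
                stages: List[Tuple[str, Stage]]) -> Tuple[List[Compound], List[Tuple[str, int]]]:
    """Apply the stages in order and record how many compounds survive each one."""
    survivors = list(library)
    report = [("input", len(survivors))]
    for name, stage in stages:
        survivors = stage(survivors)
        report.append((name, len(survivors)))
    return survivors, report


# Invented toy library of 100 compounds with deterministic stand-in properties.
library = [{"id": i,
            "has_anchor": i % 2 == 0,           # e.g. a required metal-binding group
            "pharmacophore_fit": i % 3 == 0,    # result of a pharmacophore query
            "similarity": ((i * 37) % 100) / 100.0,  # similarity to a template ligand
            "dock_score": -((i * 53) % 100) / 10.0}  # lower = better
           for i in range(100)]

stages = [
    ("anchor group", keep_if(lambda c: c["has_anchor"])),
    ("pharmacophore", keep_if(lambda c: c["pharmacophore_fit"])),
    ("similarity", keep_best(lambda c: -c["similarity"], 10)),  # most similar first
    ("docking", keep_best(lambda c: c["dock_score"], 3)),
]
hits, report = run_cascade(library, stages)
```

The `report` list makes the funnel explicit, in the same spirit as the reduction from 90,000 entries to 3314 pharmacophore matches and finally 13 tested compounds described above. Ordering the stages by cost is the design choice that keeps the slow docking step affordable.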
This strategy of hierarchical filtering (starting with a mapping of candidate molecules onto a binding site-derived pharmacophore, followed by a similarity analysis with known ligands using either FlexS (330), SEAL (254-256), or FeatureTrees (357, 358), and concluded by flexible docking with FlexX) has meanwhile been applied to three other proteins in the same laboratory. For t-RNA guanine transglycosylase, thermolysin, and aldose reductase, novel micromolar to submicromolar lead structures could be discovered. Most challenging in this context is aldose reductase because it performs pronounced induced fit changes upon ligand binding. Crystal structure analysis of a micromolar hit retrieved by virtual screening clearly revealed known and new areas of induced fit adaptation. The crystal structure obtained with this hit provides a good starting point for further lead optimization.

The de novo design of inhibitors of the bacterial enzyme DNA gyrase, a well-established antibacterial target (381), is another example for successful structure-based virtual screening, reported by Roche (16). HTS performed on the proprietary compound library provided no suitable lead structures. Therefore, a new rational approach was developed to discover potential lead structures using structural information of the ATP binding site in subunit B of the enzyme. At the onset of the project, the crystal structures of DNA gyrase subunit B complexed with a substrate analog and two inhibitors were available. In the buried part of the pocket they all donate a hydrogen bond to an aspartic acid side-chain and accept one from a conserved water molecule. As a design concept, the formation of these two key hydrogen bonds has been defined as mandatory. As an additional requirement, a lipophilic portion forming hydrophobic interactions with the enzyme was demanded. A new assay was established to allow for the detection of weakly binding inhibitors. A computational search of the ACD (376) and the Roche Compound Inventory identified hits having low molecular weights and matching the above-mentioned criteria. Relying on the results of the in silico screening [based on docking with LUDI and a pharmacophore search with CATALYST (356)], 600 compounds were tested initially. Then, close analogs of the first series of hits were assayed, resulting in a total screen of 3000 compounds. This provided 150 hits, clustered into 14 chemical classes. Seven of these classes could be demonstrated as novel DNA gyrase inhibitors competing for the ATP binding site. Subsequent structure-based optimization resulted in inhibitors with potencies equal to or up to 10 times better than those of known antibiotics.

6 OUTLOOK

The first docking programs were introduced about 20 years ago, and the publication of the first generally applicable scoring functions dates back about 10 years. Since then, much experience has been gained in developing and applying docking algorithms, using scoring functions, and assessing their accuracy. Significant progress has been made over the last few years and it appears as if there are now docking tools available to address a variety of goals with considerable accuracy, from the precise and detailed analysis of binding interactions for a small set of ligands up to a fast screening of large compound collections. Similarly, scoring functions are currently available that can be applied to a wide range of different proteins and consistently yield a considerable retrieval of active compounds. As a consequence, the pharmaceutical industry increasingly uses virtual screening to identify possible leads.

In fact, structure-based design is now established as an important approach to drug discovery complementing HTS (382), which has a number of serious disadvantages. It is expensive (383) and it leads to many false positives and a disappointingly small number of real leads (384, 385), particularly if screening is performed on a member of a new protein class. Also, not all assays are easily amenable to HTS requirements. Finally, despite the library sizes of several million entries available to the pharmaceutical industry, these compound collections do not approach the size and diversity needed to even approximately cover the chemical space of drug-like organic molecules. Accordingly, focused design of novel compounds and compound libraries should only gain importance. In light of current trends in structural genomics and patenting strategies, one may speculate that structure-based de novo design will become much more important in the near future.

To meet the increasing demands being placed on virtual screening, the development of more reliable scoring functions is certainly vital for success. In addition, novel or improved docking algorithms are required. We conclude by summarizing our perspective on major challenges in the further development of docking procedures and scoring functions:

1. The fact that protein-ligand interactions occur in aqueous solution is generally appreciated, but not yet adequately accounted for in molecular docking procedures. In particular, the simultaneous placement of explicit water molecules upon docking, accurate estimates of the water versus ligand interaction-energy balance, and the fast prediction of protonation states in binding pockets await a more satisfactory solution.

2. The consideration of a sufficient degree of protein flexibility needs to become part of standard docking approaches. This will require faster algorithms. In addition, with respect to scoring, an often overlooked aspect of this problem is that as soon as receptor flexibility is allowed, protein conformational energy changes need to be accounted for appropriately.

3. Although flexible-ligand docking has already become standard practice, the error rate in predictions of interaction geometries is still significant for more flexible ligands. Again, more efficient algorithms will be required to sample the conformation space more thoroughly.

4. Polar interactions are still not treated adequately. It is striking that, even though the role of hydrogen bonds in biology has been appreciated for a long time and the nature of hydrogen bonds is qualitatively well understood, their quantitative energetic description in protein-ligand interactions is still unsatisfactory (65).

5. All scoring functions are essentially expressed as simple analytical functions fitted to experimental binding data. The presently available crystal data on complex structures are strongly biased toward peptidic ligands. Because these data are used for the development of scoring functions, many overestimate the role of polar interactions. The development of improved scoring functions clearly requires access to better data, especially for nonpeptidic, low molecular weight, drug-like ligands, including weakly binding compounds.

6. Unfavorable interactions and unlikely docking modes are not penalized strongly enough. Methods for taking such undesired features into account are still lacking in presently available scoring functions.

7. So far, fast scoring functions cover only part of the whole receptor-ligand binding process. A more detailed picture could be obtained by taking into account properties of the unbound ligand, that is, solvation effects and energetic differences between the low-energy solution conformations and the bound conformation.

7 ACKNOWLEDGMENTS

The authors have benefited from numerous discussions with many researchers active in the field of docking and scoring, especially Holger Gohlke (University of Marburg/Scripps Research Institute), Ingo Muegge (Bayer), and Matthias Rarey (GMD St. Augustin).

REFERENCES

1. H. J. Boehm and G. Klebe, Angew. Chem. Int. Ed. Engl., 35, 2588 (1996).
2. R. E. Babine and S. L. Bender, Chem. Rev., 97, 1359 (1997).
3. J. Greer, J. W. Erickson, J. J. Baldwin, and M. D. Varney, J. Med. Chem., 37, 1035 (1994).
4. S. W. Kaldor, V. J. Kalish, J. F. Davies, 2nd, B. V. Shetty, J. E. Fritz, K. Appelt, J. A. Burgess, K. M. Campanale, N. Y. Chirgadze, D. K. Clawson, B. A. Dressman, S. D. Hatch, D. A. Khalil, M. B. Kosa, P. P. Lubbehusen, M. A. Muesing, A. K. Patick, S. H. Reich, K. S. Su, and J. H. Tatlock, J. Med. Chem., 40, 3979 (1997).
5. W. Lew and U. Choung, Curr. Med. Chem., 7, 663 (2000).
6. M. von Itzstein, W.-Y. Wu, G. B. Kok, M. S. Pegg, J. C. Dyason, B. Jin, T. V. Phan, M. L. Smythe, H. F. White, S. W. Oliver, P. M. Colman, J. N. Varghese, D. M. Ryan, J. M. Woods, R. C. Bethell, V. J. Hotham, J. M. Cameron, and C. R. Penn, Nature, 363, 418 (1993).
7. R. S. Bohacek, C. McMartin, and W. C. Guida, Med. Res. Rev., 16, 3 (1996).
8. R. E. Hubbard, Curr. Opin. Biotechnol., 8, 696 (1997).
9. M. A. Murcko, P. R. Caron, and P. S. Charifson, Ann. Rep. Med. Chem., 34, 297 (1999).
10. G. Klebe, J. Mol. Med., 78, 269 (2000).
11. D. A. Gschwend, W. Sirawaraporn, D. V. Santi, and I. D. Kuntz, Proteins, 29, 59 (1997).
12. E. K. Kick, D. C. Roe, A. G. Skillman, G. Liu, T. J. Ewing, Y. Sun, I. D. Kuntz, and J. A. Ellman, Chem. Biol., 4, 297 (1997).
13. P. Burkhard, U. Hommel, M. Sanner, and M. D. Walkinshaw, J. Mol. Biol., 287, 853 (1999).
14. J. H. Toney, P. M. Fitzgerald, N. Grover-Sharma, S. H. Olson, W. J. May, J. G. Sundelof, D. E. Vanderwall, K. A. Cleary, S. K. Grant, J. K. Wu, J. W. Kozarich, D. L. Pompliano, and G. G. Hammond, Chem. Biol., 5, 185 (1998).
15. S. Grueneberg, B. Wendt, and G. Klebe, Angew. Chem. Int. Ed. Engl., 40, 389 (2001).
16. H. J. Boehm, M. Boehringer, D. Bur, H. Gmuender, W. Huber, W. Klaus, D. Kostrewa, H. Kuehne, T. Luebbers, N. Meunier-Keller, and F. Mueller, J. Med. Chem., 43, 2664 (2000).
17. S. K. Burley, Nat. Struct. Biol., 7 (Suppl.), 932 (2000).
18. R. Abagyan and M. Totrov, Curr. Opin. Chem. Biol., 5, 375 (2001).
19. I. Muegge and M. Rarey in K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, Vol. 17, Wiley-VCH, New York, 2001, p. 1.
20. G. R. Smith and M. J. Sternberg, Curr. Opin. Struct. Biol., 12, 28 (2002).
21. C. J. Camacho and S. Vajda, Curr. Opin. Struct. Biol., 12, 36 (2002).
22. A. H. Elcock, D. Sept, and J. A. McCammon, J. Phys. Chem. B, 105, 1504 (2001).
23. B. K. Shoichet and I. D. Kuntz, Chem. Biol., 3, 151 (1996).
24. M. J. Sternberg, H. A. Gabb, and R. M. Jackson, Curr. Opin. Struct. Biol., 8, 250 (1998).
25. A. V. Filikov, V. Mohan, T. A. Vickers, R. H. Griffey, P. D. Cook, R. A. Abagyan, and T. L. James, J. Comput.-Aided Mol. Des., 14, 593 (2000).
26. D. E. Clark, C. W. Murray, and J. Li in K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, Vol. 11, Wiley-VCH, New York, 1997, p. 67.
27. M. A. Murcko in K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, Vol. 11, Wiley-VCH, New York, 1997, p. 1.
28. G. Schneider and H. J. Boehm, Drug Discov. Today, 7, 64 (2002).
29. W. P. Walters, M. T. Stahl, and M. A. Murcko, Drug Discov. Today, 3, 160 (1998).
30. T. Langer and R. D. Hoffmann, Curr. Pharm. Des., 7, 509 (2001).
31. W. F. van Gunsteren and H. J. C. Berendsen, Angew. Chem. Int. Ed. Engl., 29, 992 (1990).
32. B. Honig and A. Nicholls, Science, 268, 1144 (1995).
33. G. A. Jeffrey and W. Saenger, Hydrogen Bonding in Biological Structures, Springer-Verlag, Berlin, 1991.
34. G. A. Jeffrey, An Introduction to Hydrogen Bonding, Oxford University Press, New York, 1997.
35. A. M. Davis and S. J. Teague, Angew. Chem. Int. Ed. Engl., 38, 736 (1999).
36. S. K. Burley and G. A. Petsko, Adv. Protein Chem., 39, 125 (1988).
37. S. Tsuzuki, K. Honda, T. Uchimaru, M. Mikami, and K. Tanabe, J. Am. Chem. Soc., 122, 11450 (2000).
38. M. Brandl, M. S. Weiss, A. Jabs, J. Suhnel, and R. Hilgenfeld, J. Mol. Biol., 307, 357 (2001).
39. J. P. Gallivan and D. A. Dougherty, Proc. Natl. Acad. Sci. USA, 96, 9459 (1999).
40. I. L. Alberts, K. Nadassy, and S. J. Wodak, Protein Sci., 7, 1700 (1998).
41. N. Borkakoti, Curr. Opin. Drug Disc. Dev., 2, 449 (1999).
42. A. Ben-Naim, Hydrophobic Interactions, Plenum, New York, 1980.
43. C. Tanford, The Hydrophobic Effect, John Wiley & Sons, New York, 1980.
44. Ajay and M. A. Murcko, J. Med. Chem., 38, 4953 (1995).
45. M. K. Gilson, J. A. Given, B. L. Bush, and J. A. McCammon, Biophys. J., 72, 1047 (1997).
46. J. E. Ladbury and B. Z. Chowdhry, Chem. Biol., 3, 791 (1996).
47. T. Wiseman, S. Williston, J. F. Brandts, and L. N. Lin, Anal. Biochem., 179, 131 (1989).
48. S. H. Sleigh, P. R. Seavers, A. J. Wilkinson, J. E. Ladbury, and J. R. Tame, J. Mol. Biol., 291, 393 (1999).
49. M. H. Parker, D. F. Ortwine, P. M. O'Brien, E. A. Lunney, C. A. Banotai, W. T. Mueller, P. McConnell, and C. G. Brouillette, Bioorg. Med. Chem. Lett., 10, 2427 (2000).
50. D. H. Williams, D. P. O'Brien, and B. Bardsley, J. Am. Chem. Soc., 123, 737 (2001).
51. P. Gilli, V. Ferretti, G. Gilli, and P. A. Borea, J. Phys. Chem., 98, 1515 (1994).
52. J. D. Dunitz, Chem. Biol., 2, 709 (1995).
53. K. Sharp, Protein Sci., 10, 661 (2001).
54. P. C. Weber, J. J. Wendoloski, M. W. Pantoliano, and F. R. Salemme, J. Am. Chem. Soc., 114, 3197 (1992).
55. F. Dullweber, M. T. Stubbs, D. Musil, J. Stuerzebecher, and G. Klebe, J. Mol. Biol., 313, 593 (2001).
56. J. R. Tame, J. Comput.-Aided Mol. Des., 13, 99 (1999).
57. A. R. Khan, J. C. Parrish, M. E. Fraser, W. W. Smith, P. A. Bartlett, and M. N. James, Biochemistry, 37, 16839 (1998).
58. H. Mack, T. Pfeiffer, W. Hornberger, H. J. Boehm, and H. W. Hoeffken, J. Enzyme Inhib., 9, 73 (1995).
59. Y. W. Chen, A. R. Fersht, and K. Henrick, J. Mol. Biol., 234, 1158 (1993).
60. P. R. Connelly, R. A. Aldape, F. J. Bruzzese, S. P. Chambers, M. J. Fitzgibbon, M. A. Fleming, S. Itoh, D. J. Livingston, M. A. Navia, J. A. Thomson, and K. P. Wilson, Proc. Natl. Acad. Sci. USA, 91, 1964 (1994).
61. A. R. Fersht, J. P. Shi, J. Knill-Jones, D. M. Lowe, A. J. Wilkinson, D. M. Blow, P. Brick, P. Carter, M. M. Waye, and G. Winter, Nature, 314, 235 (1985).
62. U. Obst, D. W. Banner, L. Weber, and F. Diederich, Chem. Biol., 4, 287 (1997).
63. B. P. Morgan, J. M. Scholtz, M. D. Ballinger, I. D. Zipkin, and P. A. Bartlett, J. Am. Chem. Soc., 113, 297 (1991).
64. B. A. Shirley, P. Stanssens, U. Hahn, and C. N. Pace, Biochemistry, 31, 725 (1992).
65. H. Kubinyi in B. Testa, H. van de Waterbeemd, G. Folkers, and R. Guy, Eds., Pharmacokinetic Optimization in Drug Research, Wiley-VCH, Weinheim, Germany, 2001, p. 513.
66. H.-J. Schneider, T. Schiestel, and P. Zimmermann, J. Am. Chem. Soc., 114, 7698 (1992).
67. A. C. Tissot, S. Vuilleumier, and A. R. Fersht, Biochemistry, 35, 6786 (1996).
68. J. D. Dunitz, Science, 264, 670 (1994).
69. C. Chothia, Nature, 254, 304 (1975).
70. F. M. Richards, Annu. Rev. Biophys. Bioeng., 6, 151 (1977).
71. K. A. Sharp, A. Nicholls, R. Friedman, and B. Honig, Biochemistry, 30, 9686 (1991).
72. M. S. Searle and D. H. Williams, J. Am. Chem. Soc., 114, 10690 (1992).
73. M. S. Searle, D. H. Williams, and U. Gerhard, J. Am. Chem. Soc., 114, 10697 (1992).
74. M. A. Hossain and H. J. Schneider, Chem. Eur. J., 5, 1284 (1999).
75. J. Hermans and L. Wang, J. Am. Chem. Soc., 119, 2702 (1997).
76. K. P. Murphy, D. Xie, K. S. Thompson, L. M. Amzel, and E. Freire, Proteins, 18, 63 (1994).
77. K. A. Dill, J. Biol. Chem., 272, 701 (1997).
78. G. Folkers, Ed., Pharm. Acta Helv., 69, 175 (1995).
79. D. E. J. Koshland, Angew. Chem. Int. Ed. Engl., 33, 2408 (1994).
80. M. D. Eldridge, C. W. Murray, T. R. Auton, G. V. Paolini, and R. P. Mee, J. Comput.-Aided Mol. Des., 11, 425 (1997).
81. I. K. McDonald and J. M. Thornton, J. Mol. Biol., 238, 777 (1994).
82. M. Totrov and R. Abagyan in R. B. Raffa, Ed., Drug-Receptor Thermodynamics: Introduction and Applications, John Wiley & Sons, Chichester, 2001, p. 603.
83. I. D. Kuntz, E. C. Meng, and B. K. Shoichet, Acc. Chem. Res., 27, 117 (1994).
84. A. R. Leach and M. M. Hann, Drug Discov. Today, 5, 326 (2000).
85. R. A. Lewis, S. D. Pickett, and D. E. Clark in K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, Vol. 16, Wiley-VCH, New York, 2000, p. 1.
86. P. S. Charifson, J. J. Corkery, M. A. Murcko, and W. P. Walters, J. Med. Chem., 42, 5100 (1999).
87. G. E. Terp, B. N. Johansen, I. T. Christensen, and F. S. Jorgensen, J. Med. Chem., 44, 2333 (2001).
88. T. P. Lybrand, Curr. Opin. Struct. Biol., 5, 224 (1995).
89. G. Jones and P. Willett, Curr. Opin. Biotechnol., 6, 652 (1995).
90. T. Lengauer and M. Rarey, Curr. Opin. Struct. Biol., 6, 402 (1996).
R. Rosenfeld, S. Vajda, and C. DeLisi, Annu. 114. G. M. Morris, D. S. Goodsell, R. Huey, and A. J.
Rev. Biophys. Biomol. Struct., 24, 677 (1995). Olson, J. Cornput.-Aided Mol. Des., 10, 293
P. Bamborough and F. E. Cohen, Curr. Opin. (1996).
Struct. Biol., 6, 236 (1996). 115. G. M. Morris, D. S. Goodsell, R. S. Halliday, R.
J. S. Dixon and J. M. Blaney in Y. C. Martin Huey, W. E. Hart, R. K. Belew, and A. J. Olson,
and P. Willett, Eds., Designing Bioactive Mol- J. Comput. Chem., 19,1639 (1998).
ecules: Three-Dimensional Techniques and 116. R. Abagyan, M. Trotov, and D. Kuznetsov,
Applications, American Chemical Society, J. Comput. Chem., 15,488 (1994).
Washington, DC, 1997, p. 175. 117. M. Totrov and R. Abagyan, Proteins (Suppl.),
P. J. Gane and P. M. Dean, Curr. Opin. Struct. 215 (1997).
Biol., 10,401 (2000).
118. J. Y. Trosset and H. A. Scheraga, J. Comput.
C. A. Sotriffer, W. Flader, R. H. Winger, B. M. Chem., 20,412 (1999).
Rode, K. R. Lied, and J. M. Varga, Methods,
20,280 (2000). 119. J . Y. Trosset and H. A. Scheraga, Proc. Natl.
Acad. Sci. USA, 95,8011 (1998).
R. B. Russell and D. S. Eggleston, Nut. Struct.
Biol.7 (Suppl.), 928 (2000). 120. F. S. Kuhl, G. M. Crippen, and W. G. Richards,
J. Comput. Chem., 5 , 2 4 (1984).
M. Hendlich, F. Rippmann, and G. Barnickel,
J. Mol. Graph. Model., 15, 359 (1997). 121. M. D. Miller, S. K. Kearsley, D. J. Underwood,
and R. P. Sheridan, J. Cornput.-Aided Mol.
G. P. Brady, Jr. and P. F. Stouten, J. Cornput.-
Des., 8, 153 (1994).
Aided Mol. Des., 14,383 (2000).
C. A. Orengo, A. E. Todd, and J. M. Thornton, 122. D. M. Lorber and B. K. Shoichet, Protein Sci.,
Curr. Opin. Struct. Biol., 9, 374 (1999). 7,938 (1998).
J. M. Thornton, A. E. Todd, D. Milburn, N. 123. M. McGann (2001). FRED [Online]. OpenEye.
Borkakoti, and C. A. Orengo, Nut. Struct. http://www.eyesopen.com/fred.html [2001,
Biol.7 (Suppl.), 991 (2000). Sept 251.
S. Schmitt, M. Hendlich, and G. Klebe, Angew. 124. Y. P. Pang and A. P. Kozikowski, J Cornput.-
Chem. Znt. Ed. Engl., 40,3141 (2001). Aided Mol. Des., 8,669 (1994).
J. Ruppert, W. Welch, and A. N. Jain, Protein 125. Y. P. Pang, E. Perola, K. Xu, and F. G. Pren-
Sci., 6,524 (1997). dergast, J. Comput. Chem., 22, 1750 (2001).
F. Jiang and S. H. Kim, J. Mol. Biol., 219, 79 126. E. C. Meng, D. A. Gschwend, J. M. Blaney, and
(1991). I. D. Kuntz, Proteins, 17,266 (1993).
D. Fischer, S. L. Lin, H. L. Wolfson, and R. 127. D. A. Gschwend and I. D. Kuntz, J. Cornput.-
Nussinov, J. Mol. Biol., 248,459 (1995). Aided Mol. Des., 10, 123 (1996).
P. Burkhard, P. Taylor, and M. D. Walkin- 128. R. L. DesJarlais, R. P. Sheridan, J. S. Dixon,
shaw, J. Mol. Biol., 277,449 (1998). I. D. Kuntz, and R. Venkataraghavan, J. Med.
I. D. Kuntz, J. M. Blaney, S. J. Oatley, R. Lang- Chem., 29,2149 (1986).
ridge, and T. E. Ferrin, J. Mol. Biol., 161, 269 129. J. Wang, P. A. Kollman, and I. D. Kuntz, Pro-
(1982). teins, 36, 1 (1999).
C. M. Oshiro and I. D. Kuntz, Proteins, 30,321 130. M. Rarey, B. Kramer, and T. Lengauer,
(1998). J. Cornput.-Aided Mol. Des., 11,369 (1997).
H. J. Boehm, J. Cornput.-Aided Mol. Des., 6, 131. T. J. Ewing, S. Makino, A. G. Skillman, and
593 (1992). I. D. Kuntz, J. Cornput.-Aided Mol. Des., 15,
H. J. Boehm, J.Cornput.-Aided Mol. Des., 6,61 411 (2001).
(1992). 132. R. L. DesJarlais, R. P. Sheridan, G. L. Seibel,
M. Rarey, B. Kramer, T. Lengauer, and G. J. S. Dixon, I. D. Kuntz, and R. Venkataragha-
Klebe, J. Mol. Biol., 261, 470 (1996). van, J. Med. Chem., 31,722 (1988).
V. Schnecke and L. A. Kuhn, Perspect. Drug 133. D. J. Bacon and J. Moult, J. Mol. Biol., 225,
Discov. Des., 20, 171 (2000). 849 (1992).
D. J. Diller and K. M. Merz, Jr., Proteins, 43, 134. A. Wallqvist and D. G. Covell, Proteins, 25,403
113 (2001). (1996).
D. S. Goodsell and A. J. Olson, Proteins, 8,195 135. M. Y. Mizutani, N. Tomioka, and A. Itai, J.
(1990). Mol. Biol., 243, 310 (1994).
136. B. B. Goldman and W. T. Wipke, Proteins, 38, 160. T. P. Straatsma and J. A. McCammon, Annu.
79 (2000). Rev. Phys. Chem., 43,407 (1992).
137. S. Makino and I. D. Kuntz, J. Comput. Chem., 161. P. A. Kollman, Chem. Rev., 93,2395 (1993).
18, 1812 (1997). 162. T. P. Straatsma in K. B. Lipkowitz and D. B.
138. M. Rarey, S. Wefing, and T. Lengauer, J. Com- Boyd, Eds., Reviews in Computational Chem-
put.-Aided Mol. Des., 10,41(1996). istry, Vol. 9, VCH, New York, 1996, p. 81.
139. B. Kramer, M. Rarey, and T. Lengauer, Pro- 163. M. R. Reddy, M. D. Erion, and A. Agarwal in
teins, 37, 228 (1999). K. B. Lipkowitz and D. B. Boyd, Eds., Reviews
140. G. Klebe and T. Mietzner, J. Cornput.-Aided in Computational Chemistry, Vol. 16, Wiley-
Mol. Des., 8,583 (1994). VCH, New York, 2000, p. 217.
141. G. Klebe, J. Mol. Biol., 237,212 (1994). 164. A. Di Nola, D. Roccatano, and H. J. Berendsen,
142. R. L. DesJarlais and J. S. Dixon, J. Cornput.- Proteins, 19, 174 (1994).
Aided Mol. Des., 8,231 (1994). 165. M. Mangoni, D. Roccatano, and A. Di Nola,
143. B. K. Shoichet and I. D. Kuntz, Protein Eng., 6, Proteins, 35, 153 (1999).
723 (1993). 166. M. Vieth, J. D. Hirst, B. N. Dominy, H. Daigler,
144. N. Metropolis, A. W. Rosenbluth, M. N. Rosen- and C. L. Brooks, J. Comput. Chem., 19, 1623
bluth, A. H. Teller, and E. Teller, J. Chem. (1998).
Phys., 21, 1087 (1953). 167. Y. Pak and S. Wang, J.Phys. Chem. B, 104,354
145. T. N. Hart, S. R. Ness, and R. J. Read, Proteins (2000).
(Suppl.), 205 (1997). 168. Y. Pak, I. J. Enyedy, J. Varady, J. W. Kung,
146. T. N. Hart and R. J. Read, Proteins, 13, 206 P. S. Lorenzo, P. M. Blumberg, and S. Wang,
(1992). J. Med. Chem., 44, 1690 (2001).
147. M. Liu and S. Wang, J. Cornput.-Aided Mol. 169. B. A. Luty, Z. R. Wasserman, P. W. F. Stouten,
Des., 13,435 (1999). C. N. Hodge, M. Zacharias, and J. A. McCam-
mon, J. Comput. Chem., 16,454 (1995).
148. Z. Li and H. A. Scheraga, Proc. Natl. Acad. Sci.
USA, 84,6611 (1987). 170. A. Miranker and M. Karplus, Proteins, 11, 29
(1991).
149. R. Abagyan and P. Argos, J. Mol. Biol., 225,
519 (1992). 171. A. Caflisch, A. Miranker, and M. Karplus,
J. Med. Chem., 36,2142 (1993).
150. A. Caflisch, S. Fischer, and M. Karplus,
J. Comput. Chem.,18, 723 (1997). 172. R. Judson in K. B. Lipkowitz and D. B. Boyd,
Eds., Reviews in Computational Chemist?,
151. I. Apostolakis, A. Plueckthun, and A. Caflisch, Vol. 10, VCH, New York, 1997.
J. Comput. Chem., 19,21(1998).
173. D. E. Goldberg, Genetic Algorithms in Search,
152. C. McMartin and R. S. Bohacek, J. Cornput.-
Optimization, and Machine Learning, Addi-
Aided Mol. Des., 11,333 (1997).
son-Wesley, Reading, MA, 1989.
153. M. Karplus and J. A. McCammon, Annu. Rev.
174. L. Davis, Handbook of Genetic Algorithms,
Biochem., 52, 263 (1983).
Van Nostrand Reinhold, New York, 1991.
154. W. F. van Gunsteren and A. E. Mark, Eur.
175. D. E. Clark and D. R. Westhead, J. Cornput.-
J. Biochem., 204,947 (1992).
Aided Mol. Des., 10, 337 (1996).
155. W. F. van Gunsteren, P. H. Huenenberger,
A. E. Mark, P. E. Smith, and I. G. Tironi, Com- 176. G. Jones, P. Willett, and R. C. Glen, J. Mol.
put. Phys. Commun., 91,305 (1995). Biol., 245,43 (1995).
156. D. Rognan, Perspect. Drug Discov. Des., 11, 177. G. Jones, P. Willett, R. C. Glen, A. R. Leach,
181 (1998). and R. Taylor, J. Mol. Biol., 267, 727 (1997).
157. M. E. Tuckerman and G. J. Martyna, J. Phys. 178. J. S. Taylor and R. M. Burnett, Proteins, 41,
Chem. B, 104, 159 (2000). 173 (2000).
158. D. L. Beveridge and F. M. DiCapua, Annu. Rev. 179. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson,
Biophys. Biophys. Chem., 18,431 (1989). D. J. States, S. Swaminathan, and M. Karplus,
159. D. A. Pearlman and P. A. Kollman in W. F. van J. Comput. Chem., 4, 187 (1983).
Gunsteren and P. K. Weiner, Eds., Computer 180. R. S. Judson, E. P. Jaeger, and A. M. Treasury-
Simulation of Biomolecular Systems: Theoret- wala, Theochem. J. Mol. Struct., 114, 191
ical and Experimental Applications, Vol. 1, (1994).
ESCOM, Leiden, The Netherlands, 1989, p. 181. K. P. Clark and Ajay, J. Comput. Chem., 16,
101. 1210 (1995).
rences
C. M. Oshiro, I. D. Kuntz, and J. S. Dixon, H. A. Carlson, K. M. Masukawa, K. Rubins,

J. Cornput.-Aided Mol. Des., 9,113(1995). F. D. Bushman, W. L. Jorgensen, R. D. Lins,
M. Thormann and M. Pons, J. Comput. Chem., J. M. Briggs, and J. A. McCammon, J. Med.
22,1971(2001). Chem.,43,2100(2000).
G.Jones in P. v. R. Schleyer, N. L. Allinger, T. H. A. Carlson, K. M. Masukawa, and J. A. Mc-
Clark, J. Gasteiger, P. A. Kollman, H. F. Cammon, J. Phys. Chem. A, 103,10213(2000).
Schaefer, 3rd, and R. P. Schreiner, Eds., Ency- D. Bouzida, P. A. Rejto, S. Arthurs, A. B. Col-
clopedia of Computational Chemistry, John son, S. T. Freer, D. K. Gehlhaar, V. Larson,
Wiley & Sons, New York, 1998. B. A. Luty, P. W. Rose, and G. M. Verkhivker,
D. K. Gehlhaar, G. M. Verkhivker, P. A. Rejto, Int. J. Quantum Chem., 72,73(1999).
C. J. Sherman, D. B. Fogel, L. J. Fogel, and R. M. Knegtel, I. D. Kuntz, and C. M. Oshiro, J.
S. T. Freer, Chem. Biol., 2,317(1995). Mol. Biol., 266,424(1997).
D. R. Westhead, D. E. Clark, and C. W. Murray, F. Osterberg, G. M. Morris, M. F. Sanner, A. J.
J. Cornput.-Aided Mol. Des., 11,209(1997). Olson, and D. S. Goodsell, Proteins, 46, 34
(2002).
J. M. Yang and C. Y. Kao, J. Comput. Chem.,
H. B. Broughton, J. Mol. Graph. Model., 18,
21,988(2000).
247(2000).
C. A. Baxter, C. W. Murray, D. E. Clark, D. R. I. KolossvAry and W. C. Guida, J. Am. Chem.
Westhead, and M. D. Eldridge, Proteins, 33, SOC.,118,5011(1996).
367(1998).
I. Kolossv6ry and W. C. Guida, J. Comput.
C. A. Baxter, C. W. Murray, B. Waszkowycz, J. Chem., 20,1671(1999).
Li, R. A. Sykes, R. G. Bone, T. D. Perkins, and J. Israelachvili and H. Wennerstrom, Nature,
W. Wylie, J. Chem. Inf Comput. Sci., 40,254 379,219(1996).
(2000). M. Levitt and B. H. Park, Structure, 1, 223
L. David, R. Luo, and M. K. Gilson, J. Cornput.- (1993).
Aided Mol. Des., 15,157(2001). J. E. Ladbury, Chem. Biol., 3,973(1996).
J. A. McCammon and S. C. Harvey, Dynamics P. Y. Lam, P. K. Jadhav, C. J. Eyermann, C. N.
ofProteins and Nucleic Acids, Cambridge Uni- Hodge, Y. Ru, L. T. Bacheler, J. L. Meek, M. J.
versity Press, London, 1987. Otto, M. M. Rayner, Y. N. Wong, C. H. Chang,
H. Frauenfelder, S. G. Sligar, and P. G. P. C. Weber, D. A. Jackson, T. R. Sharpe, and
Wolynes, Science, 254,1598(1991). S. Ericksonviitanen, Science, 263,380(1994).
E.Freire, Adv. Protein Chem., 51,255(1998). M. Rarey, B. Kramer, and T. Lengauer, P r e
teins, 34,17(1999).
B.Ma, S. Kumar, C. J. Tsai, and R. Nussinov,
Protein Eng., 12,713(1999). M. L. Raymer, P. C. Sanschagrin, W. F. Punch,
S. Venkataraman, E. D. Goodman, and L. A.
M.Gerstein and W. Krebs, Nucleic Acids Res., Kuhn, J. Mol. Biol., 265,445(1997).
26,4280(1998). P. J. Goodford, J. Med. Chem.,28,849(1985).
H . A. Carlson and J. A. McCammon, Mol. R. C. Wade and P. J. Goodford, J. Med. Chem.,
Pharmacol., 57,213(2000). 36,148(1993).
H . Claussen, C. Buning, M. Rarey, and T. Len- R. C. Wade, K. J. Clark, and P. J. Goodford,
gauer, J. Mol. Biol., 308,377(2001). J. Med. Chem., 36,140(1993).
D. A. Gschwend, A. C. Good, and I. D. Kuntz, J. W. E. Minke, D. J. Diller, W. G. Hol, and C. L.
Mol. Recognit., 9,175(1996). Verlinde, J. Med. Chem.,42,1778(1999).
A. R. Leach, J. Mol. Biol., 235,345(1994). M.S. Rao and A. J. Olson, Proteins, 34, 173
L. Schaffer and G. M. Verkhivker, Proteins, 33, (1999).
295(1998). P. Pospisil, L. Scapozza, and G. Folkers in
H. D. Hoeltje and W. Sippl, Eds., Rational Ap-
M. Schapira, B. M. Raaka, H. H. Samuels, and proaches to Drug Design. Proceedings of the
R. Abagyan, Proc. Natl. Acad. Sci. USA, 97, 13th European Symposium on Quantitative
1008(2000). Structure-Activity Relationships, Prous Sci-
V.Schnecke, C. A. Swanson, E. D. Getzoff, J. A. ence, Barcelona/Philadelphia, 2001,p. 92.
Tainer, and L. A. Kuhn, Proteins, 33, 74 R. M. Knegtel, D. M. Bayada, R. A. Engh, W.
(1998). von der S a d , V. J. van Geerestein, and P. D.
A. C. Anderson, R. H. O'Neil, T. S. Surti, and Grootenhuis, J. Cornput.-Aided Mol. Des., 13,
R. M. Stroud, Chem. Biol., 8,445(2001). 167(1999).
225. P. Tao and L. Lai, J. Cornput.-Aided Mol. Des., Design. Proceedings of the 13th European
15, 429 (2001). Symposium on Quantitative Structure-Activity
226. H. Gohlke, M. Hendlich, and G. Klebe, J. Mol. Relationships, Prous Science, BarcelonaPhil-
Biol., 295, 337 (2000). adelphia, 2001, p. 78.
227. 0. Roche, R. Kiyama, and C. L. Brooks, 3rd, 245. G. Cruciani and K. A. Watson, J. Med. Chem.,
J. Med. Chem., 44,3592 (2001). 37,2589 (1994).
228. C. A. Sotriffer,H. H. Ni, and J. A. McCammon, 246. W . Sippl and H. D. Hoeltje, J. Mol. Struct.-
J. Am. Chem. Soc., 122,6136 (2000). Theochem., 503,31 (2000).
229. S. Ha, R. Andreani, A. Robbins, and I. Muegge, 247. M. Vieth and D. J. Cummins, J. Med. Chem.,
J. Cornput.-Aided Mol. Des., 14,435 (2000). 43, 3020 (2000).
230. C. Bissantz, G. Folkers, and D. Rognan, 248. M. Wojciechowski and J. Skolnick, J. Comput.
J. Med. Chem., 43,4759 (2000). Chem., 23, 189 (2002).
231. J. S. Dixon, Proteins (Suppl.),198 (1997). 249. S. Moro, A. H. Li, and K. A. Jacobson, J. Chem.
232. N. C. Strynadka, M. Eisenstein, E. Katchalski- Inf Comput. Sci., 38, 1239 (1998).
Katzir, B. K. Shoichet, I. D. Kuntz, R. Aba- 250. S. Moro, D. Guo, E. Camaioni, J. L. Boyer,
gyan, M. Totrov, J. Janin, J. Cherfils, F. Zim- T . K. Harden, and K. A. Jacobson, J. Med.
merman, A. Olson, B. Duncan, M. Rao, R. Chem., 41, 1456 (1998).
Jackson, M. Sternberg, and M. N. James, Nut. 251. R. Kiyama, Y . Tamura, F. Watanabe, H. Tsu-
Struct. Biol., 3,233 (1996). zuki, M. Ohtani, and M. Yodo, J. Med. Chem.,
233. B. Kramer, M. Rarey, and T . Lengauer, Pro- 42, 1723 (1999).
teins (Suppl.), 221 (1997). 252. C. A. Sotriffer, W . Flader, A. Cooper, B. M.
234. V . Sobolev, T . M. Moallem, R. C. Wade, G. Rode, D. S. Linthicum, K. R. Liedl, and J. M.
Vriend, and M. Edelman, Proteins (Suppl.), Varga, Biophys. J., 76,2966 (1999).
210 (1997). 253. A. Schafferhans and G. Klebe, J. Mol. Biol.,
235. H. Kubinyi, Ed., 30 QSAR in Drug Design. 307,407 (2001).
Theory, Methods, and Applications, ESCOM, 254. S. Kearsley and G. Smith, Tetrahedron Com-
Leiden, The Netherlands, 1993. put. Methodol., 3, 615 (1990).
236. H. Kubinyi, G. Folkers, andY. C. Martin, Eds., 255. G. Klebe, T . Mietzner, and F. Weber, J. Com-
30 QSAR in Drug Design. Recent Advances, put.-Aided Mol. Des., 8, 751 (1994).
Vol.3, Kluwer/ESCOM, Dordrecht, The Neth-
erlands, 1998.
256. G. Klebe, T . Mietzner, and F. Weber, J. Com-
put.-Aided Mol. Des., 13,35 (1999). .
237. H. Kubinyi, QSAR, Hansch Analysis and Re- 257. P. A. Kollman,Acc. Chem. Res., 29,461 (1996).
lated Approaches, VCH, Weinheim, Germany,
258. M. L. Lamb and W . L. Jorgensen, Cum. Opin.
1993.
Chem. Biol., 1,449 (1997).
238. W . Sippl, J. Cornput.-Aided Mol. Des., 14, 559
259. M. K. Gilson, J. A. Given, and M. S. Head,
(2000).
Chem. Biol., 4, 87 (1997).
239. C. L. Waller, T . I . Oprea, A. Giolitti, and G. R.
260. J. Aqvist, C. Medina, and J. E. Samuelsson,
Marshall, J. Med. Chem., 36,4152 (1993).
Protein Eng., 7, 385 (1994).
240. A. M. Gamper, R. H. Winger, K. R. Liedl, C. A.
261. T . Hansson, J. Marelius, and J. Aqvist, J. Com-
Sotriffer, J. M. Varga, R. T . Kroemer, and
put.-Aided Mol. Des., 12, 27 (1998).
B. M. Rode, J. Med. Chem., 39,3882 (1996).
262. J. Aqvist,V . B. Luzhkov, and B. 0. Brandsdal,
241. R. C. Wade i n H. D. Hoeltje and W . Sippl, Eds.,
Acc. Chem. Res., 35,358 (2002).
Rational Approaches to Drug Design. Proceed-
ings of the 13th European Symposium on 263. R. C. Rizzo, J. Tirado-Rives, and W . L. Jor-
Quantitative Structure-Activity Relationships, gensen, J. Med. Chem., 44, 145 (2001).
Prous Science, BarcelonaPhiladelphia, 2001, 264. A. E. Mark and W . F. van Gunsteren, J. Mol.
p. 23. Biol., 240, 167 (1994).
242. A. R. Ortiz, M. T . Pisabarro, F. Gago, and R. C. 265. D. Williams and B. Bardsley, Perspect. Drug
Wade, J. Med. Chem., 38,2681 (1995). Discov. Des., 17, 43 (1999).
243. R. C. Wade, A. R. Ortiz, and F. Gago, Perspect. 266. P. R. Andrews, D. J. Craik, and J. L. Martin,
Drug Discov. Des., 9, 19 (1998). J. Med. Chem., 27, 1648 (1984).
244. T . Wang and R. C. Wade in H. D. Hoeltje and 267. H. J. Schneider, Chem. Soc. Rev., 23, 227
W . Sippl, Eds., Rational Approaches to Drug (1994).
268. T. J . Stout, C. R. Sage, and R. M. Stroud, Struc- 290. R. M. A. Knegtel and P. D. J. Grootenhuis,
ture, 6, 839 (1998). Perspect. Drug Discov. Des., 9-11, 99 (1998).
269. M. K. Holloway, J. M.Wai, T . A. Halgren, P.M. 291. T . I. Oprea and G. R. Marshall, Perspect. Drug
Fitzgerald, J. P. Vacca, B. D. Dorsey, R. B. Discov. Des., 9-11,3 (1998).
Levin, W . J. Thompson, L. J. Chen, and S. J. 292. H. J. Boehm and M. Stahl, Med. Chem. Res., 9,
deSolms, J. Med. Chem., 38,305 (1995). 445 (1999).
270. P. D. J . Grootenhuis and P. J. M. van Galen, 293. R. Wang, L. Liu, L. Lai, and Y . Tang, J . Mol.
Acta Cryst. D, 51,560 (1995). Model., 4, 379 (1998).
271. S. J. Weiner, P. A. Kollman, D. A. Case, U. C. 294. H. J. Boehm, J. Cornput.-Aided Mol. Des., 8,
Singh, C. Ghio, G. Alagona, S. Profeta, and P. 243 (1994).
Weiner, J. Am. Chem. Soc., 106, 765 (1984).
295. C. W . Murray, T . R. Auton, andM. D. Eldridge,
272. S. J. Weiner, P. A. Kollman, D. T . Nguyen, and J. Cornput.-Aided Mol. Des., 12,503 (1998).
D. A. Case, J. Comput. Chem., 7,230 (1986).
296. H. J. Boehm, J. Cornput.-Aided Mol. Des., 12,
273. E. C. Meng, B. K. Shoichet, and I. D. Kuntz, 309 (1998).
J. Comput. Chem., 13,505 (1992).
297. A. N. Jain, J. Cornput.-Aided Mol. Des., 10,427
274. M. Vieth, J . D. Hirst, A. Kolinski, and C. L. (1996).
Brooks, J. Comput. Chem., 19,1612 (1998).
298. D. N. Boobbyer, P. J. Goodford, P. M. McWhin-
275. N. Majeux, M. Scarsi, J. Apostolakis, C. nie, and R. C. Wade, J. Med. Chem., 32, 1083
Ehrhardt, and A. Caflisch, Proteins, 37, 88 (1989).
299. M. Stahl, Perspect. Drug Discov. Des., 20, 83
276. B. K. Shoichet, A. R. Leach, and I. D. Kuntz, (2000).
Proteins, 34,4 (1999).
300. H. J. Boehm, D. W . Banner, and L. Weber,
277. W. C. Still, A. Tempczyk, R. C. Hawley, and T. J. Conput.-Aided Mol. Des., 13,51 (1999).
Hendrickson, J. Am. Chem. Soc., 112, 6127
301. M. Matsumara, W . J . Becktel, and B. W . Mat-
thews, Nature, 334,406 (1988).
278. X. Zou, Y. Sun, and I. D. Kuntz, J. Am. Chem.
302. V . Nauchitel, M . C. Villaverde, and F. Suss-
SOC.,121,8033 (1999). man, Protein Sci., 4, 1356 (1995).
279. M. L. Lamb, K. W . Burdick, S. Toba, M. M.
303. L. Wesson and D. Eisenberg, Protein Sci., 1,
Young, A. G. Skillman, X. Zou, J. R. Arnold,
227 (1992).
and I. D. Kuntz, Proteins, 42,296 (2001).
304. S. Vajda, Z. Weng, R. Rosenfeld, and C. DeLisi,
280. M. Schapira, M. Totrov, and R. Abagyan, J.
Biochemistry, 33, 13977 (1994).
Mol. Recognit., 12, 177 (1999).
305. C. Zhang, G. Vasmatzis, J . L. Cornette, and C.
281. T. Zhang and D. E. Koshland, Jr., Protein Sci.,
DeLisi, J. Mol. Biol., 267, 707 (1997).
282. P. H . Hunenberger, V . Helms, N . Narayana, 306. S. Miyazawa and R. L. Jernigan, Macromole-
S. S. Taylor, and J. A. McCammon, Biochemis- cules, 18,534 (1985).
try, 38,2358 (1999). 307. R. D. Head, M. L. Smythe, T . I. Oprea, C. L.
283. D. A. Pearlman and P. S. Charifson, J. Med. Waller, S. M. Green, and G. R. Marshall, J. Am.
Chem., 44, 502 (2001). Chem. Soc., 118,3959 (1996).
284. D. A. Pearlman, J. Med. Chem., 42, 4313 308. E. C. Meng, I. D. Kuntz, D. J. Abraham, and
G. E. Kellogg, J. Cornput.-Aided Mol. Des., 8,
299 (1994).
285. J. Bostrom, P. 0. Norrby, and T . Liljefors,
J. Cornput.-Aided Mol. Des., 12,383 (1998). 309. H. Gohlke and G. Klebe, Curr. Opin. Struct.
Biol., 11,231 (2001).
286. M. Vieth, J. D. Hirst, and C. L. Brooks, 3rd,
J. Cornput.-Aided Mol. Des., 12, 563 (1998). 310. M. J. Sippl, J. ComputAided Mol. Des., 7,473
(1993).
287. M. Stahl and M. Rarey, J. Med. Chem., 44,
311. G.Verkhivker, K. Appelt, S. T . Freer, and J. E.
288. I. D. Kuntz, K. Chen, K. A. Sharp, and P. A. Villafranca,Protein Eng., 8,677 (1995).
Kollman, Proc. Natl. Acad. Sci. USA, 96,9997 312. A. Wallqvist, R. L. Jernigan, and D. G. Covell,
Protein Sci., 4, 1881 (1995).
289. J. D. Hirst, Curr. Opin. Drug Disc. Dev., 1,28 313. R. S. DeWitte and E. I. Shakhnovich, J. Am.
Chem. Soc., 118,11733 (1996).
314. R. S. DeWitte, A. V . Ishchenko, and E. I. 335. G. B. McGaughey, M. Gagne, and A. K. Rappe,

Shakhnovich, J. Am. Chem. Soc., 119, 4608 J. Biol. Chem., 273,15458 (1998).
(1997). 336. C. Chipot, R. Jaffe,B. Maigret, D. A. Pearlman,
315. J. B. 0 . Mitchell, R. A. Laskowski, A. Alex, and and.P. A. Kollman, J. Am. Chem. Soc., 118,
J. M. Thornton, J. Comput. Chem., 20, 1165 11217 (1996).
(1999). 337. T . G. Davies, R. E. Hubbard, and J. R. Tame,
316. J. B. 0. Mitchell, R. A. Laskowski, A. Alex, Protein Sci., 8,1432(1999).
M. J. Forster, and J. M. Thornton, J. Comput. 338. G. Klebe, F. Dullweber, and H. J. Boehm in
Chem., 20,1177(1999). R. B. Raffa, Ed., Drug-Receptor Thermody-
317. I. Muegge and Y . C. Martin, J. Med. Chem., 42, namics: Introduction and Applications, John
791(1999). Wiley & Sons, Chichester, 2001, p. 83.
318. H. Gohlke, M. Hendlich, and G. Klebe, Per- 339. I. Muegge, Med. Chem. Res., 9,490(1999).
spect. Drug Discov. Des., 20,115(2000). 340. I. Nobeli, J. B. 0 . Mitchell, A. Alex, and J. M.
319. I. Muegge, J.Comput. Chem., 22,418(2001). Thornton, J. Comput. Chem., 22,673(2001).
320. I. Muegge, Perspect. Drug Discov. Des., 20,99 341. I. Massova and P. A. Kollman, Perspect. Drug
(2000). Discov. Des., 18,113(2000).
321. H. M. Berman, J. Westbrook, Z. Feng, G. Gilli- 342. P. A. Kollman, I. Massova, C. Reyes, B. Kuhn,
land, T . N . Bhat, H. Weissig, I. N. Shindyalov, S. Huo, L. Chong, M. Lee, T . Lee, Y . Duan, W .
and P. E. Bourne, Nucleic Acids Res., 28,235 Wang, 0. Donini, P. Cieplak, J. Srinivasan,
(2000). D. A. Case, and T . E. Cheatham, 3rd, Acc.
Chem. Res., 33,889(2000).
322. M.Hendlich, Acta Crystallogr. D Biol. Crystal-
logr., 54,1178(1998). 343. I. Massova and P. A. Kollman, J. Am. Chem.
SOC.,121,8133(1999).
323. A. Bergner, J. Guenther, M. Hendlich, G.
344. 0.A.Donini and P. A. Kollman, J. Med. Chem.,
Klebe, and M. Verdonk, Biopolymers (Nucl.
43,4180 (2000).
Acid Sci.), 61,299(2002).
345. B. Kuhn and P. A. Kollman, J. Med. Chem., 43,
324. Y.Sun, T.J. Ewing, A. G. Skillman, and I. D.
3786(2000).
Kuntz, J. Cornput.-Aided Mol. Des., 12, 597
(1998). 346. B. Kuhn and P. A. Kollman, J. Am. Chem. Soc.,
122,3909 (2000).
325. E. J. Martin, R. E. Critchlow, D. C. Spellmeyer,
S. Rosenberg, K. L. Spear, and J. M. Blaney, 347. N.Froloff,A.Windemuth, and B. Honig, Pro-
Pharmacochem. Libr., 29,133(1998). tein Sci., 6 , 1293(1997).
326. B. K. Shoichet, D. L. Bodian, and I. D. Kuntz, 348. J. Shen, J. Med. Chem., 40,2953(1997).
J. Comput. Chem., 13,380 (1992). 349. C. J. Woods, M. A. King, and J. W . Essex,
327. T . J. A. Ewing and I . D. Kuntz, J. Comput. J. Cornput.-Aided Mol. Des., 15,129(2001).
Chem., 18,1175(1997). 350. G. Archontis, T.Simonson, and M. Karplus, J.
328. T . Ewing, Ed., DOCK User Manual, Version Mol. Biol., 306,307(2001).
4.0, Regents of the University o f California, 351. M. L. Verdonk, J. C. Cole, and R. Taylor, J.
San Francisco, 1998. Mol. Biol., 289,1093(1999).
329. M. Stahl and H. J. Boehm, J. Mol. Graph. 352. M. L. Verdonk, J. C. Cole, P. Watson, V . Gillet,
Model., 16,121(1998). and P. Willett, J. Mol. Biol., 307,841(2001).
330. C. Lemmen, T . Lengauer, and G. Klebe, 353. D. E. Clark and S. D. Pickett, Drug Discov.
J. Med. Chem., 41,4502 (1998). Today, 5,49(2000).
331. C. Lemmen, A. Zien, R. Zimmer, and T . Len- 354. I. Muegge, S.L. Heald, and D. Brittelli, J. Med.
gauer, Pac. Symp. Biocomput., 482(1999). Chem., 44,1841(2001).
332. G.R. Desiraju and T . Steiner, The Weak Hy- 355. Unity Chemical Information Software, Tripos
drogen Bond in Chemistry and Biology, Oxford Inc., St. Louis, MO.
University Press, Oxford, 1999. 356. Y.Kurogi and 0. F. Guener, Curr. Med. Chem.,
333. G. Parkinson, A.Gunasekera, J. Vojtechovsky, 8,1035(2001).
X . Zhang, T . A. Kunkel, H. Berman, and R. H. 357. M. Rarey and J. S. Dixon, J. Cornput.-Aided
Ebright, Nat. Struct. Biol., 3,837(1996). Mol. Des., 12,471(1998).
334. J. P. Gallivan and D. A. Dougherty, J. Am. 358. M. Rarey and M. Stahl, J. Cornput.-Aided Mol.
Chem. Soc., 122,870(2000). Des., 15,497(2001).
A. I. Su, D. M. Lorber, G. S. Weston, W. A. Maybridge Database (19991,Maybridge Chem-
Baase, B. W. Matthews, and B. K. Shoichet, ical Co. Ltd., UK.
Proteins, 42,279(2001). LeadQuest Chemical Compound Libraries
Protherics (2000).Dockcrunch [Online]. Pro- (2000),Tripos Inc., St. Louis, MO.
therics Molecular Design Ltd. http://www.pro- J. Sadowski, C. Rudolph, and J. Gasteiger, Tet-
therics.com/crunch/ [2001,Sept 251. rahedron Comput. Methodol., 3,537(1990).
H.J. Boehm and M. Stahl, Curr. Opin. Chem. A. Maxwell, Biochem. Soc. Trans., 27, 48
Biol., 4,283(2000). (1999).
C. W. Murray, D. E. Clark, T. R. Auton, M. A. D. Bailey and D. Brown, Drug Discov. Today,
Firth, J. Li, R. A. Sykes, B. Waszkowycz, D. R. 6,57(2001).
Westhead, and S. C. Young, J. Cornput.-Aided R. M. Eglen, G. Schneider, and H. J . Boehm in
Mol. Des., 11,193 (1997). G. Schneider and H. J. Boehm, Eds., Virtual
M. Rarey and T. Lengauer, Perspect. Drug Dis- Screening for Bioactive Molecules, VCH, Wein-
cov. Des., 20,63 (2000). heim, Germany, 2000,p. 1.
D. C. Roe and I. D. Kuntz, J. Cornput.-Aided R. Lahana, Drug Discov. Today, 4,447(1999).
Mol. Des.,9,269 (1995). C. A. Lipinski, F. Lombardo, B. W. Dominy,
S. Makino, T. J. Ewing, and I. D. Kuntz, and P. J. Feeney, Adv. Drug Delivery Rev., 23,
J. Cornput.-Aided Mol. Des., 13,513 (1999). 3(1997).
M. D. Miller, R. P. Sheridan, S. K. Kearsley, Y. Iwata, M. Arisawa, R. Hamada, Y. Kita,
and D. J. Underwood, Methods Enzymol., 241, M. Y. Mizutani, N. Tomioka, A. Itai, and S.
354(1994). Miyamoto, J. Med. Chem., 44,1718(2001).
G.M.Verkhivker, D. Bouzida, D. K. Gehlhaar, V. Sobolev, R. C. Wade, G. Vriend, and M. Edel-
P. A. Rejto, S. Arthurs, A. B. Colson, S. T. man, Proteins, 25,120 (1996).
Freer, V. Larson, B. A. Luty, T. Marrone, and W. Welch, J. Ruppert, and A. N. Jain, Chem.
P. W. Rose, J. Cornput.-Aided Mol. Des., 14, Biol., 3,449 (1996).
731(2000).
E. Perola, K. Xu, T. M. Kollmeyer, S. H. Kauf-
S. B.Shuker, P. J. Hajduk, R. P. Meadows, and mann, F. G. Prendergast, and Y. P. Pang,
S. W. Fesik, Science, 274,1531(1996). J Med Chem., 43,401(2000).
I. Muegge, Y. C. Martin, P. J . Hajduk, and D. S. Goodsell, G. M. Morris, and A. J. Olson, J.
S. W. Fesik, J. Med. Chem., 42,2498(1999). Mol. Recognit., 9,1 (1996).
R. L. DesJarlais, G. L. Seibel, I. D. Kuntz, P. S. C. A. Sotriffer, H. Ni, and J. A. McCamSon,
Furth, J. C. Alvarez, P. R. Ortiz de Montellano, J. Med. Chem., 43,4109 (2000).
D. L. Decamp, L. M. Babe, and C. S. Craik,
T . Hou, J. Wang, L. Chen, and X. Xu, Protein
Proc. Natl. Acad. Sci. USA, 87,6644(1990).
Eng., 12,639 (1999).
B. K. Shoichet, R. M. Stroud, D. V. Santi, I. D.
B. N. Dominy and C. L. Brooks, 3rd, Proteins,
Kuntz, and K. M. Perry, Science, 259, 1445
36,318(1999).
(1993).
I. D. Wall, A. R. Leach, D. W. Salt, M. G. Ford,
L. R. Hoffman, I. D. Kuntz, and J. M. White,
and J. W. Essex, J. Med. Chem., 42, 5142
J. Virol., 71,8808 (1997).
(1999).
I.Massova, P. Martin, A. Bulychev, R. Kocz, M.
D. A. Pearlman and P. S. Charifson, J. Med.
Doyle, B. F. Edwards, and S. Mobashery,
Chem., 44,3417(2001).
Bioorg. Med. Chem. Lett., 8,2463 (1998).
S. S. So and M. Karplus, J. Conput.-Aided
D. Tondi, U. Slomczynska, M. P. Costi, D. M.
Mol. Des., 13,243(1999).
Watterson, S. Ghelli, and B. K. Shoichet,
Chem. Biol., 6,319(1999). M. Rarey, B. Kramer, and T. Lengauer, Bioin-
formatics, 15,243(1999).
T. Toyoda, R. K. Brobey, G. Sano, T. Horii, N.
Tomioka, and A. Itai, Biochem. Biophys. Res. Y. Takamatsu and A. Itai, Proteins, 33, 62
Commun., 235,515 (1997). (1998).
Available Chemicals Directory, MDL Informa- D. Rognan, S. L. Lauemoller, A. Holm, S. Buus,
tion Systems Inc., San Leandro, CA. and V. Tschinke, J. Med. Chem., 42, 4650
F. Abbate, C. T. Supuran, A. Scozzafava, P. (1999).
Orioli, M. T. Stubbs, and G. Klebe, J. Med. G. E. Kellogg, G. S. Joshi, and D. J. Abraham,
Chem.,45,in press (2002). Med. Chem. Res., 1,444(1991).
4 Bioinformatics and Target Discovery
Table 8.2 Comparative EST Counts for Five Genes Sequenced from Normal Prostate, Stage
B2 Cancer, Stage C Cancer, and Benign Prostatic Hyperplasia (BPH) cDNA Libraries

               Normal      Stage B2 Cancer       Stage C Cancer        BPH                All Other
               Prostate    -----------------     -----------------     ---------------    Tissue
Gene           Total       Tags   P              Tags   P              Tags   P
PSA            13          7      0.7-0.8        14     0.6-0.7        22     0.8-0.9     0
PAP            4           1      0.1-0.2        34     >0.999         9      0.7-0.8     1
HGK            1           7      >0.999         6      0.97-0.98      5      0.8-0.9     0
PS1            0           3      0.993-0.994    7      0.997-0.998    1      0.4-0.5     0
PS2            0           2      0.97-0.98      7      0.997-0.998    0      0-<0.1      0
Total clones   4500        1400                  3400                  4800               732,000
The tag counts are from Ref. 21. The P values are calculated according to Equation 8.1, modified for use with different
total EST counts from the source libraries. The web URL http://igs-server.cnrs-mrs.fr/~audic/cgi-bin/winflat.pl was used to
calculate the probability intervals. A P value nearer to 1 indicates that the differential expression is likely to be significant.
While prostate specific antigen (PSA) and glandular kallikrein (HGK) have been proposed as prostate cancer markers, both
PS1 and PS2 are prostate specific. Thus, the down-regulation of PAP in stage B2 cancer is not significant using this test,
whereas the test shows its up-regulation in the BPH sample to be more significant. So, for lower changes in copy number,
where more sensitivity is expected, this test of significance is a valuable tool.
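The P values discussed for Table 8.2 can be explored with a short calculation. The sketch below is illustrative only: it assumes the standard published form of the Audic and Claverie expression for unequal library sizes, p(y | x) = (N2/N1)^y (x + y)! / [x! y! (1 + N2/N1)^(x + y + 1)], where x and y are the tag counts of one gene in two libraries of N1 and N2 total clones. The function names are hypothetical, and the way the web tool turns these probabilities into reported intervals is not stated here, so the output need not match the tabulated ranges.

```python
from math import exp, lgamma, log


def audic_claverie_pmf(x, y, n1, n2):
    """Probability of observing y tags in library 2 (n2 clones),
    given x tags of the same gene in library 1 (n1 clones).

    Assumed form: (n2/n1)^y * (x+y)! / (x! y! (1 + n2/n1)^(x+y+1)).
    Computed in log space so that large counts do not overflow.
    """
    ratio = n2 / n1
    log_p = (
        y * log(ratio)
        + lgamma(x + y + 1) - lgamma(x + 1) - lgamma(y + 1)
        - (x + y + 1) * log(1.0 + ratio)
    )
    return exp(log_p)


def cumulative_probability(x, y, n1, n2):
    """Probability of observing y or fewer tags in library 2, given x in library 1."""
    return sum(audic_claverie_pmf(x, k, n1, n2) for k in range(y + 1))


# Illustration with the HGK row of Table 8.2:
# 1 tag in 4500 normal-prostate clones, 7 tags in 1400 stage B2 clones.
p_hgk = cumulative_probability(1, 7, 4500, 1400)
print(f"HGK, normal vs. stage B2: cumulative P = {p_hgk:.5f}")
```

A cumulative value close to 1 means that a count as high as the one observed would rarely arise by sampling alone, which is the sense in which the table note reads a P value near 1 as significant. Working in log space with `lgamma` is a design choice that keeps the factorials manageable for abundant transcripts.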
overall profiles obtained from tag counting experiments could be performed using the
traditional χ² test. However, this is the wrong approach for experiments where the
significance of differences between expression levels (i.e., tag counts) of individual
genes is to be determined, for example, in diseased and normal tissue states (19). One of
the issues in performing tag-sampling experiments is that the experiments themselves are
usually not replicated. Thus, the dispersion of results cannot be used to estimate the
standard errors associated with each expression measurement. This eliminates the
possibility of using standard tests of variance. Instead, the Poisson distribution, which
includes an implicit estimate of standard error, approximates random sampling of tags very
well. Audic and Claverie (20) have proposed a significance test (see Equation 8.1) in which
the sample size plays no part, so long as it is the same for both experiments, and which
depends only on the observed tag counts of the same gene from the diseased, $g_d$, and
normal, $g_n$, states:

$$P(g_d \mid g_n) = \frac{(g_n + g_d)!}{g_n!\, g_d!\, 2^{(g_n + g_d + 1)}} \qquad (8.1)$$

The equation has also been extended to cover the more practical case of different total
numbers of tags. Thus, taking some data from Fannon (21) as an example, we can calculate
values for the probability of certain genes being expressed at different levels in normal
prostate, stage B2 cancer, stage C cancer, and tissue from a benign prostatic hyperplasia
(BPH) sample, as shown in Table 8.2.

The relationship between gene expression, mRNA level, and protein expression is complex
and not one that can be gleaned from collecting copy number information in this type of
experiment. Even with careful statistical analysis, such as that described above, the
assumption that increases or decreases in copy number reflect real biologically
significant events relies on the confidence with which we can compare a library made from
one set of cells to a library made from a different set of cells. Thus, most transcript
analysis experiments setting out to be quantitative end up simply as target identification
exercises. A major goal of proteomics is to generate a factory-type approach to profiling
protein level expression that more closely reflects the biological reality. The EST
approach has been turned into an industrial scale process but has not been able to impact
the drug discovery process significantly because of the biological limitations described
and the lack of sound mathematical modeling of the whole process.

Expression experiments are measures of cell population averages, not the contents of
individual cells, so it is important to consider to what extent all cells in the candidate
popu-
Bioinformatics: Its Role in Drug Discovery
1 INTRODUCTION

In January 2001, the biopharmaceutical company Millennium announced that, as
part of a multimillion-dollar research collaboration with Bayer, an anticancer
agent had proceeded to clinical trials (1, 2). The remarkable achievement was
not that the collaboration between a pharmaceutical company and a high
technology genomics company had been so successful in terms of a product, but
that it had ostensibly lopped 2 years from the discovery lifecycle in the
process. As a result of perceived improvements in research efficiency such as
this, more effort than ever is being placed in development and implementation
of genomics and screening technology automation. The substantial volumes of
data thus generated are used to pan for innovative lead compounds for novel
therapeutic targets to feed ever more voracious development pipelines (3).

The genomics tools of rapid sequence screening, microarray chips, expression
analysis, protein interactions, macromolecular structure determination,
sequence comparison, and a host more are all biological techniques that
generate different types of raw data. Moreover, the data are produced in far
greater quantity than has been seen in biology before and the sheer speed with
which the data are produced is unprecedented. The improvements in process
throughput, which are so exciting to the financial markets and so critical to
the alleviation of pain and suffering in human populations, are readily
achievable because of the science of information integration and knowledge
transformation that are the hallmarks of bioinformatics. It is not enough
simply to produce data, even from the most leading edge of techniques; we must
be able to manage it effectively and extract useful information that leads to
that critical knowledge on which realistic drug discovery decisions can be
made.

The opening example illustrates how genomic technologies, including
bioinformatics, are making an impact at the drug discovery level. The purpose
of this article is to provide some background to the tools and technologies
that are used on a daily basis by bioinformatics scientists in an effort to
make the subject more accessible to readers with a non-biological training.
This article is not, however, a training manual in running programs nor is it
an index to the latest resources on the Internet. Both bioinformatics and the
World Wide Web grow at such a pace that any journal article is likely to be
out-of-date before it goes to press. Nevertheless, the Internet is a crucial
resource for the practicing bioinformatics researcher. All of the core uses for
executing projects are available there both in terms of databases and computer
programs. Lists and descriptions would fill many volumes. Certain resources are
mentioned where necessary to illustrate specific examples. Some useful starting
points for an exploration of bioinformatics on the Internet are shown in
Table 8.1. An introduction to tools and techniques is available in Ref. 4,
while Ref. 5 contains a more technical approach based on understanding of
machine learning. For a more mathematical treatment, readers are referred to
Ref. 6. Reference may be made to the journal Briefings in Bioinformatics for
reasonably accessible descriptions of systems and processes. The journal
Bioinformatics is aimed at a more technical audience. The annual database issue
of Nucleic Acids Research is the generally accepted place for publication of
bioinformatics databases. Some fundamental papers referred to are relatively
old for a young science.

Rather than delve into minutiae here, the aim has been to present an overview
of the way bioinformatics is being used in the process of drug discovery in
2002. Many important and exciting aspects of the academic research that is
being carried out are passed over in silence or referred to only by a brief
comment. Computer technologies change rapidly and bioinformatics has always
been at the forefront in applying new computing paradigms to biological
problems (e.g., use of the Internet, object-oriented programming technologies,
neural networks, parallel computing). Some of the molecular biology or bio-
Table 8.1 A Selected List of Key Websites for Further Exploration of Online
Bioinformatics Resources

Internet Site
    Brief Description

http://www.ncbi.nlm.nih.gov
    The National Center for Biotechnology Information (NCBI). Located at the
    National Library of Medicine in Bethesda, MD, USA. The home of the GenBank
    DNA sequence database; PubMed literature search engine; sequence search
    tools (e.g., PSI-BLAST); genomic sequence navigation tools. A substantial
    repository of resources in all areas of bioinformatics.

http://www.ebi.ac.uk
    The European Bioinformatics Institute (EBI). This site is located at
    Hinxton Hall, Cambridge, UK. The home of the EMBL Nucleotide Sequence
    Database; data management tools [including publicly accessible version of
    SRS, the Sequence Retrieval System (7)]; protein family databases;
    microarray tools; etc. An extensive repository of resources for
    bioinformatics.

http://www.expasy.ch
    The Expert Protein Analysis System. Dedicated to the analysis of protein
    sequence and structure as well as two-dimensional PAGE. Home of the
    SWISS-PROT protein knowledgebase and TrEMBL computer annotated
    supplement (8).

http://www.man.ac.uk
    The University of Manchester Bioinformatics Education and Research site
    (UMBER). Useful because it is the home of the PRINTS (9, 10) resource for
    protein fingerprint analysis and a valuable teaching site for
    bioinformatics.

http://www.ensembl.org
    Site developed by the Sanger Centre, Hinxton Hall, Cambridge, UK and the
    EMBL-EBI, presenting tools for browsing and researching the human genome
    sequence (11). This is a public access server providing data and access at
    no charge. Commercial sites are also available for working on commercially
    produced human genome sequence.

http://www.mips.biochem.mpg.de
    The Munich Information Center for Protein Sequences (MIPS). Provides a
    different view of several model organism genomes with tools for analysis.
physical aspects of techniques used to generate data are outlined from the
point of view of a bioinformatician and not that of a practicing molecular
biologist (although it has been reviewed by one, see the acknowledgments) to
give a flavor of the kinds of experiments that are performed.

2 BIOINFORMATICS IN DRUG DISCOVERY

Bioinformatics in drug discovery has traditionally been used as a tool for
finding new drug targets ("target selection"). This technique of target
discovery is an important contribution that bioinformatics has been able to
make to the drug discovery process. However, bioinformatics alone without the
background of molecular biological and biophysical experiments is sterile. To
understand even a bird's-eye overview such as this, some basic material has to
be covered to enable comprehension both of the data and the manner of its
analysis.

The second use to which bioinformatics has been put in the drug discovery
process is more fundamental. It is concerned with the use of techniques in
molecular sequence analysis to generate relationships between sequences that
are themselves used to provide fundamental structures for databases of drug
discovery information. Relationships between data elements are important
because they help to place individual elements in a context that can be readily
assimilated by the user of the system. In many situations, observers approach
data from different points of view and bring to bear the richness of differing
scientific experiences. Whether we care to admit it or not, "biologists" and
"chemists" have different training and background and thus offer a range of
opinions on similar pieces of information. Even the word "activity" means
different things to a chemist who has synthesized a group of compounds and to a
biologist who has developed an assay to test the compounds. Both aspects are
necessary for the discovery of new drugs, but they are different viewpoints
that need to be supported by appropriate relationship mining in the data. If
the bioinformatics job is done well, both views can be accommodated in the data
structures and user interfaces used by both sets of users.

Throughout the pharmaceutical industry, bioinformatics and chemoinformatics
groups are working more closely together than has been the case hitherto. This
is a consequence of the realization that managing data effectively requires
integration of thinking (about definitions of common attributes of molecules
both small and large), integration of processes, and integration of
implementation. The recent rise in popularity in bioinformatics of the ontology
is an example of the application of a computer science paradigm to the issue of
redundancy in nomenclature in many areas of biology. Application across the
chemistry-biology domain interface could well be beneficial for drug discovery
effectiveness. The ontology is simply a means to an end; in this instance, that
end is improved communication and understanding of basic concepts within and
across the boundaries of major scientific disciplines. There may, of course, be
a variety of other means to reach that goal.

3 WHAT IS BIOINFORMATICS?

3.1 Definitions

Concisely, bioinformatics is our ability to organize biological data. From
another perspective, bioinformatics is our ability to understand how biological
information is organized. From this understanding should spring an enhanced
view of the interactions between biological molecules. This should, in turn,
inform our search strategies for new small molecules that will modulate the
behaviour of biological molecules to give a beneficial therapeutic effect.
These definitions arise from observation of the way diverse skills are brought
to bear in attempting to answer biological questions. They also stress the
importance of organizing and understanding biological data, rather than linking
these aspects strongly to specific hardware or software implementations. Use of
computers may be involved in the process but the definitions are not limited by
the application of any particular technology.

Bioinformatics has also been defined as the application of computer technology
to solving biological problems. This definition, perhaps what some would
consider to be the canonical one, is broad but restricts the scope of the
definition to problems to which computer technology can be applied.

3.2 Integration of Information

Bioinformatics has become a byword for integration; specifically the
integration of data across different data resources to generate integrated
information resources. Linking data and information in this way is fundamental
to bioinformatics activities and so some discussion of the meaning of data,
information, and knowledge in the context of bioinformatics for drug discovery
is provided in Section 6. Integration is important because it provides context,
or at least a background, against which computational analyses are performed.
In the past, for single molecule experiments, this background was achieved
through reading the literature. Now that multiple molecule experiments are
common, even genome-wide or inter-genome analyses, it is simply not practical
any longer to rely on the literature in its raw form, unless it is part of an
integrated knowledge-based approach that provides connections between disparate
pieces of information, backed up by experimental evidence from which to draw
conclusions (12).

3.3 Bioinformatics and Skills

The pursuit of bioinformatics involves a number of different skills.
Organizing, storing, retrieving, and querying sets of biological data are
techniques that lie at the heart of the subject. An ability to analyze the
characteristics of particular sets of biological data is fundamental. The
translation of those characteristics into electronic representations that can
be organized on a large scale is the domain of the
bioinformatics software and database developer. The process of analyzing and
understanding biological data using the tools available is the domain of the
bioinformatics analyst. When new tools are in the course of development,
substantial interaction between the two skill sets is essential.

In the pharmaceutical environment, both developer and analyst skills are
necessary. This is so even where commercial software is in use, because there
is no single system available commercially that provides the level of
integration between the worlds of bio- and chemoinformatics necessary to
effectively enhance the drug discovery process. Some interfacing of different
systems is required and the warehousing of proprietary data is always an issue.

This broad description of bioinformatics and of the two types of bioinformatics
scientist is quite abstract. It does not detail the characteristics of the data
with which the bioinformatics scientist has to work. Neither does it define the
set of tools that the developer should work with or implement. There was a
time, in the late 1980s and early 1990s, when the type of data was well
defined. Molecular sequence data, the stream of bases in DNA, and the stream of
residues at the protein level were the main types of data. Programmers
developed code in FORTRAN or C and scripting languages were immature.

Now, as science moves forward into a new millennium, additional types of data
have become important; for example, protein-protein interactions and
three-dimensional structure, high density gene expression chips, cell imaging,
etc. Developers have a wide range of tools to call on, including high
performance C and C++ compilers, rich scripting languages (Perl, Python, etc.),
and efficient, easily accessible operating systems (particularly Linux) that
make porting software to different hardware platforms less of an issue than it
was.

Of equal importance to the medicinal chemist, to whom this review is
principally directed, is the impact of bioinformatics on the discovery of new
medicines. Rather than explain comprehensively all the popular tools and their
underlying algorithms, this review focuses on the points in the discovery
research process where bioinformatics is making an impact. Technologies will be
described to the extent that such understanding is necessary to grasp the
relevance of the data being generated and its significance.

3.4 Standardization

Progress in linking items of relevant data and generating integrated
information resources would be very limited were it not for efforts in
standardization that have been brought about by international collaboration.
There is still a long way to go, however. While it is becoming cheaper to
obtain each piece of individual data, the proportion of automated experiments
is increasing, at least in the life sciences, because of the ready availability
of new technologies. It may seem a simple matter to create resources that store
and manage streams of DNA bases, represented by the four alphabetic characters
A, C, T, and G. However, when we also wish to integrate information on
experimentally or computationally determined annotation and cross-reference to
other resources using gene names, there are significant problems. The
literature abounds with synonyms for gene names and functions; even the labels
given to specific cellular functions are not always clearly defined (13).

To be able to process data automatically, it has to be presented in a form that
can be parsed by a computer program and must also include all the elements
necessary to an understanding of the biological system under study. Reliable
information systems should have source data of a consistently high quality to
prevent application errors and enable integration into other biological
information systems. Some progress is now being made towards consensus in gene
naming through the work of the HUGO Gene Nomenclature Committee (see
http://www.gene.ucl.ac.uk/nomenclature/). Many researchers now use this system
as a source of unique gene names and descriptions in the published literature
(14) and in commercial products (e.g., see http://www.biowisdom.com/).
Standardizing vocabulary expressing the relationships between the complex
network of gene functions is the work of the Gene Ontology (GO) project (see
http://www.geneontology.org/).
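The synonym problem described in Section 3.4 can be made concrete with a small sketch. It assumes a hand-built alias table; the three entries are well-known examples chosen for illustration and are not drawn from this chapter, and the function names are hypothetical. A production system would load approved symbols from a nomenclature resource rather than hard-code them.

```python
# Illustrative alias table (an assumption for this sketch):
# approved gene symbol -> synonyms commonly seen in the literature.
APPROVED_TO_ALIASES = {
    "KLK3": ["PSA", "prostate specific antigen"],
    "ERBB2": ["HER2", "NEU"],
    "TP53": ["p53"],
}


def build_alias_index(approved_to_aliases):
    """Map every symbol and alias (case-insensitively) to one approved symbol."""
    index = {}
    for symbol, aliases in approved_to_aliases.items():
        for name in [symbol, *aliases]:
            key = name.casefold()
            # An alias claimed by two different symbols is ambiguous; fail loudly.
            if key in index and index[key] != symbol:
                raise ValueError(f"ambiguous alias: {name!r}")
            index[key] = symbol
    return index


def normalize(name, index):
    """Return the approved symbol for a name, or None if it is unknown."""
    return index.get(name.strip().casefold())


ALIAS_INDEX = build_alias_index(APPROVED_TO_ALIASES)
print(normalize("psa", ALIAS_INDEX))
print(normalize(" HER2 ", ALIAS_INDEX))
```

Rejecting ambiguous aliases at index-build time, rather than silently keeping the last one seen, reflects the point made above that source data must be of consistently high quality before it can be integrated automatically.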
4 BIOINFORMATICS AND TARGET DISCOVERY

The desire to find new drug targets is grounded in the need of pharmaceutical
companies to address the requirements of different disease markets. The
literature is full of papers detailing sequence determinations of newly cloned
receptors and enzymes along with research on their functional properties. The
realization that the sequences of genes could be acquired relatively cheaply
through the use of automated sequencing machines using fluorescent base
technology, rather than the previous generation of radioactive sequencing gels,
meant that sequence data began to flood the public DNA sequence databases. The
growth of these databases is reviewed in Fig. 8.1.

Because the translation of DNA coding sequence to protein sequence is
straightforward, given the understanding of the genetic code, it is a trivial
task to implement software to provide translations of open reading frames
(ORFs¹) to be housed in the annotation sections of the DNA databases.
Consequently, protein sequence databases have become swamped with hypothetical
proteins, that is, proteins assumed to exist because open reading frames have
been discovered and from which hypothetical protein sequences have been
computed. The function of these sequences has been assigned through a
comparison of their amino acid residue similarity with that of known sequences
(for example, those that have had their biochemical function demonstrated
through heterologous expression). In this way, very large numbers of sequences
have been processed into the databases and annotated using such sequence
comparison techniques.

¹ ORFs are contiguous strings of residues, uninterrupted by the genetic code's
"stop" signal.

When it comes to the practical details of how bioinformatics can speed the
process of drug discovery, it is reasonable to ask what sorts of data could be
valuable in that process. The stages of the drug discovery process where
bioinformatics makes an impact are target identification, assay selectivity
panel selection, and integration throughout the assay development and screening
process. Target identification makes use of sequence data for functional
assignment by inference from similarity with known sequences (Fig. 8.2). It
also benefits from assessments of differential levels of expression in
different cellular contexts and at various stages in the expression process
(transcriptome or proteome, see Fig. 8.3 for definitions). Selectivity panel
selection relies on a thorough mining of the related gene family and may
benefit from phylogenomic analysis (see Section 5.3). Use of bioinformatics for
integration of data capitalizes on the generation of relationship information
between known genes and the ability to use hyperlinking to create navigational
tools and usable interfaces.

A further development, the production of millions of short expressed sequence
tags (ESTs), encouraged the focus on target discovery during the 1990s. The
development of EST technology itself spawned several new genomics companies,
including Human Genome Sciences and Incyte, which have worked in collaboration
with the pharmaceutical industry to hunt for new disease related targets.

4.1 Functional Genomics and Target Discovery

The collation of gene sequence data, from whatever source, is in effect simply
a matter of transferring data from one place to another; for example, from a
sequence chromatogram to a computer database. Learning the sequence of a genome,
or any of its constituent parts, is a long way from understanding its
biological function. The sequencing of genomes has resulted in a technical
genome description (at a particular level of detail) through the process of
cataloguing an organism's genes. This level of detail is often called the
physical map of the genome. There are other ways of mapping the genome that
provide different levels (or we may think of it as resolution) of genomic
detail; genetic maps indicating the location of genes for specific traits have
been known for some time, while single nucleotide polymorphism (SNP) maps can
be used to highlight the positions of differences between populations through
study of genetic polymorphisms (15). Indeed, the identification
[Figure 8.1: two bar charts, (a) bases and (b) sequence entries; x-axis: Year,
Dec-92 to Dec-00.]
Figure 8.1. Bar charts indicating the growth of GenBank from December 1992 to
August 2001 in terms of (a) bases and (b) sequence entries. The release files
indicate no release in February 1999. It is evident from the trends in both
charts that while there has been explosive growth, particularly from December
1999 until about August 2000, growth is slowing. The base entry curve is
showing a distinctly sigmoid shape.
Figure 8.2. A schematic illustrating the bioinformatics process required to create an online gene
index by collating data and then integrating related elements to generate value added information
through hyperlinking to online resources. Determination of phylogenetic relationships is a relatively
late stage in the process. EST analysis is only performed after phylogenetic relationships have been
determined because EST data does not cover the whole expressed sequence and may not therefore
cover regions that were included in the phylogenetic determination. This is a knowledge generation
phase because it is allowing placement of potential new targets within the context of a carefully
researched phylogenetic tree. Transfer of knowledge is intimately related to the environment in
which the results of analysis are made available, in this case as an online resource.
of genes themselves from genomic sequence is itself a non-trivial matter,
especially where those genes are interrupted by non-coding regions (introns)
and control regions (expression promoter sites). Functional genomics is the
process of creating an understanding of the way genomes function through gene
expression. Genes are expressed by a variety of mechanisms, not all of which
are fully understood. We can, however, make some measurements of the results of
gene expression at the transcript level, mRNA, and at the protein level.
Several of the techniques that have been used to assist drug target discovery
are presented in the following sections.

4.2 Expression Profiling for Target Discovery

Bioinformatics spans analysis in depth on small quantities of data through to
expansive genomic scale analyses, which may be at a lesser level of detail.
Historically, the expression of genes at the mRNA level or at the protein level
has been a crucial tool for assessing the significance of specific classes of
cells as targets. With the advent of fluorescence-based sequencing techniques
and automated sequencing technology, it is now much quicker to generate
sequence data on specific molecular targets than ever. Many researchers spend
entire careers working on one target type or a restricted part of a target gene
family. This approach has yielded many valuable targets for drug discovery.
With the new technologies of molecular biology, it is now possible to survey
targets in a variety of contexts; perhaps within different types of cells,
cells treated with different agents, or even across entire genomes using chip
technologies.

There are issues of interpretation of experimental design and results. Does
mRNA expression mean anything at a quantitative level? Perhaps even a
qualitative view of mRNA expression can be misleading. How is mRNA expression
correlated with protein ex-
[Figure 8.3: schematic. Genome: genomic DNA, sense strand, 5' to 3' (5' UTR,
exons separated by introns, 3' UTR). Transcription. Transcriptome: mRNA (5' UTR,
coding sequence, 3' UTR). Translation. Proteome: protein.]
Figure 8.3. A schematic illustration of the relationships between levels of
genomic information. Genomic DNA is contained in the nucleus of eukaryotic
cells. In many species, including humans, information required to make up the
coding sequence of a gene is split into exons (regions that are expressed)
interrupted by introns (regions that are not expressed and are edited out of
the message at the transcription step). At either end of the gene sequence are
untranslated regions (UTRs). 5' and 3' refer to the orientation of the strand
of DNA as defined by the sugar-phosphate backbone. The mRNA is the messenger
RNA molecule generated by the process of transcription, which is itself
mediated by a number of enzymes. The collection of mRNA transcripts that make
up the mRNA expression profile of a cell is known as the transcriptome,
although the term could also refer to the total possible mRNA transcripts
achievable from a genome. Finally, translation of the mRNA occurs on the
ribosome and protein sequence is produced, which folds into its final
three-dimensional shape, a process that may be assisted by a number of
different chaperone proteins. Any post-translational modifications are all part
of the proteome, the collection of proteins that represent the expressed
genome.
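The flow from genome to proteome in Figure 8.3 can be sketched in a few lines of code. This is a toy illustration under stated assumptions: the sequence and exon coordinates are invented, the sequence is written in DNA letters (T rather than U) as the coding strand, and the function names are hypothetical. Only the standard genetic code table is taken as given.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def splice(genomic, exons):
    """Join exon intervals (0-based, end-exclusive), dropping the introns."""
    return "".join(genomic[start:end] for start, end in exons)


def translate_orf(coding):
    """Translate from the first ATG up to the first in-frame stop codon.

    Assumes the input contains only the letters A, C, G and T.
    """
    coding = coding.upper()
    start = coding.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(coding) - 2, 3):
        amino_acid = CODON_TABLE[coding[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Invented toy gene: exon 1, a 10-base intron, exon 2.
GENOMIC = "ATGGCC" + "GTAAGTTTAG" + "AAATGA"
EXONS = [(0, 6), (16, 22)]

message = splice(GENOMIC, EXONS)
print(message, translate_orf(message))
```

The sketch deliberately ignores UTRs, alternative splicing and post-translational modification, all of which the caption notes are part of the real picture.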
pression? In general, most drugs we discover are likely to interact with
proteins and not mRNA, so some understanding of protein expression is an
essential adjunct to our genomic knowledge. Hence the proteomics approaches
described later. Our exploration of expression profiling begins with a study of
mRNA transcript profiling using expressed sequence tags because this technique
has led to rapid gene discovery that has, in turn, been able to assist with the
annotation of genomic sequences. Then, we consider how whole genome expression
profiles can provide a rich new source of data for bioinformatics analysis.

4.2.1 EST Profiling. An EST is a short, single sequence run collecting data
over about 200-400 bases from a clone selected from a cDNA library. Typically,
cDNA clone libraries contain 1-3 million clones. The library itself is created
from mRNA extracted from tissue or refined cell populations. By making a random
selection of several thousand clones from a cDNA library it is possible by
sequencing ESTs to generate a rapid, if somewhat low resolution, survey of the
types of genes represented by the library. The library in turn reflects the
composition of genes that are expressed in the tissue or cell line from which
it was constructed. Thus, we have a qualitative link between gene expression,
at the mRNA level, and the sequence level analysis required for target
identification, without the need to go through the full sequencing and
validation process across the whole length of each clone. This is a very
significant time and cost saving. One of the major issues of EST profiling has
been the significance that can be
ascribed to expression levels through counting copies of ESTs. This issue is
dealt with in some detail in Section 4.2.4.

4.2.2 Sequence Assembly for cDNA Cloning. To appreciate fully the speed
advantage of sequencing tags, rather than fully validating the sequence of an
entire clone of a gene, it is useful to step through a brief description of the
cloning process from the point of view of a practitioner of bioinformatics.

Sequence assembly is the process of dealing with the bioinformatics of cloning
and genomic sequencing (16, 17). When a gene is cloned, it is selected from a
set of potential clones in a cDNA library. The gene is present as a piece of
cDNA inserted into a cloning vector (a piece of circular DNA) that has been
designed for the purpose of cloning. It is necessary to check that the cDNA
indeed represents the sequence of the gene that has been cloned. To do this,
DNA oligonucleotides are designed that will bind in a complementary fashion
(hybridize) to the DNA of the cloning vector and also at 150- to 200-base
intervals along the cDNA itself. These oligonucleotides are then extended by
adding a base that is complementary to the cDNA insert by using a DNA
polymerase. Different polymerases are available commercially that provide high
fidelity reproduction of the cDNA insert. In fluorescence-based sequencing, a
small proportion of the nucleotides available to the polymerase are fluorescent
analogues. Incorporation of one of these into the oligonucleotide terminates
the extension, resulting in a population of oligonucleotides of different
lengths. These are separated by electrophoresis and the sequence determined.

When the sequence of each oligonucleotide has been determined, the strings of
letters that represent the bases are assembled together to generate a
full-length sequence of the cDNA that has actually been cloned. Errors in the
base sequence can be resolved at this stage and, if necessary, mutagenesis
experiments can be designed to correct any mistakes.

The bioinformatics process is intimately linked with the molecular biology
techniques of cloning and sequencing. For target discovery, a very high degree
of confidence in the sequence of the cDNA clone is required before the clone
can be expressed and used in assay development. It can also be seen that this
process is much lengthier than taking a single sequence read (a single
oligonucleotide string) without correcting errors or considering coverage of
the complete gene sequence.

4.2.3 Comparing ESTs with Databases. Bioinformatics provides the tools
necessary to compare each EST with the databases of known genes and a
hypothetical functional assignment may be made to a proportion (typically
40-50%) of all the ESTs from a sequencing run.

In this way a rich resource of tags for many clones from many diverse libraries
has been built up in the public domain and in commercially available,
proprietary databases. One particular approach that generated much interest in
the 1990s was that advocated by Incyte. Here, the simple identification of a
gene expressed through identification of its EST was not the primary goal.
Instead, the approach was based on comparative transcript expression (the
so-called "digital Northern"). Here the number of copies of each EST identified
was calculated, giving counts for the numbers of each type of EST found in
comparing normal with diseased tissue, for example. Subsequent techniques have
focussed on more controlled experiments in which specific cell lines are
treated with an agent and the expression of genes before administration is
compared with the profile afterward. This is the basis of pharmacogenomics
(18).

4.2.4 Statistics for Assessing Expression Level Significance. There are issues
with approaches based on counting the number of copies of an EST observed in
the output from a sequencing machine. First, the tissue or cell line must be of
very high quality and the mRNA harvested in a timely manner because it degrades
very quickly. Second, the process of preparing the cDNA library should enable
the numbers of clones to be estimated as accurately as possible. Third, the
random sampling for the sequencing runs must be controlled carefully so as not
to introduce bias into the experiment. The mathematical model for evaluating
the meaning of data from such experiments is not well worked out.

Comparison of the differences between the
Table 8.2 Comparative EST Counts for Five Genes Sequenced from Normal Prostate,
Stage B2 Cancer, Stage C Cancer, and Benign Prostatic Hyperplasia (BPH) cDNA
Libraries

              Normal     Stage B2 Cancer      Stage C Cancer       BPH               All Other
Gene          Prostate   Tags  P              Tags  P              Tags  P           Tissue
              Total
PSA           13         7     0.7-0.8        14    0.6-0.7        22    0.8-0.9     0
PAP           4          1     0.1-0.2        34    >0.999         9     0.7-0.8     1
HGK           1          7     >0.999         6     0.97-0.98      5     0.8-0.9     0
PSI           0          3     0.993-0.994    7     0.997-0.998    1     0.4-0.5     0
PS2           0          2     0.97-0.98      7     0.997-0.998    0     0-<0.1      0
Total clones  4500       1400                 3400                 4800              732,000

The tag counts are from Ref. 21. The P values are calculated according to
Equation 8.1, modified for use with different total EST counts from the source
libraries. The web URL http://igs-server.cnrs-mrs.fr/~audic/cgi-bin/winflat.pl
was used to calculate the probability intervals. A P value nearer to 1
indicates that the differential expression is likely to be significant. While
prostate specific antigen (PSA) and glandular kallikrein (HGK) have been
proposed as prostate cancer markers, both PSI and PS2 are prostate specific.
Thus, the down-regulation of PAP in stage B2 cancer is not significant using
this test, whereas the test shows its up-regulation in the BPH sample to be
more significant. So, for lower changes in copy number, where more sensitivity
is expected, this test of significance is a valuable tool.
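The kind of calculation behind the P columns of Table 8.2 can be sketched as follows. The sketch assumes the general form of the Audic and Claverie test for two libraries of different size, in which the probability of seeing y tags in a library of N2 clones, given x tags in a library of N1 clones, is p(y|x) = (N2/N1)^y (x+y)! / [x! y! (1 + N2/N1)^(x+y+1)]. The function names are hypothetical, and the cumulative sum shown here is one plausible way to summarize the result; it is not claimed to reproduce the probability intervals reported by the web tool.

```python
from collections import Counter
from math import exp, lgamma, log


def tag_counts(est_gene_labels):
    """Count how many ESTs in one library were assigned to each gene."""
    return Counter(est_gene_labels)


def p_y_given_x(x, y, n1, n2):
    """Probability of y tags in library 2 given x tags in library 1.

    n1 and n2 are the total numbers of clones sampled from each library.
    Log-gamma is used so that large counts do not overflow the factorials.
    """
    ratio = n2 / n1
    log_p = (
        y * log(ratio)
        + lgamma(x + y + 1) - lgamma(x + 1) - lgamma(y + 1)
        - (x + y + 1) * log(1 + ratio)
    )
    return exp(log_p)


def cumulative_p(x, y, n1, n2):
    """Probability of observing y or fewer tags in library 2, given x."""
    return sum(p_y_given_x(x, k, n1, n2) for k in range(y + 1))


# PAP counts from Table 8.2: 4 tags in 4500 normal prostate clones,
# 34 tags in 3400 stage C cancer clones.
print(p_y_given_x(4, 34, 4500, 3400))
print(cumulative_p(4, 33, 4500, 3400))
```

With equal library sizes the expression reduces to (x+y)! / (x! y! 2^(x+y+1)), so the library totals drop out of the calculation; the ratio N2/N1 matters only when the totals differ, as they do for every pair of libraries in Table 8.2.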
overall profiles obtained from tag counting ex- ues for the probability of certain genes ex-
periments could be performed using the tradi- pressed at different levels in normal prostate,
tional χ² test. However, this is the wrong ap- stage B2 cancer, stage C cancer, and tissue
proach for experiments where the significance from a benign prostatic hyperplasia (BPH)
of differences between expression levels (i.e., sample as shown in Table 8.2.
tag counts) of individual genes is to be deter- The relationship between gene expression,
mined, for example, in diseased and normal mRNA level, and protein expression is com-
tissue states (19). One of the issues in per- plex and not one that can be gleaned from col-
forming tag-sampling experiments is that the lecting copy number information in this type
experiments themselves are usually not repli-
cated. Thus, the dispersion of results cannot
of experiment. Even with careful statistical .
analysis, such as that described above, the as-
be used to estimate the SEs associated with sumption that increases or decreases in copy
each expression measurement. This elimi- number reflect real biologically significant
nates the possibility of using standard tests of events relies on the confidence with which we
variance. Instead the Poisson distribution, can compare a library made from one set of
which includes an implicit estimate of stan- cells to a library made from a different set of
dard error, approximates random sampling of cells. Thus, most transcript analysis experi-
tags very well. Audic and Claverie (20) have ments setting out to be quantitative end up
proposed a significance test (see Equation 8.1) simply as target identification exercises. A ma-
in which the sample size plays no part, so long jor goal of proteomics is to generate a factory-
as it is the same for both experiments, but only type approach to profiling protein level expres-
depends on the observed tag counts of the sion that more closely reflects the biological
same gene from diseased, g,, and normal, g,, reality. The EST approach has been turned
states: into an industrial scale process but has not
been able to impact the drug discovery process
significantly because of the biological lirnita-
tions described and the lack of sound mathe-
matical modeling of the whole process.
The equation has also been extended to cover Expression experiments are measures of
the more practical case of different total num- cell population averages, not the contents of
bers of tags. Thus, taking some data from Fan- individual cells, so it is important to consider
non (21) as an example, we can calculate val- to what extent all cells in the candidate popu-
344 Bioinformatics: Its Role in Drug Discovery
Table 8.3 Brief Descriptions of Three Technologies for Genomic Scale Transcript Profiling

Expression Profiling Technology: cDNA array chip
  Brief Description: Tens of thousands of cDNA clones of genes are placed onto a glass slide
    in a grid formation. Hybridisation of molecular probes (RNA extracts) to the clones is
    detected using a fluorescence system. By using two sets of probes, labelled with
    differently coloured fluorescent dyes, it is possible to assess expression differences.
  Form of Data Generated: Fluorescence intensities and colours for each spot on the chip.
    The nature of the clones on the chip is known.

Expression Profiling Technology: High-density oligonucleotide arrays
  Brief Description: Arrays of oligonucleotides are synthesised directly onto the glass chip
    using special chemistries and light sensitive masking. This generates arrays of known
    sequences of fixed length. Probes are hybridised to the arrays and computational
    analysis is necessary to interpret the resulting patterns.
  Form of Data Generated: An image of the entire chip is processed using specialised chip
    scanning software.

Expression Profiling Technology: Serial analysis of gene expression
  Brief Description: A sequence-based approach to the identification of differentially
    expressed genes through comparative analysis. Allows simultaneous analysis of
    sequences that derive from different cell populations or tissues. This is not a
    chip-based method. Identification of sequences relies on completeness of public
    sequence databases and, therefore, can only be used to analyse known genes.
  Form of Data Generated: Sequence data for SAGE tags allows profiling of gene expression.
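
The two-colour comparison described for cDNA array chips in Table 8.3 is commonly summarised as a log ratio of the two dye intensities at each spot. The sketch below shows one way to turn spot intensities into a small gene expression matrix. The gene names, sample names, intensities, and the `log_ratios` helper are all invented for illustration, and background correction and normalisation, which real analyses need, are deliberately left out.

```python
from math import log2

def log_ratios(sample, reference):
    """Per-gene log2(sample / reference) for one two-colour hybridisation."""
    return {gene: log2(sample[gene] / reference[gene]) for gene in sample}

# Hypothetical spot intensities for the reference channel and two test samples.
reference = {"geneA": 200.0, "geneB": 800.0, "geneC": 150.0}
arrays = {
    "sample1": {"geneA": 400.0, "geneB": 800.0, "geneC": 75.0},
    "sample2": {"geneA": 100.0, "geneB": 1600.0, "geneC": 150.0},
}

genes = sorted(reference)
samples = sorted(arrays)

# Gene expression matrix: one row per gene, one column per sample.
matrix = [[log_ratios(arrays[s], reference)[g] for s in samples] for g in genes]

for gene, row in zip(genes, matrix):
    print(gene, row)
```

A value of 0 means no change relative to the reference, while positive and negative values indicate higher and lower signal respectively. Using a logarithm makes a doubling and a halving symmetric around zero.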
lation are in the same state (22). Whereas work in single-celled organisms may be more
straightforward to control, work in multi-cellular organisms has the added complexity that
expression measurements may involve contributions from cells derived from a variety of
tissues. Furthermore, when taking into consideration mRNA copy number, it should be
understood that absolute transcript abundance measurements do not completely measure
mRNA concentration.

Although there was initially some concern that the use of ESTs was a shortcut to discovery
of genes for the purposes of patenting and ring-fencing areas of research for profit, in
fact, the substantial numbers of quality ESTs in the public domain have helped in the
pinpointing of genes in genomic data and have contributed to the speed with which the
human genome sequence was completed.

4.2.5 Genome-Wide Expression Analysis. A major step towards understanding how
organisms work is the determination of the complete sequence of all genes in the genome.
This remarkable goal has been achieved for a number of organisms, including Homo sapiens,
the flowering plant Arabidopsis thaliana, the single-celled yeast Saccharomyces cerevisiae,
and a large number of bacteria. The analysis of the sequence data then becomes the issue.
It is no trivial task even to locate the positions of all the genes in the human genome.
Genes for which there are no homologs in the current sequence databases will take some
time to elucidate. See Ref. 23 for a detailed analysis of this topic and then Refs. 24 and 25
for detailed studies on the human genomic sequence.

The three basic technologies for generation of genome-wide expression information are
cDNA microarrays, high-density oligonucleotide arrays ("GeneChips"), and serial analysis
of gene expression (SAGE) (22). These technologies are outlined in Table 8.3.

In terms of quantities of data, a single microarray experiment looking at 40,000 genes
from 10 different samples, under 20 different conditions, produces at least 8,000,000 pieces
of data (26). Chip technologies, though originally expensive because of the costs of chip
fabrication, are now being used to contribute data to public domain databases and are
widely used in industrial applications. A recent comparison of array databases available
for local installation, public submission of data or public query, listed 13 different systems
from sources worldwide (27). One such repository, ArrayExpress, is now being funded by
the European Union at the EBI and is in the early stages of development (see
http://www.ebi.ac.uk/arrayexpress). It is intended to be compliant with the microarray
gene expression database (MGED) standard (see http://www.mged.org/).

The process of gene expression analysis by microarray is shown below, based on (13).

1. Construct array
2. Prepare biological samples for investigation
3. Extract and label sample RNA
4. Hybridize samples to array
5. Image the array
6. Locate spots and evaluate fluorescent intensities
7. Construct gene expression matrix from spot intensities
8. Analyze gene expression matrix

As with any biological experiment, the result of this process should be the accumulation
of knowledge concerning the biological processes under study. Interestingly, the first five
steps are material handling processes, while the remaining steps only involve information
processing.

5 DATABASES, TOOLS, AND APPLICATIONS

Because bioinformatics is about the management of information in the domain of biology,
databases play a significant part in acting as repositories for a wide variety of different
types of data. The main focus of this section is to give a flavor of the breadth of databases
available and to highlight the role of the primary sequence databases and the secondary
pattern (or family) databases in assisting with protein functional assignment.

5.1 Databases

Data repositories of DNA sequence, protein sequence, and higher-level resources of
integrated information pertaining to the relationships between sequences (for example,
pattern and profile databases) are the core tools for performing a wide variety of
bioinformatics analyses. Many of these databases are in the public domain and are freely
accessible through links available at a range of websites (including those listed in
Table 8.1), although copyright is claimed in the annotation sections of some databases.
The January 2002 issue of Nucleic Acids Research is a special annual database issue. It
contains 112 articles describing in some detail different databases in use in the field.
These are a subset of the 339 databases (up from 281 in the previous year) listed and
briefly described in the Molecular Biology Database Collection, which constitutes an
additional article in itself (28, 29). The complete list can also be found at
http://www.nar.oupjournals.org. While the list was being prepared for the 2001 edition,
55 new databases were added to the previous total. In 2002, 58 additions were made. This
rapid expansion in the number of databases available is indicative of the recognition by
the community of the need for accessible, carefully designed databases to meet the needs
of a wide diversity of research programs.

Much of the value of databases, assuming the provision of accurate sequence data, arises
from the quality of the annotation that is available. This normally includes at least a
brief description of the function of the sequence and essential references to the
literature. Many databases include a lot more than this. In particular, SWISS-PROT (8) is
viewed as the most reliable source for annotation information. SWISS-PROT emerged in the
1980s out of a need to have high quality, robust annotations for the protein sequences
that made up its core content. However, the process of annotation is labor intensive and
not one that is easily automated. Although the content of SWISS-PROT is well regarded, it
lacks the completeness of the source DNA databases because of the necessary delay in
incorporating newly annotated sequences. Indeed, a team of annotators is employed at the
EBI solely to perform this task. A computationally annotated supplement, TrEMBL (8),
has been made available to make up for this deficiency. Nevertheless, computer annotation
still has some way to go before it comes close to the level of competence of skilled human
annotators. This is an area of active research (30).

Nevertheless, with the rapid generation of sequence data from genome scale experiments,
more effective means of characterizing protein sequences and annotation are now required.
The database has responded by improving labeling of annotation in both SWISS-PROT and
TrEMBL and by adding more advanced and rigorous tagging of evidence for functional
statements that have been made (31).

Whereas most patent sequences are available in the public domain for use in research
and for commercial exploitation, there is a substantial body that are the subject of patent
protection. It is often useful when conducting searches of sequence databases to be aware of
the sequences that are patented because this may imply certain restrictions on the use to
which these sequences can be put in a commercial context. The commercial repository is
maintained by Derwent (Thomson Scientific), which generates the Geneseq database of
patented sequences. This is a useful collection because it contains a broad historical
collection as well as more recent examples, although the terms for a commercial license to
use the database may be off-putting to some potential users. There are also patent sections
of the GenBank/EMBL DNA databases, but these are of limited value because they contain only
more recent sequence data.

5.2 Sequence Comparison

When dealing with the output of most experiments in target discovery the question "has
this gene been seen before?" arises. The answer is, at first sight, straightforward:
Compare the sequence obtained from the experimental output with all the known sequences
and print the result.

Sequence comparison makes up a major part of the work of the bioinformatics analyst.
It demands skill in operating the tools; for example, choosing the appropriate databases to
search and selecting the appropriate search method, followed by insight and experience in
assessing the meaning of the results of the search. A search query with a single previously
known sequence is likely to return not only the match with itself but also a host of
other matches at varying levels of similarity with the query sequence. This extra
information can be very valuable in placing the query sequence in the context of many
closely related sequences that make up the family of genes to which the query belongs. More
distantly related sequence matches can potentially indicate genes with similar function,
even if the match is relatively short and of low score.

The experienced analyst should be able to sort the significant matches from the
uninteresting ones. Often, this type of experience is difficult, if not impossible with
current technologies, to capture in a computer program. Rules that seem to work under some
circumstances produce nonsensical results in others. As a result, many of the techniques
used for current sequence comparison engines are heuristic rather than strictly algorithmic,
that is, the rules that are implemented as part of the process for returning significant
hits from the query database tend to produce the correct result but cannot be guaranteed to
do so in all circumstances. For a fuller discussion of algorithms and heuristics, albeit
outside the context of bioinformatics, see Ref. 32.

One of the key aspects of sequence comparison is the understanding of similarity when
applied to molecular sequences. There are essentially two ways of considering this: simple
residue identity and residue substitution. In this discussion, we consider the comparison of
two protein sequences, but the process is the same for comparison of DNA or RNA sequences.
Only the size of the alphabet used in the comparison differs: it is 20 for protein
sequences and 4 for DNA and RNA. By comparing residues at the same position in each
sequence and counting up the number of identities, we arrive at a score that can be
expressed as a percentage match for the pair of sequences. The alternative method compares
each pair of residues and looks up a score for that pair in a substitution table or scoring
matrix. The summed score across the whole sequence length can again be expressed as a
percentage match. The two sequences under comparison are, however, likely to be
sufficiently different that equivalent residue positions are not in register when the two
sequences are laid out, one on top of the other. In this situation, the sequences must be
aligned with each other so that equivalent residue positions are in register to make the
score meaningful. This may involve insertion of gaps into one or both sequences. The skill
here is to create an alignment between the two sequences that reflects some biological
reality; it is from this biological reality that we derive the notion of equivalent residue
positions. These positions can be deduced from manual manipulation of the alignment on the
basis of mutation data or other functional information using a suitable sequence editor
(33), or perhaps from understanding the spatial layout of residues if structural data is
available. In each of these cases, the resulting sequence alignment will reflect the manner
in which equivalent residue positions have been determined; both methods have their place.
A variety of methods have been developed for comparing pairs of sequences, including the
basic classical methods of Needleman and Wunsch (34) and Smith and Waterman (35).

Extending these pairwise comparison methods to database searching has been carried out,
and a plethora of hybrid methods and improvements have been made. The manner in which
significant alignments are reported varies from implementation to implementation. Database
searching by alignment in this way is computationally intensive and specialized computer
hardware is often used to gain speed increases. Because the comparison of pairs of sequence
takes place in an exhaustive manner, these types of database searching methods are
considered to be the most sensitive. More modern methods of database searching look for
shorter matches spread over the lengths of the query and database sequences, and then extend
these matches until the score for the match falls below a threshold level. Lists of sequence
matches returned are then aligned using a pairwise alignment technique to provide a match
and score over the whole length of the comparison sequences. For an example of this type of
approach, see FASTA (36). Such methods are readily implemented on standard computer hardware
and thus are accessible as Internet resources or as local implementations on UNIX or Linux
servers.

The most popular tool currently in use is BLAST (Basic Local Alignment Search Tool)
(37) from the NCBI. BLAST is an example of a heuristic that attempts to optimize a specific
similarity measure. The most recent revisions to the algorithm are gapped BLAST and
PSI-BLAST (38), with improved accuracy for PSI-BLAST using composition-based statistics
(39).

5.3 Phylogenomics and Gene Family Databases

Determining protein function from genomic sequences is a central goal of bioinformatics
(40), and to achieve this goal, comparing single sequences against databases of DNA or
protein sequences is a necessary bioinformatics skill. However, many such searches have
already been carried out, and the results are available to analyze at a higher level of
abstraction in the protein and gene family databases (9, 10, 14, 41-43). It is the
relationships between sequences that form the basis of any gene family database. Many of the
current databases did not set out to become gene family databases. However, application of
the underlying methodology for defining gene families (whether based on blocks of conserved
sequence alignment or on profiles representing entire sequences, or simple regular
expressions) has resulted in a number of resources that are particularly valuable in placing
drug discovery targets in their biological context.

The processes of evolution by natural selection imply that species are related to each
other in a tree-structured hierarchy; but more than this, the history of sequence
relationships during evolution is also significant. Organisms are defined by their genes,
and their behavior is modified through environmental experience. The relationships between
genes within a single organism indicate that genes and their protein products also fall into
well-defined families. Protein phylogenetic profiling (40) and phylogenomic analysis (44) are
methods that are valuable where functional assignment by sequence similarity alone is
Figure 8.4. An example of a user interface to a phylogenomic-oriented database (48).
Relative distances, following black line paths, between nodes on the tree of
phosphodiesterases indicate the similarity level between members of the family, based on
the regions of the sequences selected for the phylogenetic analysis. Links to aligned
domains permit the alignments themselves to be explored. The order in which the genes
appear in the tree (the branching order) gives an indication of the homology relationship
between members of the family. See Section 5.3.
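
The idea that distance along the connecting paths of a tree reflects similarity between family members can be made concrete with a small sketch. The tree below is entirely hypothetical: the node names and branch lengths are invented and are not taken from Figure 8.4, and `path_distance` is an illustrative helper rather than part of any phylogenetic package.

```python
def path_to_root(node, parent):
    """Return [(ancestor, distance from node), ...] walking up to the root."""
    path, dist = [], 0.0
    while True:
        path.append((node, dist))
        up = parent.get(node)
        if up is None:          # reached the root
            return path
        node, length = up
        dist += length

def path_distance(a, b, parent):
    """Sum of branch lengths along the path that joins nodes a and b."""
    seen = dict(path_to_root(a, parent))
    for node, dist in path_to_root(b, parent):
        if node in seen:        # first shared ancestor
            return seen[node] + dist
    raise ValueError("nodes are not in the same tree")

# Hypothetical tree: child -> (parent, branch length).
parent = {
    "geneX": ("node1", 0.1),
    "geneY": ("node1", 0.2),
    "node1": ("root", 0.3),
    "geneZ": ("root", 0.9),
}

print(path_distance("geneX", "geneY", parent))  # siblings: short path
print(path_distance("geneX", "geneZ", parent))  # joined only at the root: longer path
```

The path length is a similarity measure only. The branching order, encoded here by which nodes share a parent, is what carries the evolutionary relationship.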
problematic. This is because phylogenomic analysis is based on understanding the process
by which sequences have diverged from common ancestors rather than focusing on the
sequence similarity itself, which is an evolutionary endpoint. The approach is to determine
the phylogenetic tree of a gene family and then to overlay any known functions of the genes
on the tree. The functions of uncharacterized genes are predicted by their phylogenetic
positions relative to those of the previously characterized genes. Importantly, depending on
the manner of their construction, the trees may indicate similarity distances along
connecting branches but it is the order of branching that reflects evolutionary relatedness
(otherwise known as homology). For an interesting discussion of the correct use of the
terms homology and similarity see Ref. 45.

This approach is illustrated in the phosphodiesterase gene family tree presented in Fig.
8.4. The set of relationships has been determined by comparing not just two sequences
with each other, or a database of sequences to one sequence (as in a sequence database
search), but by comparing a set of phosphodiesterase sequences to each other in the form of
a multiple sequence alignment (see Fig. 8.5). Conserved regions of un-gapped sequence
were chosen from this alignment to use as input to a phylogenetic analysis method (46, 47)
and an evolutionary tree was eventually reconstructed. This tree represents a view of the
relationships between genes in the phosphodiesterase gene family: more closely related
genes are closer together in the diagram (i.e., they are connected by shorter paths); those
further away are less closely related to each other. Figure 8.4 is based on an entry in
TargetBASE (48), which adopts the phylogenomic paradigm. The relationships between
members of gene families are used as the structure for an object-oriented database and
associated user interface that provides a navigation tool for a curated gene index.

There are other approaches to family databases that rely more extensively on sequence
similarity to define classes of genes or proteins. For example, PROSITE (49) is a
resource that uses regular expressions to define patterns of residues that represent
biologically significant sequence motifs. Recent versions have incorporated profiles, weight
matrices that express the characteristics of a gene
             251                                                300
pde1a-human  KLHYRWTMAL MEEFFLQGDK EAELGLP.FS PLCDRKSTM. VAQSQIGFID
pde1b-human  LVHSRWTKAL MEEFFRQGDK EAELGLP.FS PLCDRTSTL. VAQSQIGFID
pde1c-human  DLHHRWTMSL LEEFFRQGDR EAELGLP.FS PLCDRKSTM. VAQSQVGFID
pde2a-human  KTTRKIAELI YKEFFSQGDL E.KAMGNRPM EMMDREKA.Y IPELQISFME
pde3a-human  ELHLQWTDGI VNEFYEQGDE EASLGLP.IS PFMDR.SAPQ LANLQESFIS
pde3b-human  DLHLKWTEGI VNEFYEQGDE EANLGLP.IS PFMDR.SSPQ LAKLQESFIT
pde4a-human  ELYRQWTDRI MAEFFQQGDR ERERGME.IS PMCDKHTAS. VEKSQVGFID
pde4b-human  ELYRQWTDRI MEEFFQQGDK ERERGME.IS PMCDKHTAS. VEKSQVGFID
pde4c-human  PLYRQWTDRI MAEFFQQGDR ERESGLD.IS PMCDKHTAS. VEKSQVGFID
pde4d-human  QLYRQWTDRI MEEFFRQGDR ERERGME.IS PMCDKHNAS. VEKSQVGFID
pde5a-human  PIQQRIAELV ATEFFDQGDR ERKELNIEPT DLMNREKKNK IPSMQVGFID
pde6a-human  EVQSQVALLV AAEFWEQGDL ERTVLQQNPI PMMDRNKADE LPKLQVGFID
pde6b-human  EVQSKVALLV AAEFWEQGDL ERTVLDQQPI PMMDRNKAAE LPKLQVGFID
pde6c-human  EVQSQVALMV ANEFWEQGDL ERTVLQQQPI PMMDRNKRDE LPKLQVGFID
pde7a-human  ELSKQWSEKV TEEFFHQGDI EKKYHLG.VS PLCDRHTES. IANIQIGFMT
pde7b-human  EMSKQWSERV CEEFYRQGEL EQKFELE.IS PLCNQQKDS. IPSIQIGFMS
             QYCIEWAARI SEEYFSQTDE EKQQGLPVVM PVFDRNTCS. IPKSQISFID
             DLCIEWAGRI SEEYFAQTDE EKRQGLPVVM PVFDRNTCS. IPKSQISFID
             EVAEPWVDCL LEEYFMQSDR EKSEGLP.VA PFMDRDKVT. KATAQIGFIK
             PVTKLTANDI YAEFWAEGD. EMKKLGIQPI PMMDRDKKDE VPQGQLGFYN
             EISRQVAELV TSEFFEQGDR ERLELKLTPS AIFDRNRKDE LPRLQLEWID

Figure 8.5. Part of an alignment of catalytic domains of the human phosphodiesterase gene
family. Positions in the alignment where gaps have been introduced into a sequence to bring
it into alignment with other sequences are indicated by "." characters.
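
Once sequences are in register, as in Figure 8.5, the simple residue-identity measure described in Section 5.2 is straightforward to compute. The following is a minimal sketch, assuming the rows are typed as printed (blocks of ten residues separated by spaces, with "." for gaps) and that columns gapped in either sequence are skipped, which is only one of several possible conventions. The `percent_identity` function is a hypothetical helper, not part of any named package.

```python
def percent_identity(row_a, row_b, gap="."):
    """Percentage of aligned, un-gapped columns holding identical residues."""
    a = row_a.replace(" ", "")
    b = row_b.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    # Keep only columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no un-gapped columns to compare")
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)

# Two rows taken from the alignment in Figure 8.5.
pde4a = "ELYRQWTDRI MAEFFQQGDR ERERGME.IS PMCDKHTAS. VEKSQVGFID"
pde4b = "ELYRQWTDRI MEEFFQQGDK ERERGME.IS PMCDKHTAS. VEKSQVGFID"
pde2a = "KTTRKIAELI YKEFFSQGDL E.KAMGNRPM EMMDREKA.Y IPELQISFME"

print(round(percent_identity(pde4a, pde4b), 1))
print(round(percent_identity(pde4a, pde2a), 1))
```

Replacing the equality test with a lookup in a substitution table would give the alternative, substitution-based score. The alignment itself is taken as given here, so the result is only as meaningful as the alignment.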
family using all the sequence information available. The principal value of this resource
is that it presents patterns for recognition of gene families that are relatively simple to
understand. The downside is that the use of such patterns can produce both true positive
hits (members correctly predicted) and false positive hits (members incorrectly predicted).
PROSITE lists true and false positives for searches performed in the production of a
release of the database, but it is as well to be aware that when the patterns are used in
isolation, there is often a false positive hit rate that must be taken into account by
reconciling the results of a pattern search with the results of database annotations or
other pattern recognition methods.

The PRINTS system (9, 10, 50, 51) is an approach based on an examination of core
regions of un-gapped sequence conservation within a set of aligned sequences (multiple
sequence alignment). The method rigorously builds up fingerprints for a gene family
through use of an iterative database searching technique allied to intelligently applied
sequence alignment. The fingerprints themselves can then be used to diagnose new gene
family members in novel sequence data or can be used to identify modules of functional
sequence across different gene families.

One of the issues in using different databases of gene family information is that
definitions of which genes belong to which gene families can vary depending on the method
used. Apweiler et al. have undertaken a useful effort at rationalizing and integrating
family database annotation at the EBI in the InterPro resource (52). The databases that
make up the membership of the InterPro consortium are PROSITE (49), PRINTS (9), Pfam (53),
ProDom (54), and SMART (55). InterProScan is a tool that enables scanning of individual
protein sequences against the InterPro member databases (56).

6 THE BIOINFORMATICS KNOWLEDGE MODEL

Up to this point, we have discussed sources of data and means of manipulating and
comparing data elements (in terms of sequences, alignments, gene families, etc.), but the
end point of all this analytical process must be the acquisition of knowledge. It is through
increased understanding that sound decisions
can be made in applying the results of bioinformatics analyses to application areas, such
as drug discovery. So, in this section we consider the relationships between data,
information and knowledge, which are frequently regarded as poor relations to
laboratory-based experimental data acquisition. However, as drug discovery organizations,
including large pharmaceutical and smaller biotechnical companies, develop a significant
history of assays, screens, and leads, it is vital to have strong internal support for
managing data flows, integrating related data into information systems, and transforming
knowledge thus gleaned into tangible benefits.

6.1 Data, Information, and Knowledge

According to the University of California at Berkeley (57), it has taken 300,000 years for
humankind to accumulate 12 exabytes of data.² It will take just 2.5 more years to create
the next 12 exabytes. (An exabyte is 1,000,000,000,000,000,000 bytes or a billion
gigabytes.) This is a truly unimaginable amount of data, equivalent to the data stored
on a pile of floppy disks 24 miles high. It is the rate of accumulation of data that is the
key point of interest, however, and the fact that it is accelerating.

² In fact, the referred study uses the word "information." However, within the usage of
this article "data" is a more appropriate term.

It is crucial to distinguish between the terms data, information, and knowledge so
that we can think clearly about the goal of data accumulation in our own industry sector.
There are two views: a tiered hierarchical view and a more formally correct scientific view.

6.2 The Hierarchical View

In the hierarchical view, data is the bottom rung of a ladder leading to the accumulation of
information that leads, ultimately, to an increase in knowledge. Apply this hierarchical
principle to an everyday example of taking this article to the photocopier: the data
represented by the article is the sequence of strokes and dots on the page that make up the
page image. The page image is the information represented by the article. The knowledge
element only comes later when the observer actually reads and understands the article.
Compare this with the act of photocopying a research article, a process that does not in
itself add to understanding on the part of either the photocopier or of the researcher. The
acquisition of knowledge implies an active relationship between author and recipient of the
information. In this intuitive sense, we know that the hierarchical view works to some
extent as a model of the way in which some knowledge is acquired.

6.3 The Scientific View

The second view is the scientific one (58). Here, we start with the piece of information
that we are trying to understand, perhaps a gene whose function we plan to determine.
Experiments are designed and performed to determine the characteristics of the function of
the gene; such experiments yield data that describe aspects of the information. Knowledge
comes from understanding and interpreting the results of the experiments. Again, knowledge
is accumulated as part of an active relationship between the data describing the
information and the investigator reviewing the data and drawing conclusions about the state
of the information. Gene function is itself a complicated concept because the functions of
gene products can rarely be assessed in isolation, owing to the network of interactions in
which most genes are involved. A collection of sequence data, collected at the DNA or protein
level, describes the molecular structure of a gene or its product at a primary level; it is
not, however, a complete description. There are other biochemical factors to be considered;
for example, proteins that assist in the folding process to create an active
three-dimensional molecule, post-translational modifications, glycosylation, interactions
with other molecules to generate a higher-level function, etc.

6.4 Data is Not Knowledge

Simply increasing the amount of data in the genomic universe does not necessarily
increase the speed of knowledge acquisition. In short, data is not knowledge. Knowledge
itself requires understanding and demands the active participation of the one acquiring the
knowledge.
Most pharmaceutical organizations have in place the means for collating data from a wide variety of sources. Genomic information is available freely in the public domain as well as in proprietary databases. Some successful companies have used the multiple subscription database model to generate revenue to create more data to return to their customers. In this model, data tends to be available on a non-exclusive basis but it is up to the licensee to determine how best to interpret the data in its own research environment. Some information linking is available in such models, more especially those that use Internet portals as a user interface and results delivery mechanism. This has been a valuable means of acquiring data in gene and protein expression. The model is, however, showing signs of age. Pharmaceutical and biotechnology companies have needed to make substantial investments in technology and specialist skills (particularly bioinformatics) just to warehouse the data and make the analytical results available to drug discovery program scientists. Yet, all this effort still remains the heartland of genomics: target discovery groups have sprung up in the pharmaceutical and biotechnology companies to create a process by means of which targets can be gleaned from the genomic morass.

6.5 Drug Targets

When considering drug targets, as opposed to simply gene products, there are a host of characteristics that must be taken into consideration. For all the drugs that are currently available on the commercial market, there are only about 500 drug targets on which they individually act. We now understand the human genome to contain about 30,000 genes, which give rise to a still incalculable number of protein products (59). How many of these products represent tractable drug targets? The gold standard for assessing whether a protein is or is not a target is target validation. In this process, additional biochemical or molecular genetic data are required to determine whether a protein is truly involved in a disease state and thus whether it could be considered to be a suitable target molecule for drug discovery. The prospect of performing such target validation experiments, always assuming the results could be interpreted unambiguously, is a daunting one. Structural determination, by X-ray crystallography or nuclear magnetic resonance, has deepened our understanding of some biological processes immeasurably, particularly in the realm of certain proteases, DNA binding proteins, and some other soluble enzymes. The majority of drug targets are, however, membrane bound (for example, the plethora of ion channels and G-protein-coupled receptors), making structural determination to any degree of critical confidence impossible. Molecular modeling can assist in this process and has been a valuable tool for many years in thinking through possibilities and providing a framework for interpreting other biochemical results. The further away we move from rigorously determined experimental data, however, the less likely is a pharmaceutical company to embark on the commitment of expense to exploratory studies in drug discovery. Finally, in considering the suitability of a gene as a drug discovery target, we must take into consideration the temporal nature of gene expression, an area of research that has not yet been adequately addressed in the analysis of genomic data.

The alternative approach for dealing with the wealth of genomic data, in the context of drug discovery, is to consider the genomic data as a background landscape against which to pick out the currently known and validated drug targets. These targets fall into families whose relationships can be rigorously determined at the sequence level. The related members of these families can then be assessed by analogy to determine their appropriateness as drug targets. These phylogenetic (gene family) relationships then form the basis of the structure for a database and become both a tool for navigating and exploring the relationships themselves and a mechanism for integration of other types of drug discovery data, for example, high-throughput screening results and structure-activity relationships of bound ligands. Thus, we can see that an integration of information across the realms of genomics, target discovery, screening and lead optimization becomes possible and an achievable goal.

Target analysis belongs in the domain of knowledge for drug discovery. Such knowledge is a representation of our understanding of the function of a gene product in a disease state. This functional understanding is derived from the analysis of experimental data linked to our experience of the functional information, itself defined (if only partially) by the data that characterize that functional information. Setting such knowledge within the context of the genomic universe enhances our ability to select new targets for future validation studies, either through molecular genetic techniques (gene knock-out, anti-sense, etc.) or through mechanistic validation using small molecule tools discovered as part of the assay development and screening process.

There are attempts at capturing this type of knowledge within databases (known as knowledge-bases). Much valuable research is going on in the allied areas of knowledge representation. At present, active human participation is required in accumulating knowledge and deriving ultimate benefit from it in the form of new drugs that fulfill unmet human needs in therapeutic situations.

7 STRUCTURAL GENOMICS

Structural genomics is the process of determining the three-dimensional structures of an organism's proteome (60). Predictions of protein function can be attempted from knowledge of the structure alone, or the additional information gained can be used to inform sequence-based methods of functional prediction (61).

The traditional paradigm of classical structural biology has been to select a protein based on its known biological function, ascertain its molecular structure, and use the data thus gleaned to understand how its biological function is carried out at the molecular level. To this end, more than 12,000 structures have been determined, to varying degrees of resolution and confidence. Much has been learned, as a result, of the complexity of protein structure and the manner of interaction of proteins and their native ligands, or of proteins and small molecule drugs.

The essence of structural genomics is to start from a gene sequence, produce the functional protein, and then determine its three-dimensional structure. The biological function of the protein in vivo is then deduced from an understanding of the structure. In this paradigm, there is no limitation on the number of structures that can be determined except the ability to purify sufficient protein for crystallization trials. The usual caveats apply regarding the solution of structures of membrane-bound proteins, for example, G-protein-coupled receptors, ion channels, certain classes of kinases, etc.

Often, the most useful functional information is derived from the structures of protein-ligand complexes because they reveal the nature of the bound ligand and its location in the protein. In the case of enzymes, a catalytic mechanism can often be postulated taking into account the disposition of residues in the active site pocket. While such structures have traditionally been determined by design, the ligand is unknown in the structural genomics approach. Only in rare cases will a ligand be co-crystallized by serendipity from the cloning organism.

From the perspective of bioinformatics, it is important to appreciate that structural determination can only provide data that reflect the biochemical or biophysical properties of the protein. The biological role in the cell or organism is a complex of interactions including spatial and temporal dimensions. Sometimes information can be derived using other techniques, for example, cDNA expression analysis, two-dimensional gel electrophoresis, biochemical assay, etc. Bioinformatics techniques building on other types of data can also assist in providing biological context for the function of a protein, for example, phylogenetic, fingerprint or regular expression analyses, etc. All of these techniques yield data that, when reviewed as a whole, can direct the course of further experiments or influence experimental design.

We have seen that, purely using techniques of sequence comparison, the function of about 40% of genes sequenced from genome projects can be inferred from sequence identity or similarity measures or by motif comparison using a variety of techniques. It is known that proteins exhibiting insignificant sequence similarity often adopt similar tertiary structures, which themselves have similar (or at least related) molecular functions. In fact the variety of types of fold taken up by polypeptides is thought to be quite limited [SCOP (62) and CATH (63)]. Discovering a useful relationship between folding topology and sequence, which can be used to predict folding accurately, is, however, not trivial. By comparing the structures newly determined from structural genomics initiatives with structures already deposited in the Protein Data Bank, it may be possible to extend the inference of molecular function further than that achieved from sequence comparisons alone. Once the molecular function has been characterized in this computational way, we may begin to postulate the cellular function of the protein under analysis.

7.1 Predicting Protein Function from Structure

Some consider the "Holy Grail" of computational biology to be the accurate prediction of a protein's function solely from knowledge of its primary sequence. We have already briefly mentioned the role of structural information in guiding and illuminating the process of molecular sequence alignment (Section 5.2.1). Structural data can be a truly effective means of understanding spatial relationships between amino acid side-chains, backbone donor and acceptor groups, and the means of interaction of natural and man-made drugs. For a thorough and insightful overview of these matters see Ref. 64; the entire volume is essential reading.

Many methods have been proposed for capitalizing on our understanding of protein structure by creating algorithms that attempt to predict function from structure, or place proteins in structural categories that may have implications for functional analysis.

7.2 Neural Networks and Protease Function

Stawiski et al. (60) recently performed a study of the unique structural features of proteases in which the authors noted consistent structural similarities among unrelated protease family members. They found that proteases tend to be more tightly packed than other proteins and they tend to have fewer α-helical regions and more residues in loop structures. A neural network was trained to predict protease function with 86% accuracy in a test set. Neural networks are an example of a technique used in bioinformatics for generating a predictive program from a set of weights that can be applied in a learning tool. The tool is trained by using parameters that show discrimination between, in this case, proteases and non-proteases. In this example, 36 proteases were tested. Each protease in turn was used as a test example, the network being trained using the remaining 35 proteases. In 31 of 36 cases (86%), the network was able to identify the held-out protease. By performing the same test on 258 counter-examples, 87% were correctly classified as non-proteases.

7.3 Fold Compatibility Methods

The ability to recognize the way in which a protein sequence is folded in three dimensions should enable us to model the interactions of specific side-chains in a manner that is simply not possible when considering proteins entirely at the sequence level. This notion has resulted in sequence threading algorithms that assess the level of compatibility of a sequence with a database of fold patterns (65, 66). The principal downside to this approach is that novel structural types cannot be predicted, because at least one example of each fold type must be present in the fold pattern database. Structural genomics may be the means whereby fold pattern databases can be populated with sufficient data to make them useful as predictive tools.

7.4 CASP and the State-of-the-Art

Currently, methods of structure prediction from sequence perform poorly. The results of the biennial CASP experiment (Critical Assessment of techniques for protein Structure Prediction; see http://predictioncenter.llnl.gov) are equivocal to say the least. A recent report on the improvements of aligning target sequences to a structural template (67) indicates that over the last four CASP competitions there was no significant improvement in quality in this key step in the prediction process. Alignment remains the major source of error in all models based on less than 30% sequence identity. The subjective impression is that structure prediction is getting better year after year. This analysis, however, seems to suggest there is some way to go before reliable models can be generated for fold types not yet available in the structural databases.

8 THE FUTURE

Bioinformatics is a wide-ranging science that has developed over the last 50 years, since the discovery of the structure of DNA, a period that has resulted in the sequencing of the entire genomic material of major species. Techniques of sequence comparison, database management, design, and curation have resulted in a healthy base on which to build more automated systems. It is this author's view that experienced sequence analysts will always have a place in this process, guiding the design of new algorithms and better knowledge bases. Only in this way will the true synergy between analyst and developer be realized and contribute to the understanding of the fruits of genomic research.

Bioinformatics, allied to drug discovery, is used for discovering new potential drug targets through the use of standard bioinformatics techniques in assigning function to novel gene products at the sequence level and by the informed use of structural, mutational and biochemical data reflected in sequence level alignment models. Assessment of expression levels of genes and the statistical relevance of differences in levels of expression at the mRNA level has contributed to drug discovery programs in pharmaceutical and biotechnology companies globally. In many respects, it is still early days for seeing the fruits of this work in the products offered on the market by these companies. There should, however, be many clinical candidates on trial to which bioinformatics has contributed, albeit at a level of detail that is frequently far below the level of interest of industry publicists. The fact is that bioinformatics is now engrained in the discovery process for new drugs. The next stage in its development will be integration between chemoinformatics (chemical informatics) and bioinformatics, driven by the need to understand the ways drugs interact with their targets rather than merely exploiting biological assay systems as tools for drug discovery.

9 ACKNOWLEDGMENTS

I thank Jeremy Packer of the BioFocus Plc Bioinformatics Group for casting a critical eye over the manuscript before it went for review and Sue Scott for her proofreading skills.

REFERENCES

1. Millennium and Bayer Announce Genome-derived Oncology Drug Candidate Selected For First Human Studies, available online at http://www.mlnm.com/news/2001/01-10--0.html, accessed on September 12, 2002.
2. J. Owens, A. Hinde, B. Ramster, and R. N. Lawrence, Drug Discov. Today, 6, 229-230 (2001).
3. B. A. Kenny, M. Bushfield, D. J. Parry-Smith, S. Fogarty, and J. M. Treherne, Prog. Drug Res., 51, 245-269 (1998).
4. T. K. Attwood and D. J. Parry-Smith, Introduction to Bioinformatics, Addison Wesley Longman, Harlow, UK, 1999.
5. P. Baldi and S. Brunak, Bioinformatics: The Machine Learning Approach, 2nd ed., MIT Press, London, 2000.
6. M. S. Waterman, Introduction to Computational Biology, Chapman & Hall, London, 1995.
7. T. Etzold and P. Argos, Comput. Appl. Biosci., 9, 49-57 (1993).
8. A. Bairoch and R. Apweiler, Nucleic Acids Res., 28, 45-48 (2000).
9. T. K. Attwood, M. D. Croning, D. R. Flower, A. P. Lewis, J. E. Mabey, P. Scordis, J. N. Selley, and W. Wright, Nucleic Acids Res., 28, 225-227 (2000).
10. T. K. Attwood, D. R. Flower, A. P. Lewis, J. E. Mabey, S. R. Morgan, P. Scordis, J. N. Selley, and W. Wright, Nucleic Acids Res., 27, 220-225 (1999).
11. T. Hubbard, D. Barker, E. Birney, G. Cameron, Y. Chen, L. Clark, T. Cox, J. Cuff, V. Curwen, T. Down, R. Durbin, E. Eyras, J. Gilbert, M. Hammond, L. Huminiecki, A. Kasprzyk, H. Lehvaslaiho, P. Lijnzaad, C. Melsopp, E. Mongin, R. Pettett, M. Pocock, S. Potter, A. Rust, E. Schmidt, S. Searle, G. Slater, J. Smith, W. Spooner, A. Stabenau, J. Stalker, E. Stupka, A. Ureta-Vidal, I. Vastrik, and M. Clamp, Nucleic Acids Res., 30, 38-41 (2002).
12. M. Gerstein, Nat. Struct. Biol., 7 Suppl, 960-963 (2000).
13. A. Brazma, Bioinformatics, 17, 113-114 (2001).
14. J. Packer, E. Conley, N. Castle, D. Wray, C. January, and L. Patmore, Trends Pharmacol. Sci., 21, 327-329 (2000).
15. B. Destenaves and F. Thomas, Curr. Opin. Chem. Biol., 4, 440-444 (2000).
16. R. Staden, K. F. Beal, and J. K. Bonfield, Methods Mol. Biol., 132, 115-130 (2000).
17. R. Staden, D. P. Judge, and J. K. Bonfield, Methods Biochem. Anal., 43, 303-322 (2001).
18. D. S. Bailey, A. Bondar, and L. M. Furness, Curr. Opin. Biotechnol., 9, 595-601 (1998).
19. J. M. Claverie, Hum. Mol. Genet., 8, 1821-1832 (1999).
20. S. Audic and J. M. Claverie, Genome Res., 7, 986-995 (1997).
21. M. R. Fannon, Trends Biotechnol., 14, 294-298 (1996).
22. M. Gerstein and R. Jansen, Curr. Opin. Struct. Biol., 10, 574-584 (2000).
23. F. Sterky and J. Lundeberg, J. Biotechnol., 76, 1-31 (2000).
24. Science, 291, 1145-1434 (2001).
25. Nature, 409, 745-964 (2001).
26. A. Brazma, A. Robinson, G. Cameron, and M. Ashburner, Nature, 403, 699-700 (2000).
27. M. Gardiner-Garden and T. G. Littlejohn, Brief Bioinform., 2, 143-158 (2001).
28. A. D. Baxevanis, Nucleic Acids Res., 29, 1-10 (2001).
29. A. D. Baxevanis, Nucleic Acids Res., 30, 1-12 (2002).
30. A. G. Rust, E. Mongin, and E. Birney, Drug Discov. Today, 7, S70-S76 (2002).
31. R. Apweiler, Brief Bioinform., 2, 9-18 (2001).
32. W. D. Hillis, The Pattern on the Stone, Weidenfeld & Nicolson, London, 1998, pp. 77-90.
33. D. J. Parry-Smith, A. W. Payne, A. D. Michie, and T. K. Attwood, Gene, 221, GC57-GC63 (1998).
34. S. B. Needleman and C. D. Wunsch, J. Mol. Biol., 48, 443-453 (1970).
35. T. F. Smith and M. S. Waterman, J. Mol. Biol., 147, 195-197 (1981).
36. W. R. Pearson, Methods Mol. Biol., 132, 185-219 (2000).
37. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, J. Mol. Biol., 215, 403-410 (1990).
38. S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman, Nucleic Acids Res., 25, 3389-3402 (1997).
39. A. A. Schaffer, L. Aravind, T. L. Madden, S. Shavirin, J. L. Spouge, Y. I. Wolf, E. V. Koonin, and S. F. Altschul, Nucleic Acids Res., 29, 2994-3005 (2001).
40. M. Pellegrini, E. M. Marcotte, M. J. Thompson, D. Eisenberg, and T. O. Yeates, Proc. Natl. Acad. Sci. USA, 96, 4285-4288 (1999).
41. E. L. Sonnhammer, S. R. Eddy, E. Birney, A. Bateman, and R. Durbin, Nucleic Acids Res., 26, 320-322 (1998).
42. E. L. Sonnhammer, S. R. Eddy, and R. Durbin, Proteins, 28, 405-420 (1997).
43. J. G. Henikoff, E. A. Greene, S. Pietrokovski, and S. Henikoff, Nucleic Acids Res., 28, 228-230 (2000).
44. J. A. Eisen, Genome Res., 8, 163-167 (1998).
45. G. Theissen, Nature, 415, 741 (2002).
46. J. Felsenstein, Annu. Rev. Genet., 22, 521-565 (1988).
47. J. Felsenstein, Methods Enzymol., 266, 418-427 (1996).
48. J. Packer and D. J. Parry-Smith, Curr. Drug Discov., March, 29-33 (2002).
49. L. Falquet, M. Pagni, P. Bucher, N. Hulo, C. J. Sigrist, K. Hofmann, and A. Bairoch, Nucleic Acids Res., 30, 235-238 (2002).
50. W. Wright, P. Scordis, and T. K. Attwood, Bioinformatics, 15, 523-524 (1999).
51. T. K. Attwood, M. J. Blythe, D. R. Flower, A. Gaulton, J. E. Mabey, N. Maudling, L. McGregor, A. L. Mitchell, G. Moulton, K. Paine, and P. Scordis, Nucleic Acids Res., 30, 239-241 (2002).
52. R. Apweiler, T. K. Attwood, A. Bairoch, A. Bateman, E. Birney, M. Biswas, P. Bucher, L. Cerutti, F. Corpet, M. D. Croning, R. Durbin, L. Falquet, W. Fleischmann, J. Gouzy, H. Hermjakob, N. Hulo, I. Jonassen, D. Kahn, A. Kanapin, Y. Karavidopoulou, R. Lopez, B. Marx, N. J. Mulder, T. M. Oinn, M. Pagni, F. Servant, C. J. Sigrist, and E. M. Zdobnov, Bioinformatics, 16, 1145-1150 (2000).
53. A. Bateman, E. Birney, L. Cerruti, R. Durbin, L. Etwiller, S. R. Eddy, S. Griffiths-Jones, K. L. Howe, M. Marshall, and E. L. Sonnhammer, Nucleic Acids Res., 30, 276-280 (2002).
54. F. Corpet, F. Servant, J. Gouzy, and D. Kahn, Nucleic Acids Res., 28, 267-269 (2000).
55. J. Schultz, R. R. Copley, T. Doerks, C. P. Ponting, and P. Bork, Nucleic Acids Res., 28, 231-234 (2000).
56. E. Zdobnov and R. Apweiler, Bioinformatics, 17, 847-848 (2001).
57. P. Lyman and H. R. Varian, How much information?, available online at http://www.sims.berkeley.edu/research/projects/how-much-info, accessed on September 12, 2002.
58. D. E. Knuth, Selected Papers on Computer Science, Cambridge University Press, Cambridge, UK, 1996.
59. J. M. Claverie, Science, 291, 1255-1257 (2001).
60. E. W. Stawiski, A. E. Baucom, S. C. Lohr, and L. M. Gregoret, Proc. Natl. Acad. Sci. USA, 97, 3954-3958 (2000).
61. M. Weir, M. Swindells, and J. Overington, Trends Biotechnol., 19, 61-66 (2001).
62. C. L. Lo, B. Ailey, T. J. Hubbard, S. E. Brenner, A. G. Murzin, and C. Chothia, Nucleic Acids Res., 28, 257-259 (2000).
63. C. A. Orengo, F. M. Pearl, J. E. Bray, A. E. Todd, A. C. Martin, C. L. Lo, and J. M. Thornton, Nucleic Acids Res., 27, 275-279 (1999).
64. M. Perutz, Protein Structure: New Approaches to Disease and Therapy, Freeman, New York, 1992, pp. 119-137.
65. R. T. Miller, D. T. Jones, and J. M. Thornton, FASEB J., 10, 171-178 (1996).
66. D. T. Jones, M. Tress, K. Bryson, and C. Hadley, Proteins, 37, 104-111 (1999).
67. C. Venclovas, A. Zemla, K. Fidelis, and J. Moult, Proteins, 45 (Suppl 5), 163-170 (2001).
CHAPTER NINE
Chemical Information
Computing Systems in
Drug Discovery
DOUGLAS R. HENRY
MDL Information Systems, Inc.
San Leandro, California
Contents
1 Introduction
1.1 Motivation for Chemical Information Management
1.2 Literature, References, Societies, and Research Groups
1.3 Brief History of Chemical Data Management
1.3.1 Pre-1980: Flat File Storage of Chemical Structures
1.3.2 The 1980s: Flat Database Storage
1.3.3 The 1990s: Relational Data Storage
1.3.4 The 2000s
2 Chemical Representation
2.1 Types of Chemical Entities
2.1.1 Sequences
2.1.2 2D Structures
2.1.3 Reactions
2.1.4 3D Models
2.1.5 Mixtures
2.1.6 Generic Structures
2.1.7 Substances
2.1.8 Search Queries
2.2 Types of Chemical Representation
2.2.1 Linear Notation
2.2.2 Tabular Storage
2.2.3 Graphical Representation
2.2.4 Markup Languages
2.3 Chemical Structure File Conversion
2.4 Representing Nonstructural Chemical Data
3 Storing and Searching Chemical Structures and Reactions
3.1 Storing Chemical Information in Databases
3.2 Registering Chemical Information
3.2.1 Extract the Data
3.2.2 Cleaning and Transforming the Data
3.2.3 Loading the Data
3.3 Searching Chemical Structures and Reactions
3.3.1 Exact Match Searching
3.3.2 Substructure Searching
3.3.3 Similarity Searching
3.3.4 Reaction Searching
3.3.5 Searching Other Data
3.4 Chemical Information Management Systems and Databases
3.5 Commercial Database Systems for Drug-Sized Molecules
3.6 Sequence and 3D Structure Databases
3.7 In-House Proprietary and Academic Database Systems
4 Chemical Property Estimation Systems
4.1 Topological Descriptors
4.2 Physicochemical Descriptors
4.3 Absorption, Distribution, Metabolism, and Excretion Properties
4.4 Property Calculations Online
5 Data Warehouses and Data Marts
5.1 Data Warehouses of Chemical Information
5.2 Data Marts of Chemical Information
6 Future Prospects
7 Glossary of Terms
8 Acknowledgments

Burger's Medicinal Chemistry and Drug Discovery, Sixth Edition, Volume 1: Drug Discovery
Edited by Donald J. Abraham
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.

1 INTRODUCTION

The term drug discovery once encompassed only those activities that were traditionally practiced by synthetic chemists: the design, synthesis, analysis, and testing of new chemical entities. Until the 1980s, most drug discovery was conducted in a serial fashion. Thus, a chemist working on a given project would design a series of structures, then synthesize them one after another, in milligram quantities (large by today's standards), and finally send batches of the compounds for analysis and assay. Based on the assay results, the chemist would design new or modified sets of structures and repeat the cycle until a marketable entity was obtained. This serial, iterative procedure was adequate in a time when a few major drug companies were doing drug design, and the number of therapeutic targets was relatively small. One consequence of this approach was a much higher number of "me-too" drugs on the market than we see today, likely because of the intensive time and resource that was devoted to each new chemical entity.

1.1 Motivation for Chemical Information Management

The serial approach to drug discovery is very costly in time and resource. Figure 9.1 shows an idealized view of the drug discovery "funnel" in which a (very productive) hypothetical chemist could produce 10-20 structures a week. In the 1970s, it was estimated that 1 in 7000 compounds synthesized and tested would eventually reach the market. That ratio has since worsened to about 1 in 10,000, a figure that holds true to this day, despite advances in combinatorial and high-throughput chemistry, molecular modeling, structure-based drug design, diversity analysis, and quantitative structure-activity relationships (QSAR). In real dollars, it almost certainly costs at least as much to bring a new drug to market as it did in the 1970s. A commonly quoted figure of $500 million per marketable drug has been questioned by a Ralph Nader watchdog group, but the figure is certainly in the hundreds of millions of dollars (1). To balance the many computational advances that have been made in the past 30 years are factors of increased competition in the field, many more therapeutic targets, increased regulation, and very importantly, the flood of information flowing from high-throughput methods. The advent of high-throughput combinatorial chemistry has increased the number of structures a chemist can generate by 100- to 1000-fold, with a corresponding increase in the amount of data that must be gathered, stored, and processed.

Traditional drug design: new drug development costs

Numbers                           Stage               Time
10,000                            Synthesized         1-2 yr
[illegible]                       In-vitro activity   1-2 yr
$1,000,000 [partly illegible]     Clinical trials     3-6 yr (IND)
$300,000,000/drug                 Market entity       0.5-2 yr (NDA)
                                  Total               6-12 yr

1 chemist = 10-20,000 compounds/lifetime = 1-2 drugs

Figure 9.1. Traditional "serial" drug design costs. The drug discovery "funnel" typically shows about a 10-fold reduction at each stage in the process. A chemist who could produce 10-20 structures per week would be lucky to discover a single marketable drug in a 20- to 30-year career.

To deal with the flood of information (chemical, biological, and clinical), it became essential over the years to develop chemical information computing systems (i.e., chemical and reaction database systems) from which the chemist and biologist could obtain up-to-date information about commercially available and in-house structures, reactions, and data. This chapter briefly describes the history of these systems, the current state of chemical information management as it applies to drug discovery, and a look at future developments in the field. The coverage is primarily aimed at corporate applications of chemical information management, as practiced in the pharmaceutical industry. The expanding use of microcomputers running Microsoft Windows or Linux operating systems means that many of the programs and database systems now used in industry can also be installed and applied in academic settings. Much of the innovation in chemical information management comes from academia, whereas most of the application has been seen in industry. This review is limited to the management and storage of chemical structure information in databases. Other chapters deal with the generation of this information (molecular modeling, property calculation) and with the use of the information in drug discovery (library design, docking and structure-based drug design, and QSAR). By analogy with another rapidly expanding field, bioinformatics, the term cheminformatics (or chemoinformatics) has recently become common to describe the acquisition, management, and use of chemical information.

1.2 Literature, References, Societies, and Research Groups

The literature of chemistry is vast, and chemical information management occupies a small corner of this domain. The chemical information literature overlaps that of computer science, database management, molecular modeling, QSAR, and even mathematics. The primary journals that publish chemical information articles are the American Chemical Society's Journal of Chemical Information and Computer Sciences, Kluwer's Journal of Computer-Aided Molecular Design, and Elsevier's Journal of Molecular Graphics and Modeling. Less frequently, chemical information articles appear in Wiley's Journal of Computational Chemistry, Quantitative Structure-Activity Relationships, and Journal of Chemometrics, the ACS Journal of Medicinal Chemistry and the Journal of Organic Chemistry, and Elsevier's Analytica Chimica Acta, Computers and Chemistry, and Chemometrics and Intelligent
Laboratory Systems. Other journals with articles on chemical information include the University of Bayreuth's Communications in Mathematical Chemistry (MATCH), Elsevier's Drug Discovery Today, ACS's Modern Drug Discovery, and a handful of newer periodicals (2).

The history of chemical information management has recently been catalogued online by the Chemical Heritage Foundation (3). The American Chemical Society has a Division of Chemical Information (CINF), and divisional symposia are held at national meetings of the ACS, often in conjunction with other divisions including Medicinal Chemistry, Computers in Chemistry, and Pesticide Chemistry. The Skolnik Award is given annually by the ACS Division of Chemical Information for "achievement in the areas of computerized information systems, chemical information, chemical indexing and notation systems, nomenclature, structure-activity relationships, and numerical data analysis and correlation." Herman Skolnik, who died in 1994, was the first recipient. He founded the Journal of Chemical Documentation, which became the Journal of Chemical Information and Computer Sciences, and he made many contributions to the field (4). Besides the ACS, other national and international meetings on chemical information include the Noordwijkerhout Conference on Chemical Structures (5), the Quantitative Structure-Activity Relationship Gordon Conference (6), and the International Conference on Chemical Information (7).

Except for journal articles and some conference proceedings, very recent general books on chemical information management are rather few in number. This is caused in part by the rapid changes in a field so closely tied to computer hardware and software development. Another reason for the paucity of texts is that most chemical information management systems are commercially developed and marketed, not widely used by universities, and in many cases, they use trademarked or even patented technology. Some texts of note in the last decade include several by Collier (8), Martin and Willett (9), and Warr and Suhr (10), one by Wiggins and Emry (11), and ones by Maizell (12) and Ash et al. (13), and a book on chemical searching by Ridley (14).

Most chemical information research and development is conducted by commercial software vendors and in-house at pharmaceutical firms. A small number of academic research groups study chemical information. The Computational Information Systems group at the University of Sheffield, under Peter Willett, has been very active in studying database searching (15). The Computer-Chemie-Centrum at the University of Erlangen under Johann Gasteiger focuses on organic structure representation and reaction classification (16). Numerous other academic groups are active in QSAR and modeling research, described in other chapters in this series.

In addition to the academic groups already mentioned, a number of online resources deal with chemical information management. Examples include the comprehensive CHEMINFO site at Indiana University (17), Cambridge Health Institute's Cheminformatics Glossary (18), the Chemical Structure Association (19), the Computational Chemistry List (CCL) (20), the Molecular Graphics and Modeling Society (21), the Open Molecule Foundation (22), the QSAR and Modeling Society (23), the Royal Society of Chemistry Chemical Information Group (24), and the UK QSAR and Cheminformatics Group (25).

1.3 Brief History of Chemical Data Management

The history of chemical information management parallels the history of computers. It can be roughly viewed in terms of decades of development (Fig. 9.2).

1.3.1 Pre-1980: Flat File Storage of Chemical Structures. Computers consisted of mainframe machines (e.g., IBM 3090) and small minicomputers (Digital, Prime). Users connected through low speed serial connections, using "dumb" terminals (no graphics capability) or monochrome vector graphics terminals such as Tektronix and Imlac. Chemical structures were mainly stored as either (1) individual structure files, indexed by name, and handled one or a few structures at a time or (2) in a flat-file database accessed by record number (26). A typical corporate database contained up to a few tens of thousands of structures.
[Figure 9.2 panels: 1970s, individual files of chemical structures; 1990s, relational database framework (Oracle, Microsoft Access), with tables whose columns include ID, extreg, Mwt, formula, and keys; 2000s, chemical data marts and data warehouses built on the "star" schema around a central "fact" table.]
Figure 9.2. Evolution of chemical information storage. The storage of chemical information has typically lagged the development of database management systems, but it is catching up. In the 1970s, structures were stored in individual molecule files or large concatenated files. In the 1980s, proprietary databases of structures and reactions appeared, in which a single record contained all the information for a given structure. In the 1990s, this information was distributed into tables in a relational database. In the 2000s, we see the application of the concepts of data warehousing and data marts that consolidate information from a variety of sources for transactional and/or analytical purposes.
In-house chemical information management systems began to emerge at some of the larger chemical and pharmaceutical firms. These included CONTRAST and SOCRATES at Pfizer, SYNLIB at SmithKline, COUSIN at Upjohn, MSDRL/CSIS at Merck, and CROSSBOW at ICI (27). The Chemical Abstracts database was made available online in 1967 (28). In 1980 this became CAS ONLINE. A comprehensive study of the user acceptance of CAS ONLINE was published in 1988 (29). The first commercial chemical structure database systems appeared in the late 1970s. These offered an in-house solution using a mainframe chemical structure management system with a graphical interface, which could be accessed by interactive graphics terminals. A standard program in widespread use was the MACCS
Figure 9.3. MACCS (the Molecular ACCess System), an early structure indexing system. This program originally used fixed menus for searching, registration, and reporting. Later versions allowed users to customize the menus. The figure shows the result of a 3D pharmacophore search for ACE inhibitors. Out of a database of 115,000 structures, 21 fit the 2D and 3D requirements of the search query. The user could typically browse the "hits" from the search, save the list of structures to a list file, and output the structures to a structure-data file (SDFile). The MACCS database was a proprietary flat database system in which data of a given type, say, formula, was stored in a given file, indexed by the compound ID number.
program (Fig. 9.3). Structures could be drawn, registered, searched, and output to files. The systems were only slightly customizable, and the graphics terminals, which used vector displays, were large and expensive.

1.3.2 The 1980s: Flat Database Storage. This was the era of minicomputers (Prime, VAX) and a period of immense growth for chemical information, molecular modeling, and QSAR. In industry, chemical structure databases consisted mainly of custom-designed "flat" databases (where each record in a given table refers to a given structure in the database, much like in a spreadsheet). Client-server architectures appeared, and personal computers replaced graphics terminals and workstations. Highly successful PC-based "personal" chemical information systems appeared, which included chemical structure drawing and text processing programs (e.g., ChemDraw, ChemText) and personal chemical databases (e.g., ChemBase) (30). Customizable mainframe systems appeared (31), as did reaction indexing and searching systems (32). Additional commercial chemical information vendors appeared, including Daylight Chemical Information Systems, Chemical Design Ltd., DARC-Questel, and Cambridge Scientific Corporation. The Beilstein System came online in 1988 (33). In-house and commercial databases typically held 100,000 to 200,000 structures. The rapid and accurate conversion of two-dimensional (2D) structures to
three-dimensional (3D) models became possible using the program CONCORD, introduced by Pearlman in 1987 (34). This enabled the introduction of 3D structural databases with the ability to generate, store, and search 3D molecular models on a large scale. These 3D database systems included ALADDIN by Daylight Chemical Information Systems, UNITY3D by Tripos, CHEMDBS3D by Chemical Design Ltd., and MACCS3D by MDL (35).

1.3.3 The 1990s: Relational Data Storage. This period saw the decline of single-computer mainframe chemical management programs and the rise of server-based systems and distributed computing. By far, the most significant influences on chemical information management were the Internet, the introduction of relational database technology, and the shift to high-throughput combinatorial chemistry. In a relational database, information that formerly was kept in a single large table is stored in numerous smaller tables, indexed by "keys." This is a much more flexible architecture, and combining different fields from several tables into a "view" of the data gives the user the impression of a single large table, as before. At the end of the decade, chemical and pharmaceutical firms could obtain chemical structure, reaction, and 3D model databases from a variety of vendors. These databases were even somewhat integrated with molecular modeling, quantum mechanics, and docking programs, and with literature, spectra, and biological databases. The largest database of known chemical structures, the Chemical Abstracts Registry, grew to about 20 million structures, whereas a typical corporate inventory increased to between 100,000 and 1,000,000 structures. A database of billions of virtual chemical structures was constructed and made available for drug-design purposes by Tripos, Inc. (36).

1.3.4 The 2000s. Like the customization and distributed computing of the 1980s that followed the introduction of mini-mainframe systems, the 2000s are witnessing the customization and further distribution of relational and integrated database systems. Chemical structure-specific and reaction-specific search types can be integrated into relational databases, to take maximal advantage of the scale and performance of these systems. We see the increasing use of web-based clients, also known as "thin" clients, because they need little software other than a web browser. Former single databases are turning into distributed and replicated database systems, and we see increasing use of data marts and data warehouses, more fully integrated structure, reaction, data, and citation searching, and increasingly "intelligent" database systems.

2 CHEMICAL REPRESENTATION

Chemical structures and reactions can be represented in many ways. At the most fundamental level, the parameters of the time-dependent Schrödinger equation (the atomic and molecular orbitals) do a more or less complete job of characterizing a chemical compound. Storing and representing structures as mathematical wave functions is obviously not suitable for thousands or millions of structures; nor is such a representation useful for drug discovery, except perhaps to a molecular modeler. Synthetic chemists still function in a mostly 2D chemical structure space. Intuition, training, and experience allow a chemist to extrapolate from a flat representation with a few stereochemical hints (dashed and wedged bonds or Z/E double bonds) to a higher-dimensional mental representation of a structure. Chemical representation systems are a compromise of several factors, including the needs of the chemist, the storage and performance characteristics of the chemical database system, and the ultimate 3D reality of chemical structures.

2.1 Types of Chemical Entities

There are several ways to look at chemical representation. One approach is to classify according to the type of chemical data that is stored. The most basic types of chemical structure data are shown in Fig. 9.4, including the following.

2.1.1 Sequences. For linear chemical systems, such as DNA, RNA, and proteins, the sequence of subunits (nucleotide bases or amino acids) provides most of the information
Types of chemical data

Sequences, names, linear notations - 1-dimensional
Structures, reactions - 2-dimensional information
3D models - 3-dimensional information
Figure 9.4. Basic types of 2D chemical structure data. The amount of information and the complex-
ity of searching increases with the dimensionality of the data.
about the structure. The deciphering of the human genome and the exploding interest in bioinformatics as a means of identifying new drug targets mean there will be increasing growth in the use of sequence data. The use of a sequence representation depends on a natural "vocabulary" of fixed building blocks. This vocabulary consists of nucleotides in the case of nucleic acids and of the amino acids in the case of proteins. If any of the building blocks are unique, or even if the bonding attachment between building blocks differs, a simple sequence notation is not possible or becomes more complex.

2.1.2 2D Structures. When the building blocks are unique or when dealing with the large variety of ordinary chemical structures, a 2D representation is used. In mathematical terms, this is a "graph" of the structure, which consists of a set of "nodes" (atoms) connected by "edges" (bonds). The important atom information includes atom type (symbol or atomic number), its 2D coordinates, formal charge, valence state, atom stereochemistry, and isotope information. Note that atom stereochemistry can be local (i.e., relative) or it may follow Cahn-Ingold-Prelog (CIP) conventions. Local atom stereochemistry gives the clockwise or counter-clockwise direction of the attachment of neighboring atoms when viewed from some reference attached atom, often a hydrogen atom or the lowest atomic numbered atom (37). The order of atoms in the rotation usually depends on atomic number. CIP stereochemistry is the familiar "R,S" nomenclature that relates the stereochemistry of the given atom to the entire structure (38). CIP stereochemistry requires analyzing the entire structure to determine the stereochemistry values. It can occasionally be ambiguous, and if any part of the structure changes, the CIP stereochemistry on distant atoms in the structure may switch. For these reasons, it is common in chemical databases to store local atom stereochemistry, but to perceive CIP stereochemistry "on the fly."

A particular problem with relative stereochemistry is that a given combination of "up" and "down" bonds on a structure implies a mixture of at least two stereoisomers. If all the centers are specified, the structure represents at least the two enantiomers. If some of the stereo centers are not designated, the number of isomers the structure represents is 2^n, where n is the number of undesignated centers. Some database vendors (e.g., MDL) allow a "chiral" designation on the molecule, which indicates that the structure represents only a single stereoisomer, but does not specify which one. One approach to dealing with these problems, which is being adopted in MDL programs, is to allow three kinds of stereo designation at a given tetrahedral center:

1. Absolute: an atom is given a known absolute stereochemistry. If all the stereo centers are so designated, this represents a single stereoisomer of the structure, as drawn.
2. Relative as a single stereoisomer: an up or down bond represents the current relative configuration, with respect to some collection of other chiral centers in the structure. The structure represents a single stereoisomer among the possible ones. More than one collection of stereo centers may be present in the structure.
3. Relative as a mixture of stereoisomers: an up or down bond represents the current relative configuration, with respect to some collection of other chiral centers in the structure. Now, however, the structure represents a mixture of the possible stereoisomers, considering combinations of the stereo collections that are present.

Examples of these alternatives are shown in Fig. 9.5, which shows the present and the newer stereochemistry options, using a steroid structure as an example.

The bond information usually includes the bonding atoms, the bond type, and bond stereochemistry. Bond types include the common single, double, triple, and aromatic types. They may also include types that are unique to the type of structure, including dative, ionic, hydrogen bonds, etc. The bond stereochemistry for double bonds is usually Z (Zusammen, together), E (Entgegen, opposite), or either (indicating an unknown stereochemistry). For single bonds attached to a chiral or prochiral center it is typically "up" (wedge or thick bond), "down" (dashed or dotted bond), or "either" (often a wiggly bond). Some systems allow the representation of extended stereochemistry, as with the terminal groups of allene systems, which can show a type of tetrahedral stereochemistry if you collapse the allene system to a point. The bonding information (which atoms are attached to which other atoms, and the bond types) is collected in the "connection table" of the structure. Table 9.1 shows a simple atom connection table for camphor. The diagonal elements of the table describe the type of atom at a given position in the structure. The off-diagonal elements describe the bonding of that atom with other atoms in the structure. Some information about a structure can be derived implicitly from the connection table. This includes the rings that are present, and the hydrogen atoms that could be attached. When a structure can be represented by more than one isomer, it is common to either (1) store multiple
[Figure 9.5 panels (steroid example). Current convention: "Chiral" flag, a single stereoisomer with known absolute configuration. Newer convention: "Abs", a single stereoisomer whose absolute configuration is known; "RelMix1", a mixture of relative stereoisomers; "Rel1", a single stereoisomer of known relative configuration.]
Figure 9.5. Defining absolute, relative collection, and relative single configuration stereochemistry. The older convention depends on a "chiral" flag on the molecule to specify whether a given structure represents one or several stereoisomers. In the newer convention, collections of stereo centers can be defined, and they can be designated absolute, relative-part-of-a-mixture, or relative-single-configuration.
isomers in the database, or (2) run a structure search using a search query that will hit the desired isomers. This is true for stereoisomers, enantiomers, and tautomers. Because the connection table is often symmetric, it is possible to store only, say, the upper triangular part of it in the database.

2.1.3 Reactions. Chemical reactions extend the structure representation by adding information about what role the structure plays in the reaction (reactant, catalyst, solvent, product, etc.). Reaction representation may also include information about what bonds are made or broken during the reaction, and which atoms are involved in reacting centers. It is also common to use a hierarchical organization for reaction information (reaction > variation > reactants, catalysts, solvents, products, etc.).

2.1.4 3D Models. These extend the structure representation by adding one or more sets of 3D atomic coordinates for the various conformations that the molecule can adopt. 3D model representation may also include additional atom or bond information such as partial charge or partial bond order. It is common to generate approximate 3D models from 2D structures using fast abbreviated molecular modeling and fragment joining methods such as CONCORD, CORINA, and CONVERTER (39). These programs combine molecular mechanics with rules and heuristics to generate reasonable 3D structures in a fraction of the time required by molecular mechanics or quantum mechanics modeling. Typically, hundreds of structures can be processed per second. Although the resulting models are not the lowest energy models possible, they are quite suitable for 3D pharmacophore searching,
Table 9.1 Connection Table for D-Camphor
[Table body not recoverable from the source; only the column header survives.]
Atom 1 2 3 4 5 6 7 8 9 10 11
and they serve as a good starting point for further optimization.

Recently, with the use of combinatorial and high-throughput chemistry, more general types of structure representation, so-called chemical libraries, have become common (Fig. 9.6). These are typically used to represent mixtures and generic structures.

2.1.5 Mixtures. Mixtures are useful to represent isomers, formulations, and the products of reactions. Their representation usually
[Figure 9.6 content. Chemical libraries comprise mixtures and generic structures. The generic structure shown is a nitrogen-containing root bearing three Rgroups: R1 = Ph, 2-furyl, 2-hexyl, ...; R2 = Me, CH2COOH, ...; R3 = Et, CH2CN, .... Number of specifics = n(R1) * n(R2) * n(R3), hence "combinatorial."]
Figure 9.6. Chemical structure data for high-throughput chemistry. The generic structure representation is often referred to as a Markush structure.
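The product rule shown in Figure 9.6 can be illustrated with a minimal Python sketch. It assumes only the Rgroup members that are actually listed in the figure (the figure's lists are open-ended), and the names `rgroups`, `count_specifics`, and `enumerate_specifics` are hypothetical, chosen for illustration. No root structure is assembled; each specific is represented simply as a choice of one member per Rgroup.

```python
from itertools import product
from math import prod

# Rgroup members as far as they are listed in Figure 9.6. The figure's
# lists end in "...", so this is an illustrative subset, not a full library.
rgroups = {
    "R1": ["Ph", "2-furyl", "2-hexyl"],
    "R2": ["Me", "CH2COOH"],
    "R3": ["Et", "CH2CN"],
}

def count_specifics(rgroups):
    """Number of specific structures = n(R1) * n(R2) * n(R3)."""
    return prod(len(members) for members in rgroups.values())

def enumerate_specifics(rgroups):
    """Yield one dict per specific ("enumerated") structure."""
    names = list(rgroups)
    for combo in product(*(rgroups[name] for name in names)):
        yield dict(zip(names, combo))

specifics = list(enumerate_specifics(rgroups))
print(count_specifics(rgroups), "specific structures")
print(specifics[0])
```

Counting needs only the list lengths, whereas enumeration materializes every combination; this is one reason computations that work directly on the generic structure are attractive for large libraries.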
requires adding data to specify percent or amount content in the mixture for each component.

2.1.6 Generic Structures. Generic or Markush structures are commonly used to represent structures for patent purposes. Since the introduction of combinatorial chemistry, generic structures and generic reactions have become a standard means of representing potentially huge numbers of specific compounds in a highly compact representation that is familiar to the chemist (40). The central structure of a generic, which is common to all the structures it represents, is commonly called the "root" or "parent." The variable parts of the structure (R1, R2, etc.) are referred to as the "Rgroups." The exact substituents that make up the various Rgroups (e.g., -Cl, -Br, -OH) are referred to as the "members" of the Rgroup. Finally, a specific combination of root and Rgroup members, which constitutes a single, real structure, is referred to as a specific or "enumerated" structure. Some chemical computations, like property and similarity calculations, can be performed on the generic structure without enumerating all the specific structures (41).

The remaining types of chemical data that need representation include substances and search queries.

2.1.7 Substances. Less common in drug discovery, but very useful for material science and polymer chemistry, is the ability to store "substances." These include unspecified or uncertain chemical structures, polymers, and other chemical entities that cannot be classed with the other chemical representations (42). Polymers pose particular problems, as discussed in the article by Schultz and Wilks (43).

2.1.8 Search Queries. For all types of chemical representation, there are query representations that can be applied to a database to return a list of structures which match or "fit" the query, or that the query "hits" in the database. The same chemical drawing programs that are used to input structures can commonly be used to input chemical structure queries. These drawing programs currently include several programs in the commercial and public domains (44). A comparison of popular drawing programs has recently appeared on the Internet (45). Query structures often contain generalized atom types, bond types, and ring types. They may specify the required presence or absence of certain atom types or functional groups. In the case of 3D models, queries can be devised to represent pharmacophores for certain types of therapeutic activity (46). An important distinction must often be made between the query representation of a pharmacophore used for 3D searching and the conceptual pharmacophore used for drug development.

2.2 Types of Chemical Representation

A second way of looking at chemical representation is to consider the manner in which the chemical structure data is organized and exchanged, either in some file format or in a database. The most common ways of representing structures and reactions include the following.

2.2.1 Linear Notation. One of the earliest forms of chemical structure representation is Wiswesser line notation (WLN), developed in 1946. This notation used short letter codes to represent functional groups in molecules (47). An alternative early notation is the Beilstein ROSDAL string (48). These two formats are not used much today, having been replaced by the Daylight SMILES notation (49) and its extensions (50). Figure 9.7 shows a drug-like molecule along with WLN, SMILES, and ROSDAL notation. Also shown is a simple chemical reaction represented in SMILES. Note that SMILES and other linear notation schemes do not include 2D coordinates for display of the structure. These are either stored separately or generated on the fly (51). The SMILES notation has become especially popular for property estimation programs, because atom coordinates are not usually needed for connection table-based calculations. It is a very convenient method for web-based input of structures for property calculations (52). Note that the order of atoms in most linear notations is arbitrary, depending on where in the molecule the notation generator (program or chemist) starts. For this reason, some linear notations have a canonical (or "uniquified") form that
WLN: L66J BMR& DSWQ IN1&1
ROSDAL: 1=-5-=10=5,10-1,1-11N-12-=17=12,3-18S-19O,18-20O,18=21O,8-22N-23,22-24
SMILES: OS(=O)(=O)c1cc2ccc(cc2c(c1)Nc1ccccc1)N(C)C
SLN: OS(=O)(=O)C[1]:C:C[2]:C:C:C(:C:C(:@2):C(:C(:@1))NC[3]:C:C:C:C:C(:@3))N(C)C
CHIME: 3aQf713AsUwQDjIMwyMWrSA7AOxHeqiAAPWRmMrZSZlJjTrAEfcsXH1JTUf...
SMILES: [reaction example not recoverable from the source]
Figure 9.7. Various linear notation schemes for chemical representation. Some contain only atom types and connectivity (WLN, ROSDAL, SMILES, SLN) and are chemist-readable. Others are compressed versions of molecule file formats (CHIME) and are meant for computer interpretation.
places the atoms in a topological order, usually reflecting their degree of branching, the types of neighboring atoms and bonds, etc. This canonical ordering of the atoms reduces any user-input ordering to the same string. It can then be used for exact-match lookup of the structure, regardless of how it was drawn or typed. The SMILES notation has also been extended to include reactions as shown in Fig. 9.7 (53). Occasionally, other linear notations are described (54).

2.2.2 Tabular Storage. To preserve more specific information about atoms and bonds, such as coordinates, stereochemistry, charge, and isotope number, it is necessary to store molecule information in a tabular format. Each row of the table typically contains all the information about a single atom or bond. In some formats, the atom and bond information is combined on a single line. Table 9.2 shows three common file formats for a simple structure. In the MDL molfile format, the atom and bond information is separated into separate blocks. In the Hyperchem HIN file format, the bond information is mixed with the atom information, resulting in fewer records in the file. In the PDB format, the atoms can be assigned to residues. Descriptions of various formats can be found in the reference manuals for chemical management and molecular modeling programs or in the literature (55). The systems that manage reactions typically have their own file formats as well.

Both linear and tabular formats are capable of being transmitted over a network between computers. This allows passing structure information from a server to a workstation for display purposes. It is common to compress and/or encrypt the chemical structure information before it is transmitted, and then have the workstation or display program uncompress or decrypt the resulting structures. This is done for performance and
Table 9.2 Tabular Molecule File Formats

MDL Molfile format:

D-Camphor
-ISIS- 03130218162D
2D molfile
11 12 0 0 0 0 0 0 0 0999 V2000
-2.0625 -1.1833 0.0000 C 0 0 0 0
-1.5583 -0.4167 0.0000 C 0 0 0 0
-0.4208 0.1500 0.0000 C 0 0 3 0
-0.8292 -0.7542 0.0000 C 0 0 2 0
-0.7042 1.1667 0.0000 C 0 0 3 0
0.9917 -0.4333 0.0000 C 0 0 0 0
0.4667 -1.2167 0.0000 C 0 0 0 0
0.5237 1.7332 0.0000 C 0 0 0 0
-1.9208 1.6686 0.0000 C 0 0 0 0
0.9875 -2.2803 0.0000 O 0 0 0 0
-1.6004 -2.0787 0.0000 C 0 0 0 0
1 2 1 0 0 0 0
2 3 1 0 0 0 0
1 4 1 0 0 0 0
4 5 1 0 0 0 0
5 3 1 0 0 0 0
3 6 1 0 0 0 0
6 7 1 0 0 0 0
7 4 1 0 0 0 0
5 8 1 0 0 0 0
5 9 1 0 0 0 0
7 10 2 0 0 0 0
4 11 1 1 0 0 0
M END
Hyperchem HIN file format:
mol 1 C:\TEMP\DCAMPHOR.HIN
atom 1 - C ** - 0 0.0000 0.8445 0.0000
atom 2 - C ** - 0 0.3881 1.4347 0.0000
atom 3 - C ** - 0 1.2639 1.8710 0.0000
atom 4 - C ** - 0 0.9494 1.1749 0.0000
atom 5 - C ** - 0 1.0457 2.6537 0.0000
atom 6 - C ** - 0 2.3513 1.4219 0.0000
atom 7 - C ** - 0 1.9471 0.8189 0.0000
atom 8 - C ** - 0 1.9910 3.0899 0.0000
atom 9 - C ** - 0 0.1090 3.0401 0.0000
atom 10 - O ** - 0 2.3481 0.0000 0.0000
atom 11 - C ** - 0 0.3557 0.1552 1.0000
endmol 1

Protein Data Bank format:
HEADER PROTEIN
COMPND c:\temp\dcamphor.pdb
AUTHOR GENERATED BY BABEL 1.6
ATOM 1 C UNK 1 -2.063 -1.183 0.000 1.00 0.00
ATOM 2 C UNK 1 -1.558 -0.417 0.000 1.00 0.00
ATOM 3 C UNK 1 -0.421 0.150 0.000 1.00 0.00
ATOM 4 C UNK 1 -0.829 -0.754 0.000 1.00 0.00
ATOM 5 C UNK 1 -0.704 1.167 0.000 1.00 0.00
ATOM 6 C UNK 1 0.992 -0.433 0.000 1.00 0.00
ATOM 7 C UNK 1 0.467 -1.217 0.000 1.00 0.00
ATOM 8 C UNK 1 0.524 1.733 0.000 1.00 0.00
ATOM 9 C UNK 1 -1.921 1.669 0.000 1.00 0.00
ATOM 10 O UNK 1 0.988 -2.280 0.000 1.00 0.00
ATOM 11 C UNK 1 -1.600 -2.079 0.000 1.00 0.00
CONECT 1 2 4
CONECT 2 1 3
CONECT 3 2 5 6
CONECT 4 1 5 7 11
CONECT 5 4 3 8 9
CONECT 6 3 7
CONECT 7 6 4 10
CONECT 8 5
CONECT 9 5
CONECT 10 7
CONECT 11 4
MASTER 0 0 0 0 0 0 0 0 11 0 11 0
END
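As a hedged illustration of how a connection table relates to the tabular formats in Table 9.2, the following Python sketch builds a symmetric atom connection table from the bond block of the D-camphor molfile and derives two pieces of implicit information from it: the implicit hydrogen counts and the number of rings. The atom symbols and bonds are transcribed from the molfile above; the function names, the small valence lookup, and the ring formula (bonds - atoms + 1, valid only for a single connected structure) are assumptions made for this example and do not describe any particular vendor's implementation.

```python
# Atom symbols and bonds transcribed from the molfile in Table 9.2.
# Atom 10 is the carbonyl oxygen; bonds are (atom_i, atom_j, bond_order).
symbols = ["C"] * 9 + ["O", "C"]
bonds = [(1, 2, 1), (2, 3, 1), (1, 4, 1), (4, 5, 1), (5, 3, 1), (3, 6, 1),
         (6, 7, 1), (7, 4, 1), (5, 8, 1), (5, 9, 1), (7, 10, 2), (4, 11, 1)]

def connection_table(symbols, bonds):
    """Diagonal holds the atom type; off-diagonal holds the bond order."""
    n = len(symbols)
    table = [[0] * n for _ in range(n)]
    for i, symbol in enumerate(symbols):
        table[i][i] = symbol
    for a, b, order in bonds:
        table[a - 1][b - 1] = order
        table[b - 1][a - 1] = order  # the table is symmetric
    return table

def upper_triangle(table):
    """Keep only the diagonal and the entries above it."""
    return [row[i:] for i, row in enumerate(table)]

NORMAL_VALENCE = {"C": 4, "O": 2}  # assumption: neutral C and O only

def implicit_hydrogens(table):
    """Hydrogens needed to fill each atom's normal valence."""
    counts = []
    for i, row in enumerate(table):
        used = sum(order for j, order in enumerate(row) if j != i)
        counts.append(NORMAL_VALENCE[row[i]] - used)
    return counts

def ring_count(table):
    """Rings = bonds - atoms + 1 (one connected structure assumed)."""
    n = len(table)
    n_bonds = sum(1 for i in range(n) for j in range(i + 1, n) if table[i][j])
    return n_bonds - n + 1

table = connection_table(symbols, bonds)
print("implicit H per atom:", implicit_hydrogens(table))
print("rings:", ring_count(table))
```

Storing only `upper_triangle(table)` keeps 66 of the 121 entries for this 11-atom structure while losing no information, because the lower half mirrors the upper half.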
for security. In MDL systems, the Chime linear format is used to transmit structures and reactions, whereas Daylight systems simply use the SMILES representation and depict the structure on the fly (Fig. 9.7).

2.2.3 Graphical Representation. Occasionally it is desirable to store chemical structures as "pictures," usually for document purposes. For example, some chemical drawing packages and many molecular modeling packages can store structures as the following:

• WordPerfect or Microsoft Word document (.doc files)
• Encapsulated PostScript (.eps files)
• Windows metafile (.wmf files)
• A proprietary sketch (MDL .skc files)
• A variety of compressed graphics formats including JPEG (.jpg files), bitmap (.bmp files), GIF (.gif files), and TIFF (.tif files)

Often, the graphical format allows the connection table to be stored and transferred transparently with the image, through the computer's clipboard, for instance. This allows the receiving program to "interpret" the image as a chemical structure and manipulate it accordingly.

2.2.4 Markup Languages. The Internet has spawned a host of new "languages" that facilitate the exchange of information. The most common of these are HTML (hypertext markup language) and XML (extensible markup language). A variation of XML that is designed for chemical information exchange is the Chemical Markup Language, CML (56). Although it is not widely used as of this writing, it bears watching as more web-based chemical information platforms become available. Problems with markup languages are that they are verbose compared with structure
Table 9.3 Chemical Markup Representation of Acetic Acid

<molecule convention="MDLMol" id="acetate" title="ACETATE">
  <date day="23" month="11" year="1995" />
  <atomArray>
    <atom id="a1">
      <string builtin="elementType">C</string>
      <float builtin="x2">0.27</float>
      <float builtin="y2">0.1217</float>
    </atom>
    <atom id="a2">
      <string builtin="elementType">C</string>
      <float builtin="x2">-1.27</float>
      <float builtin="y2">0.1246</float>
    </atom>
    <atom id="a3">
      <string builtin="elementType">O</string>
      <float builtin="x2">1.0623</float>
      <float builtin="y2">-1.2937</float>
    </atom>
    <atom id="a4">
      <string builtin="elementType">O</string>
      <float builtin="x2">1.1008</float>
      <float builtin="y2">1.4332</float>
    </atom>
  </atomArray>
  <bondArray>
    <bond id="b1">
      <string builtin="atomRef">a1</string>
      <string builtin="atomRef">a2</string>
      <string builtin="order">1</string>
    </bond>
    <bond id="b2">
      <string builtin="atomRef">a1</string>
      <string builtin="atomRef">a3</string>
      <string builtin="order">1</string>
    </bond>
    <bond id="b3">
      <string builtin="atomRef">a1</string>
      <string builtin="atomRef">a4</string>
      <string builtin="order">2</string>
    </bond>
  </bondArray>
</molecule>
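To make the trade-off concrete, here is a minimal Python sketch that reads a markup record shaped like Table 9.3 with the standard-library XML parser and compares its length with the SMILES string for the same structure. The embedded markup is an abbreviated copy of the table (the date and coordinates are left out for brevity), and the helper names `read_atoms` and `read_bonds` are hypothetical. This is not a CML toolkit, only an illustration of how such a record decomposes into atoms and bonds.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the Table 9.3 record (date and coordinates omitted).
CML = """
<molecule convention="MDLMol" id="acetate" title="ACETATE">
  <atomArray>
    <atom id="a1"><string builtin="elementType">C</string></atom>
    <atom id="a2"><string builtin="elementType">C</string></atom>
    <atom id="a3"><string builtin="elementType">O</string></atom>
    <atom id="a4"><string builtin="elementType">O</string></atom>
  </atomArray>
  <bondArray>
    <bond id="b1">
      <string builtin="atomRef">a1</string>
      <string builtin="atomRef">a2</string>
      <string builtin="order">1</string>
    </bond>
    <bond id="b2">
      <string builtin="atomRef">a1</string>
      <string builtin="atomRef">a3</string>
      <string builtin="order">1</string>
    </bond>
    <bond id="b3">
      <string builtin="atomRef">a1</string>
      <string builtin="atomRef">a4</string>
      <string builtin="order">2</string>
    </bond>
  </bondArray>
</molecule>
"""
SMILES = "CC(=O)O"  # acetic acid, as quoted in the text

def read_atoms(root):
    """Map each atom id to its element symbol."""
    return {atom.get("id"): atom.find("string[@builtin='elementType']").text
            for atom in root.iter("atom")}

def read_bonds(root):
    """Return (atom_id, atom_id, bond_order) for every bond element."""
    result = []
    for bond in root.iter("bond"):
        refs = [s.text for s in bond.findall("string[@builtin='atomRef']")]
        order = int(bond.find("string[@builtin='order']").text)
        result.append((refs[0], refs[1], order))
    return result

root = ET.fromstring(CML.strip())
atoms = read_atoms(root)
bond_list = read_bonds(root)
print(atoms)
print(bond_list)
print(len(CML.strip()), "characters of markup versus", len(SMILES), "of SMILES")
```

The markup is far longer, but every value is explicitly labeled, which is what allows a generic XML parser to read it without any chemistry-specific code.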
files, and they are difficult for chemists to read (although they are not usually meant for chemist interpretation). This is evident in Table 9.3, which shows the CML for acetic acid. By comparison, the SMILES for acetic acid is simply "CC(=O)O".

2.3 Chemical Structure File Conversion

Many chemical information management systems, especially modeling programs, permit a chemist to import and export structures using a variety of file formats. Commercial programs designed specifically for file conversion are available (57). A widely used public domain program, Babel, is available in source code and in a Windows version (58). It is being extended by the "OpenBabel" programming project (59). It is possible, with a fair amount of accuracy, to convert a chemical structure from a connection table format to an acceptable
IUPAC name (60). The reverse conversion of names to 2D structures is also possible (61).

Figure 9.8. Structure representation with additional information, including atom (partial charge) and fragment (percent composition) data and Markush structure features.

2.4 Representing Nonstructural Chemical Data

Nonstructural chemical data includes any textual, numeric, or binary data that is not directly a part of the chemical structure. It includes the following:

• Whole-molecule data such as physicochemical properties, spectral data, literature citations, availability, biological or therapeutic activity, etc. In either the molecule file or in a database, data are typically maintained in fields that are separate from the structure field, but linked by some identifier.
• Atom, bond, or fragment-based data such as partial charge, component fraction, secondary structure, various fragment-based physicochemical and QSAR properties, etc. Because these data are linked to particular atoms, bonds, fragments, or components in the structure, they are typically stored along with some indication of the substructural features with which they are associated.

Figure 9.8 shows a structure that contains atom (partial charge) and fragment (component percent) data of various types, as well as Markush structure features (R1). Table 9.4 shows the corresponding structure-data file (MDL sdfile). Each Rgroup member appears in its own "submolfile" in this file representation. So-called Sgroup data appears as part of the molfile, along with the name of the data field, its value, location on the display, and the atoms and bonds that bear the data.

3 STORING AND SEARCHING CHEMICAL STRUCTURES AND REACTIONS

In the simplest sense, searching chemical information consists of (1) finding structures or reactions that meet the chemist's search criteria and/or (2) finding data that meets the search criteria. Data searching (numbers and text) is a well-established informatics activity, supported by spreadsheets, word processors, and relational database systems. Chemical structures and reactions are a unique form of data. Searching for full or partial matches to structures, models, and reactions requires highly specialized databases and search techniques.

3.1 Storing Chemical Information in Databases

When they were first developed, chemical structure databases consisted of record-oriented flat files, much like a spreadsheet whose columns have each been cut out and placed in a separate file. This organization has limitations in searching, access, and efficiency of storage. Also, it is not the most appropriate form for storing generic structures and reactions, which are more hierarchical in nature. As a result, since the 1990s, chemical information has increasingly been stored in commercial relational database systems, chiefly Oracle and Microsoft Access. Relational storage has the added advantage of combining chemical structure storage with biological data and inventory data (location, cost, units on hand, etc.) that are often stored in the corporation's relational databases. One of the first reports of the relational storage of structures was by Hagadone and Lajiness, who modified the Upjohn COUSIN system (62).

An example of a current commercial relational chemical database is seen in Fig. 9.9, which shows the organization of a basic ISIS chemical database. Each labeled item in the figure is a table in an Oracle relational database. The tables in the database consist of the following.
Table 9.4 Example Molfile Showing Markush Features, Atom, and Fragment Data
$MDL REV 1 29AUG0117:47

$MOL
$HDR
Figure 8 molfile
MACCS-II08290117472D
$END HDR
$CTAB
19 19 0 0 0 0
-5.3301 1.0237
-5.3323 -0.5232
-3.9956 -1.2952
-2.6560 -0.5224
-2.6615 1.0307
-3.9990 1.7956
-3.9881 3.2993
-3.9960 -2.8378
-2.6701 4.1134
-1.3283 1.8070
0.8559 0.9069
0.8538 -0.6400
2.1904 -1.4121
3.5299 -0.6393
3.5247 0.9138
2.1870 1.6787
2.1823 3.2214
3.5161 3.9966
2.1900 -2.9547
8 3 1 0 0 0
2 3 1 0 0 0
7 9 2 0 0 0
5 10 1 0 0 0
3 4 2 0 0 0
11 12 2 0 0 0
4 5 1 0 0 0
12 13 1 0 0 0
13 14 2 0 0 0
5 6 2 0 0 0
14 15 1 0 0 0
6 1 1 0 0 0
15 16 2 0 0 0
16 11 1 0 0 0
1 2 2 0 0 0
16 17 1 0 0 0
6 7 1 0 0 0
17 18 2 0 0 0
13 19 1 0 0 0
M RGP 2 8 1 19 1
M STY 6 1 GEN 2 GEN 3 DAT 4 DAT 5 DAT 6 DAT
M SLB 6 1 1 2 2 3 3 4 4 5 5 6 6
M SAL 1 9 11 12 13 14 15 16 17 18 19
M SDI 1 4 0.0515 -3.7303 0.0515 4.7961
M SDI 1 4 4.8009 4.7961 4.8009 -3.7303
M SAL 2 10 1 2 3 4 5 6 7 8 9 10
M SDI 2 4 -6.1377 -3.6181 -6.1377 4.9083
M SDI 2 4 -0.5469 4.9083 -0.5469 -3.6181
M SAL 3 1 7
M SDT 3 PCHARGE F MQ
M SDD 3 -5.4645 3.7303 DA ALL 1 5
M SED 3 0.12
M SDT 4 PCT F MQ
M SDD 4 -0.9021 -4.9083 DA ALL 1 5
M SED 4 60%
M SDT 5 PCT F MQ
M SDD 5 5.2122 -4.7961 DA ALL 1 5
M SED 5 40%
M SAL 6 1 17
M SDT 6 PCHARGE F MQ
M SDD 6 0.5002 3.9173 DA ALL 1 5
M SED 6 0.05
M SPL 2 4 2 5 1
M END
$END CTAB
$RGP
1
$CTAB
2 1 0 0 0 0 2 V2000
-3.7453 0.0472 0.0000 C 0 0 0 0 0 0
-2.5668 1.0385 0.0000 Cl 0 0 0 0 0 0
1 2 1 0 0 0
M APO 1 1 1
M END
$END CTAB
$CTAB
3 2 0 0 0 0 2 V2000
-5.0122 0.6013 0.0000 C 0 0 0 0 0 0
-3.5311 1.0229 0.0000 C 0 0 0 0 0 0
-2.6113 -0.2666 0.0000 Br 0 0 0 0 0 0
1 2 1 0 0 0
3 2 1 0 0 0
M APO 1 1 1
M END
$END CTAB
$END RGP
$END MOL
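
As a rough illustration of how the Sgroup data in Table 9.4 ties a named field and a value to particular atoms, the following minimal sketch collects the `M SDT` (field name), `M SED` (value), and `M SAL` (atom list) lines by Sgroup number. It assumes whitespace-separated tokens as printed in the table; the real molfile format uses fixed-width columns, so this is not a general molfile reader, and the function name `parse_sgroup_data` is hypothetical.

```python
from collections import defaultdict


def parse_sgroup_data(molfile_text):
    """Collect data-Sgroup field names, values, and atom lists by Sgroup number.

    Assumption: tokens are whitespace-separated, as printed in Table 9.4.
    """
    sgroups = defaultdict(lambda: {"field": None, "value": None, "atoms": []})
    for line in molfile_text.splitlines():
        tokens = line.split()
        if len(tokens) < 4 or tokens[0] != "M":
            continue  # not a property line we care about (e.g. "M END")
        tag, number = tokens[1], int(tokens[2])
        if tag == "SDT":        # M SDT <sgroup> <field name> ...
            sgroups[number]["field"] = tokens[3]
        elif tag == "SED":      # M SED <sgroup> <value>
            sgroups[number]["value"] = " ".join(tokens[3:])
        elif tag == "SAL":      # M SAL <sgroup> <count> <atom> <atom> ...
            count = int(tokens[3])
            sgroups[number]["atoms"].extend(int(t) for t in tokens[4:4 + count])
    # Keep only Sgroups that actually carry a data field.
    return {n: d for n, d in sgroups.items() if d["field"] is not None}


# A few lines copied from Table 9.4.
SAMPLE = """M SAL 3 1 7
M SDT 3 PCHARGE F MQ
M SED 3 0.12
M SDT 4 PCT F MQ
M SED 4 60%
M END"""

for number, data in parse_sgroup_data(SAMPLE).items():
    print(number, data["field"], data["value"], data["atoms"])
```

In this reading, Sgroup 3 attaches a partial charge of 0.12 to atom 7, while Sgroup 4 carries a percent-composition value for a fragment whose atoms are listed elsewhere in the file.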
• The master "data dictionary" table, which describes all the objects in the database, as well as some parameters that are specific to the database (exact match criteria, version of the database, etc.). This is sometimes referred to as "metadata" or "data about data."
• A handful of tables that contain database parameters. These include substructure search key definitions, the periodic table used with the database, and a list of salt moieties that can be considered during searches.
[Figure 9.9 diagram. Recoverable labels: Main data dictionary (metadata); Periodic table; Substructure key definitions; Flexmatch index definitions (tautomers, isomers, etc.); Structures; Formulas; Fastsearch index (substructure searching). Group labels: Structures and data; Database parameters; Search indices.]
Figure 9.9. ISIS, a relational chemical structure database.
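
The table organization of Fig. 9.9 can be pictured as a small relational schema. The sketch below is only an illustration using Python's built-in `sqlite3` module: the table names, column names, identifiers such as `CMPD-0001`, and the sample charge value are all hypothetical, a SMILES string stands in for the compact binary structure record, and nothing here reflects the actual ISIS or Oracle layout.

```python
import sqlite3

# Hypothetical schema: one structure table plus tables that reference it
# by internal identifier (and, for atom data, by atom number).
SCHEMA = """
CREATE TABLE structure (
    internal_id INTEGER PRIMARY KEY,
    external_id TEXT UNIQUE,
    ctab        BLOB NOT NULL      -- stand-in for the binary structure record
);
CREATE TABLE formula (
    internal_id INTEGER REFERENCES structure(internal_id),
    formula     TEXT NOT NULL
);
CREATE TABLE atom_data (
    internal_id INTEGER REFERENCES structure(internal_id),
    atom_number INTEGER NOT NULL,
    field       TEXT NOT NULL,
    value       TEXT
);
"""


def build_demo_db():
    """Create an in-memory database holding two illustrative structures."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [
        (1, "CMPD-0001", b"CC(=O)O", "C2H4O2"),  # acetic acid
        (2, "CMPD-0002", b"CCO", "C2H6O"),       # ethanol
    ]
    for internal_id, external_id, ctab, formula in rows:
        conn.execute("INSERT INTO structure VALUES (?, ?, ?)",
                     (internal_id, external_id, ctab))
        conn.execute("INSERT INTO formula VALUES (?, ?)", (internal_id, formula))
    # Atom-specific data lives in its own table (illustrative value only).
    conn.execute("INSERT INTO atom_data VALUES (?, ?, ?, ?)",
                 (1, 2, "PCHARGE", "0.12"))
    return conn


def find_by_formula(conn, formula):
    """Return external identifiers of structures with the given formula."""
    cursor = conn.execute(
        "SELECT s.external_id FROM structure s "
        "JOIN formula f ON f.internal_id = s.internal_id "
        "WHERE f.formula = ? ORDER BY s.internal_id",
        (formula,),
    )
    return [row[0] for row in cursor]


conn = build_demo_db()
print(find_by_formula(conn, "C2H6O"))
```

Because every table is keyed on the same internal identifier, structure data can be joined to formula, atom-level, biological, or inventory data with ordinary SQL, which is the advantage of relational storage noted above.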
• Structure and data storage is shown on the right. A structure table contains the structures, their internal identifiers, and their external identifiers, if any. The structures are stored in a compact binary representation that includes the connection table, the coordinates, the ring information, and any stereochemical, valence, isomer, isotope, or bond information. Certain types of structure-specific information such as polymer or component designations are stored here, whereas other types of structure-specific information (atom- or bond-specific data, and more verbose text data) are stored in their own tables, referenced by the internal identifier, and the atom or bond numbers to which the data correspond. A formula table contains the molecular formula and various atom and atom-type indexes to enhance formula searching and sorting.
• A table of substructure keys containing a binary or text string of the substructure fingerprint that was identified in the given structure at registration time. These keys represent the presence of either simple functional groups (e.g., phenyl ring, carbonyl), or more complex atom/bond combinations (e.g., carbonyl separated from a secondary amine by three bonds). In ISIS, a set of 166 searchable keys can be explicitly used as filters for structure searching. A larger set of 960 keys is used for similarity calculations. For 3D models, it is common to generate 3D pharmacophore keys, which encode all the possible 2- and 3-point pharmacophores represented in the structure, sometimes considering multiple conformations.
• A third kind of information includes indexes to enable structure and substructure searching. A "flexmatch" table contains a numerical hash (see Glossary) of certain features of the molecule, including stereochemistry, charge, and isotopes. This table can be used to retrieve a set of candidate structures quickly for exact match verification (63). It can also be used for "fuzzy" exact-match searches to retrieve tautomers and isomers of the input structure. Another index table contains a "fastsearch" index. This contains a single balanced tree (see Glossary) of all the substructural fragments found in structures in the database, up to a fixed pathlength. These are stored in a highly compressed binary format (Fig. 9.10). Similar approaches have appeared in the literature (64). Leaf nodes in the tree contain identifiers of specific structures in the database (simplified in Fig. 9.10). An exact match or substructure search consists of traversing the tree to find structures in the database that have substructural fragments in common with the query structure. Because the fastsearch index is large (often as large as the rest of the structure database), updating it for the addition or removal of structures is time consuming.

[Figure 9.10 diagram: a tree whose substructure nodes include C-C, C=C, C#C, C-O, C-N, ...]
Figure 9.10. Simplified ISIS Fastsearch index. Ethanol is a leaf node that can be reached from several substructure nodes.

This relational chemical database format is extended in ISIS to include 3D models, generic structures, and most recently, reactions. In these cases, additional "trees" in the database hierarchy connect 2D structures with 3D models, connect root structures with corresponding Rgroup members, or connect molecules with reactions.

Other relational structure/reaction database systems are available commercially. These include the Thor system from Daylight (65), Accord and RS3 Discovery from Accelrys (66), and Unity from Tripos (67). Personal database systems that can be implemented on a desktop computer include ISIS/Base (68), Accord for Access (66), and Team Works from Afferent (69).

3.2 Registering Chemical Information

Chemical structure registration is an important activity that is necessary for drug discovery. The structures that have been developed by a pharmaceutical company constitute the "crown jewels" of chemical information, and they must be properly and securely archived. The registration process usually involves extracting, cleaning, transforming, and loading the data, sometimes termed ECTL.

3.2.1 Extract the Data. First, the structures/reactions and corresponding data are extracted, collected, and validated. Increasingly, this is managed automatically, using output from the high-throughput chemistry process. Laboratory information management systems (LIMS) that are "structure smart" can manage chemical structure information starting from the design of a reaction, through the synthesis of the compounds, the chemical analysis of the structures, the in vitro biological assay, and finally the storage in the chemical database. Certain steps, such as drawing the initial structures/reactions, still remain an activity for the chemist, although many chemical information systems can take a generic structure, enumerate the many specific combinations, and lay out the structures automatically (for example, the Monomer Toolkit by Daylight, the Central Library program by MDL, and CombiLibMaker by Tripos).

3.2.2 Cleaning and Transforming the Data. Next, the structures/reactions are passed through a filtering program that searches for structure anomalies and corrects the chemical representation. In this step, chemical "business rules" are applied to the structures to ensure that representations that can be drawn in different ways, such as nitro groups and tautomers, are represented by a single convention. Specialized chemical manipulation languages such as Genie Control Language by Daylight, Cheshire by MDL, and Sybyl Programming Language by Tripos are used to implement this step. These languages are versatile and easily programmed, and they can be applied to other steps in the drug discovery process, such as searching, property calculation, and structure manipulation in general.

3.2.3 Loading the Data. Finally, the structures/reactions are handed to a chemical registration system. The chemical registration system will typically "perceive" the structures: identify atoms, bonds, rings, stereochemistry, valence states, isotope values, and other chemical information as needed. In the case of reactions, it notes which structures are reactants, which are products, and which are agents or catalysts. Because there can be many valid ways of drawing a structure depending on which atom you start with, a structure may be given a canonical renumbering of the atoms using a variant of the Morgan algorithm (70). In the case of a linear representation like SMILES, this canonicalization yields a unique string for the structure, which can be generated from any valid SMILES string derived from the structure. In the case of a structure stored in a connection table, the Morgan algorithm results in the atoms being reordered in the connection table to generate a tree, branching outwards from the most highly connected atom in the structure. Because of the efficiency of indexing in modern relational chemical databases, Morgan renumbering is not used as much today as in the past.

The registration system then computes indexes. These include structure-searching indexes, substructure or similarity keys, molecular formula, molecular weight, and other structure-based properties. Substructure keys or "fingerprints" are particularly important. They consist of a number of binary descriptors for the presence of certain functional groups or more generalized atom/bond combinations. These keys can be used to filter structures before searching. They are also used for similarity calculations. Originally, substructure search keys were always used to filter structures before performing a substructure search of the database. If a query structure contains, say, a carbonyl group, then only carbonyl-containing structures should be examined during the substructure search. A key representing the carbonyl group can be used to filter structures that contain the group (the key turned on, or set to 1) from those lacking it (key set to 0). Tree-based substructure searching does not require prior filtering, so today, substructure keys are primarily used for similarity calculations between molecules. If the key values of two structures are compared, the more keys they have in common, the higher their similarity value will be. When registering reactions, the reactants and products may undergo automated or semi-automated perception of reacting bonds and atom centers (71). Generic structures may be analyzed and "clipped" or reverse-transformed to generate root and member structures, which may be stored separately (72).

Before finally storing the structure in the database, the registration program may search the database for some level of match to the input structure or reaction, and skip the registration if it is a duplicate. This is sometimes termed "deduplication" through "exact match" searching. There is usually some redundancy in chemical databases, and to save search time and disk space, most companies do not store duplicate structures or reactions, but rather store pointers to them.

The final step, after registering the structure or reaction, is to assign it a unique registry identifier, which is typically used throughout the company to identify the given structure/reaction and any chemical, biological, or inventory data that is associated with it. Some identifiers, like the Chemical Abstracts Service CAS number and the Beilstein BRN,
have wide application, and these may be used in addition to, or instead of, a corporate-assigned external registry identifier.

3.3 Searching Chemical Structures and Reactions

The type of chemical structure and reaction searching that a chemist does usually depends on the current stage of a project. For example, if the chemist is starting a new therapeutic project, a therapeutic activity search might be conducted, using a database such as the Derwent World Drug Index, the MDL Drug Data Report, or the MDL Comprehensive Medicinal Chemistry database. Retrieving many search hits, the chemist might organize them by sorting on name, molecular weight, ring system, or some topological basis. If the resulting list is too large, the chemist might perform a cluster analysis of the structures to see what general classes of compounds have been synthesized in the past. After sampling from the various clusters, and identifying a handful of interesting structures, the chemist might perform a substructure search to find structures that contain the features that are felt to be important to activity (i.e., the pharmacophore). If that search returns too many hits, the search query can be refined by making it more specific. If the search returns too few hits, the search query can be relaxed, or a similarity search can be used to find structures in the topological neighborhood of the query structure. Eventually, a number of structures will be obtained as candidates for synthesis and/or testing.

The next step is to design a set of reactions to synthesize the compounds. One or more reaction databases can be searched to find whether any reactions give the desired structures as products or give structures that are similar to the desired ones. The chemist may also use reaction similarity searching (73) and searching across reaction schemes (e.g., if A + B → C + D and C + E → F + G, a reaction scheme search will find the query A → F) (74). Once a reaction is found, the chemist needs to decide what reagents to use in the synthesis and where to obtain them. The selection of reagents will usually be based on a combination of physicochemical property considerations (i.e., QSAR and diversity), tempered by the chemist's experience and preferences, and balanced by synthetic feasibility and economics. The reagents may be located in-house, or they may require ordering from a chemical supplier.

A completely separate approach to reaction discovery is the reaction planning approach implemented in such programs as Logic and Heuristics Applied to Synthetic Analysis (LHASA) (75). This program works by searching a chemical knowledge base that contains information on approximately 2300 retro-reactions or transforms. The chemist draws a target molecule and indicates a strategy for the reverse-synthetic analysis. The program then searches the transform knowledge base for transforms that satisfy the strategy the chemist selected. The program decides which transforms are suitable for the particular target structure and displays the resulting precursors to the chemist. The chemist can then select a precursor for further analysis and choose another strategy option, on which the program returns a second level of precursors in the same way. Processing continues in this manner until the chemist is satisfied that one or more of the precursors correspond to a reasonable starting point for a synthesis. Retrosynthetic methods have not become as widely used in industry as reaction searching, partly because the certainty of the reactions is not guaranteed. Also, searching existing reaction databases generally yields the desired reaction or something close to it. Indeed, a major problem with search results from reaction databases is often an overabundance of hits, which typically need further organization and filtering to be useful. One approach to organizing the results of reaction searching is to apply some clustering or classification to the reactions (76).

To support the workflow just described, a number of structure and reaction search types have come into use (Fig. 9.11). These are briefly described as follows.

[Figure 9.11 content: Searching chemical information]
Type of chemical data: Strings, Structures, Reactions, 3D Models, Chemical libraries
Type of search:
  Exact match: X X X X X
  Substructure: X X X X X
  Pharmacophore: X X
  Similarity: X X X X X
Figure 9.11. Search types depend on the nature of the chemical information.

3.3.1 Exact Match Searching. Here, the chemist has a particular structure (or reaction) that he wishes to find in the database. The structure/reaction is drawn using a drawing program and then passed to a search program. The program submits the query to the
search routine that typically generates index values from the query that are of the same type as those generated for structures/reactions when stored in the database. The index values are then used as filters to retrieve a set of candidate structures/reactions. In ISIS, these filters include the formula, the molecular weight, and the flexmatch index, a numeric hash code based on the presence of isomers, tautomers, isotopes, salts, charges, and stereochemistry (see Glossary). The resulting filtered structures have the minimum set of requirements to fit the search query, but typically only a fraction of these structures will fit the query exactly. Once this set of candidate structures is obtained, the query structure is mapped to the candidate structure using a process known as atom-atom mapping, which is known in topology as the "graph isomorphism" problem. This mapping is time-consuming, so the prior filtering step should be as efficient as possible. Each structure that maps exactly to the query is placed in the result set or "hit list." To accommodate various chemists' needs, exact match searching can usually be "relaxed" to permit the finding of isomers, tautomers, salts, charged or uncharged species, etc. In the case of reactions, variations of the reaction can be retrieved by relaxing the constraints on the reaction conditions, solvent, and catalyst (Fig. 9.12).

[Figure 9.12 panels: Query, Tautomer, Salt, Isomers]
Figure 9.12. Different degrees of exactness can be defined by allowing tautomers, salts, and isomers successively in the search.

[Figure 9.13 panels: Single atom, Atom lists, "Any" atom, Link node, Stereo bonds, Markush]
Figure 9.13. Example 2D substructure search queries with various atom and bond query features. The more features that are present, the more flexible the search becomes, but the search may also require more time to complete. There is a trade-off between putting the flexibility into the database (i.e., storing and indexing multiple forms of a structure) and putting the flexibility into the search query and the search software.

In a Daylight Thor database, where the
structure is stored as unique SMILES, the canonical query SMILES can be compared lexically with strings in the database using fast string comparison and indexing techniques, to find exact match structures and reactions. Because a structure in a Thor database consists of a meaningful, canonical sequence of characters, the computational efficiencies of string searching and comparison can be applied when searching the database. This is in contrast to the highly specialized search techniques used in other structure database formats.

3.3.2 Substructure Searching. A substructure search is performed when a chemist has in mind a pharmacophore consisting of a set of functional groups or a substructure which he knows must be present in the structures to be retrieved. Only part of the molecule is drawn, along with query features that generalize atoms, bonds, and rings in the structure. Figure 9.13 shows some typical substructure query features. The features include the following:

• Single atom: specifies a periodic table atom that must be present or a more generalized atom (hetero, metal, etc.) or "superatom" (condensed functional group, such as Ph, Et, Ala, etc.)
• Atom list: a list of atoms, any one of which may be present
• "Any" atom: which simply means some atom must be attached at the given position. As with structures, the hydrogen atoms in substructures are implicit, unless the user requests a particular hydrogen count or range at a given position
• Link node: which specifies a range of allowed atom or functional group links between atoms
• Stereo bond: including Z/E/either or up/down/either
• Markush feature: used for patent representation, for representation of generic structures for combinatorial chemistry, or to limit the substituents that can be present at a given position. Note that some systems allow logical operations on Markush features (if -OH at R1, then no -Cl at R2).

Figure 9.14. (a) Example 3D pharmacophore search query, showing substructure, distance, angle, and exclusion sphere constraints. (b) Example result of a conformationally flexible 3D search using this query. The molecule was "flexed" in 3D by rotating about the highlighted single bonds to fit the query. The atoms, bonds, and 3D features that match the query are colored. One problem with conformationally flexible 3D searching is that unwanted hits can be conformed to fit the query.

A specialized case of substructure searching is 3D pharmacophore searching, in which a substructure search is combined with the measurement/generation of 3D features, to identify models that could fit a 3D pharmacophore. Figure 9.14 shows an example of a 3D substructure search query that includes various 3D features or constraints. A given conformation of a molecule that is stored in the database may not exactly match a given query, but it could be modified by rotation about single bonds to fit the query. For this reason, conformationally flexible 3D searching is a feature of most 3D database systems (77). When searching conformers, the conformational flexibility can be incorporated into (1) the query, by tethering flexible groups to fixed anchor points in the structure, (2) the database, by storing multiple low energy conformations for each structure, or (3) the search process, by
incorporating a rapid conformational analysis into the 3D search algorithm. The last is the most common approach and is a part of database systems from Tripos, Accelrys, and MDL.

Many different approaches to substructure searching have been devised (78). In ISIS, the fastsearch index file is used to retrieve candidate structures. If needed, the query is then mapped onto these structures using a "backtracking" approach. This involves successively matching atoms and bonds in the structure to those in the query in a stepwise manner. When a match fails at any given step, the program backtracks to the last successful step and selects an alternative atom or bond. Once all the atoms and bonds have been matched, the structure is considered a hit. An issue of the Journal of Chemical Information and Computer Sciences has been devoted to substructure search methods (79). Hicks and Jochum reported a comparison of several substructure search algorithms in 1990 (80). These authors found the Beilstein-Softron S4 search system to be superior in search speed at that time.

3.3.3 Similarity Searching. The most generalized type of structure/reaction searching is searching for "similar" structures or reactions in the database. Chemical similarity has been a highly debated topic for some time, mostly from the standpoint of what constitutes good descriptors to use in the similarity calculations (81). Nevertheless, there are some general approaches that are widely used, not because of their theoretical soundness, but simply because they work for the chemist. For 2D structures, the most useful and efficient similarity approach is key-based similarity. This involves computing the overlap between a query structure and a candidate structure using substructure or fragment keys. ISIS uses the 960 keys that are generated when the structure is registered. The overlap is typically computed using the Tanimoto metric, which was first used in 2D structure similarity by Willett et al. (82). Depending on the nature and number of the keys, it may be desirable to weight the Tanimoto calculation inversely according to the prevalence of the key in the database. Thus, a cyclopropyl key, which may
not be highly prevalent in the database, and tached, retention or inversion of stereochem-
would be "swamped" by other, less relevant istry, etc. Bond changes include making and
keys in an unweighted similarity calculation, breaking of bonds, and changes in bond order
may have more influence in a weighted calcu- and stereochemistry. When searching reac-
lation. This weighted calculation is used as the tions, the chemist can search for exact, isomer,
default in ISIS chemical databases. It is possi- or substructure matches in the reactants, in
ble for an ISIS database administrator to re- the products, or both. The structure searching
generate the keys using custom values of the can be accompanied by a search of the reaction
weights to enhance differences in the similar- text information for -yield and conditions. Sev-
ity calculations and select, say, more "drug- eral commercial reaction indexing systems are
like" molecules in the search. In the reaction available from molecule database vendors,
domain, similarity can be defined in terms of and online searching is even possible (87).
the structures, the reactions, or a combination In most reactions, the majority of the at-
of the two (83). Other similarity search sys- oms and bonds are not involved in the reac-
tems have been described in the literature, in- tion, and they remain unchanged between re-
cluding the one used by CAS (84). It is also actants and products. To avoid examining
possible to use 3D pharmacophore keys to these unchanging atoms and bonds, most re-
compute similarity, although these have typi- action indexing systems allow the user to
cally not performed as well as 2D keys. It is mark, in the reactants and products, those at-
possible that conformational flexibility so oms and bonds that are involved in the reac-
vastly expands the "chemical space" of the tion. These are termed reacting center atoms
molecules that a limited number of keys is and bonds, and when they are present, they
simply inadequate for 3D similarity calcula- enable much faster reaction searching and
tion. When attempting to predict the type of they reduce the number of false hits obtained.
therapeutic activity a compound has, Briem A simple example is seen in Fig. 9.15. Some
and Lessel concluded that 2D and 3D keys systems have semiautomatic perception of re-
have complementary information (85). acting centers, which must usually be aug-
mented or checked by a chemist, especially
3.3.4 Reaction Searching. Reaction search- with complex transformations.
ing, sometimes called reaction indexing, has As with molecules, it is also possible to do
been available for over 20 years. Originally de- reaction similarity searching. Given a reaction
veloped as online searching systems, the intro- with reactants, products, and agents, one can
duction of in-house systems like REACCS al- typically run molecule similarity searches for
lowed pharmaceutical companies to augment the reactants, the products, or both. This will
published reaction sources with their own re- retrieve reactions that have similar structures
actions and data (86).As with molecules, reac- involved in them. This does not guarantee
tion storage has moved from proprietary data- that the molecules undergo the same or even
base foundations to storage and access in similar transformations. It is possible in some
relational systems. Reaction searching encom- systems to also include the similarity of the
passes many of the same types of searches transformation as part of the overall similar-
used for molecules. A reaction typically con- ity search. This is usually carried out using
sists of three types of structures: reactants, special keys that have been generated for a
products, and catalysts or agents, along with fixed number of possible transformations. As
textual information about yield, conditions, with molecules, the more keys a query and a
etc. Reactant and product structures undergo reaction have in common, the higher will be
structural changes in the reaction, whereas the similarity.
agents do not. The atom and bond changes
that occur in a reaction are isolated in one or 3.3.5 Searching Other Data. Data other
more reacting centers of the reactants and than structures and reactions must also be
products. The atom changes consist of searched in the drug discovery process. Vari-
changes in atom valence, charge, number of ous systems exist for indexing and searching
attached hydrogens, number of bonds at- literature and journal contents (881, patents
Figure 9.15. Reaction substructure search query and some example hits. If no reacting center or mapping information is used, all three hits are found. If reacting bond information is used, hit c is excluded. If both reacting atom and reacting bond information is included, then false hits b and c are excluded.
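The key-based reaction similarity described in the text preceding Figure 9.15 (the more transformation keys a query and a reaction share, the higher the similarity) can be sketched as follows. This is a minimal illustration only: the Tanimoto coefficient is assumed here as the scoring function, and the key names and reaction labels are hypothetical, not taken from any particular system.

```python
def tanimoto(keys_a, keys_b):
    """Tanimoto coefficient between two collections of keys (assumed metric)."""
    a, b = set(keys_a), set(keys_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical transformation keys for a query and two stored reactions.
query = {"C-O_bond_formed", "C=O_reduced", "ester"}
stored = {
    "reaction_1": {"C-O_bond_formed", "ester", "acid_chloride"},
    "reaction_2": {"C-C_bond_formed", "aryl_halide"},
}

# Rank stored reactions by decreasing similarity to the query.
ranked = sorted(stored, key=lambda name: tanimoto(query, stored[name]), reverse=True)

for name in ranked:
    print(name, round(tanimoto(query, stored[name]), 2))
```

In practice such a score would be combined with the structural similarity of reactants and products; the sketch covers only the transformation keys.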
(89), material safety data sheets (90), and chemical suppliers (91). Some useful tools include the Accord ChemExplorer program, which allows searching word processor documents and files for particular chemical structures, and the CambridgeSoft ChemFinder for Word (92).

3.4 Chemical Information Management Systems and Databases

A number of software and database vendors provide programs and database systems to implement representation, registration, and searching of chemical information in a corporate environment. Some of these vendors have smaller personal chemical database systems that support registration and searching on a personal computer. A handful of academic and public domain systems are also available. Finally, an increasing number of chemical information systems are being made available on the Internet. Some representative systems that are being sold or have been discussed recently in the literature are discussed below.

3.5 Commercial Database Systems for Drug-Sized Molecules

Accelrys. A subsidiary of Pharmacopeia, Inc., Accelrys was originally a provider of molecular modeling software. They recently acquired several companies that provide offerings in the chemical information and bioinformatics areas. The company provides unique databases including several for reactions.

• BioCatalysis - biomolecules as catalysts
• BioSter - pairs of biologically similar structures for bioisosterism applications
• Biotransformations - developed in conjunction with the Royal Society of Chemistry
• Failed Reactions - those that did not proceed as expected
• Metabolism - developed in conjunction with the Royal Society of Chemistry
• Methods in Organic Synthesis - 33,000 reactions, Protecting Groups - functional group protection with regio/stereoselectivity
• Solid Phase Synthesis - with emphasis on small-molecule and combinatorial chemistry

The chemical information programs provided by Accelrys include several database systems.

• Accord for Excel and Access - relational chemical storage for Microsoft programs
• Accord for Oracle - a chemical data cartridge (see Glossary)
• Accord Database Explorer - to access Accelrys reaction databases
• RS3 Discovery System - with programs for chemical structure, data management, high-throughput screening, and inventory

Accelrys also provides programs for descriptor calculation, QSAR, and data mining (93).

The Beilstein Database. The Beilstein Database, with over 8 million structures, is the oldest in existence, based on the Beilstein Handbook of Organic Chemistry, and contains data that extend back to 1771. The database is produced by the independent Beilstein Institute (94). Access to the database is either through Beilstein Online, available through STN and Dialog, or through the Web using Crossfire Beilstein, which is marketed by MDL GmbH, formerly Beilstein Inc. (95). Data that are stored include the structure, Beilstein and CAS Registry Numbers, names, formula, preparations, reactions, natural product isolations, and chemical derivatives. Physical properties, if available, are also stored, including optical data, mechanical properties, multicomponent system data, spectral and thermodynamic properties, as well as biological function, ecological data, toxicity, and common uses. Citation data, including author, journal, CODENs, and patent information, are also stored. The data are organized into substance, reaction, and citation contexts, and a user can easily switch from one context to the other. An ACS symposium volume devoted to the Beilstein database has been published (96).

Chemical Abstracts Service. As a division of the American Chemical Society, CAS develops and manages the world's largest databases of chemical structures and reactions.

• CAS Registry - 35 million structures - 19.5 million distinct structures - 13 million biosequences
• CASREACT - 4 million reactions
• CHEMCATS - 2.5 million commercially available chemicals
• MARPAT - 500,000 searchable Markush structures

The CAS databases are maintained online, with searching allowed on a subscription basis. SciFinder is a client/server application to search CAS databases by author, keyword, exact, and substructure. It includes a "keep me posted" update feature, reaction information back to 1974, nucleotide and protein sequence searching, browsing of 1600 journals, and integration of structure, data, and citation information. STN International is a collection of 200 databases covering chemistry, life sciences, engineering, patents, etc. STN Express provides wizard-assisted searching, and STN on the Web serves as a web client for STN. The ChemPort program provides web access to journals (97).

Daylight Chemical Information Systems, Inc. This company provides numerous third-party databases in the Thor format. These include the following:

• Databases of organic structures: Available Chemicals Directory - 250,000 structures, Asinex catalog - 115,000 structures, Maybridge catalog - 62,000 structures, InfoChem SPRESI'95 - 2.5 million structures
• Drug and biological databases: BioScreen NP and SC - about 52,000 structures including natural products, Pomona College Medchem - 36,000 structures with measured LogP, National Cancer Institute -
120,000 structures with cancer/HIV screening data, Derwent World Drug Index WDI - 60,000 drugs.
• Toxicity: Aquire - 5300 EPA structures with aquatic toxicity, TSCA - 100,000 EPA substances
• Reactions: InfoChem ChemReact/ChemSynth - 390,000 reactions with 470,000 structures, InfoChem SpresiReact - 2.5 million reactions and 1.8 million structures

Software and applications from Daylight include the following (98):

• Numerous toolkits: SMILES, Depict, SMARTS, Fingerprint, Monomer, Thor, Merlin, X-Widgets, Program objects, Remote Access, and Reaction Toolkits (see Glossary)
• Daylight chemistry cartridge for Oracle: Daycart (see Glossary)
• Thor database manager - to build and manage thesaurus-oriented databases
• Merlin searching of structures and data
• Clustering package, with Jarvis-Patrick type cluster analysis
• Rubicon, a program for building 3D models using a distance geometry approach
• PCModels for LogP and other physical property calculations
• CombiChem Package to manage high-throughput synthesis
• Reaction Package
• DayCGI - a web development toolkit
• A set of Java tools for chemical information management

Derwent Information. A division of Thomson Scientific, Inc., Derwent is the leading supplier of value-added patent information. The Derwent databases, which are maintained online, include the following:

• Derwent World Patents Index - references to patents, including chemical structure and use patents
• Patents Citations Index - bibliographic and citation data, the Innovations index combined entries from WPI and PCI
• Derwent Selection database - customized subsets of the WPI

The databases are available through several hosting services, including STN, Dialog, and Questel Orbit. User guides for the PCI chemical indexing are available online at Derwent (99). Chemical patents can also be searched using the Merged Markush Service, Micropatent, and for Japanese patents, the Japanese Patent and Trademark Documents (ISTA) among others (100).

The Gmelin Database. The most comprehensive database of structures, properties, and citations in inorganic and organometallic chemistry is the Gmelin database, based on the Gmelin Handbook of Inorganic and Organometallic Chemistry dating back to 1772. This database includes 1.4 million compounds including coordination compounds, alloys, solid solutions, glasses and ceramics, polymers, and minerals. As such, it is less valuable to drug discovery. The current Gmelin database is owned by the Gesellschaft Deutscher Chemiker and is licensed to MDL GmbH.

MDL Information Systems, Inc. Owned by Elsevier Science Publishing, MDL is a long-time provider of in-house databases and software. Databases include the following:

• Available Chemicals Directory ACD - 300,000 structures - reagents and general chemicals, with supplier information
• Bioactivity databases - AIDS database - 43,000 structures and data from the National Cancer Institute, Comprehensive Medicinal Chemistry (CMC) - 7500 common drug structures, MDL Drug Data Report (MDDR) - 120,000 patented drug structures
• Reactions - ChemInform - 850,000 reactions and 1.2 million structures, Theilheimer/Chiras/Metalysis - 171,000 reactions and 223,000 structures
• Metabolism - Metabolite - 53,000 transformations - 34,000 structures
• Toxicity - EPA RTECS-based - 150,000 structures
• Material safety - OHS Material Safety Data Sheets
Software from MDL includes the ISIS scientific information system (ISIS/Draw, ISIS/Base, and ISIS/Direct), Cheshire for chemical structure manipulation, and Chime and Chemscape for Web access. Combinatorial and high-throughput chemistry programs include Afferent, Central Library, Project Library, Reagent Selector, and Elan. Biological data management programs include Apex and Assay Explorer; literature access through LitLink; reaction access through Reaction Browser/Web; and finally, molecular modeling through Sculpt (101).

Tripos, Inc. Originally the major provider of molecular modeling software, Tripos now offers chemical information content in the form of databases and the tools to manage them. These include the following:

• Several Chapman and Hall databases including ones for organic structures (180,000 structures), inorganic and organometallic structures (40,000 structures), natural products (105,000 structures), and pharmacological agents (22,000 structures)
• The National Cancer Institute structures in a Tripos-compatible format
• The Derwent World Drug Index (60,000 structures)

Chemical information software offered by Tripos now also extends beyond just molecular modeling. Their programs include the following:

• The Unity 3D database system, which features rapid flexible 3D pharmacophore searching
• Concord and Stereoplex - for generating 3D models of database structures including multiple stereochemical isomers
• ChemEnlighten for chemical data mining
• The AUSPYX structure data cartridge for Oracle
• A suite of programs for combinatorial chemistry - Legion to build and store virtual libraries, CombiLibMaker to enumerate structures, Selector to define diversity measures to select diverse subsets of structures, and DiverseSolutions to apply chemical diversity techniques to chemical populations to characterize and populate chemical space (102)

3.6 Sequence and 3D Structure Databases

Sequence databases of biological macromolecules are useful when defining new therapeutic targets. Databases for DNA, RNA, and proteins are available from such sources as the National Center for Biotechnology Information (NCBI) (103) and the European Bioinformatics Institute (104). Numerous online programs and tools are available to researchers to search and align sequences, generate phylogenetic analyses (chemical evolutionary trees), map genes, and predict secondary structure (105). The Protein Data Bank stores the largest collection of crystallographic, NMR, and molecular-modeling derived protein and nucleic acid 3D models (106). The Cambridge Crystallographic Data Center is the primary source for crystal structure data on small molecules, with more than 250,000 entries. The Cambridge Database can be searched using the programs ConQuest for searching, Mercury for structure visualization, and Vista for numerical display and statistical analysis (107).

3.7 In-House Proprietary and Academic Database Systems

Larger chemical and pharmaceutical firms have, over the years, developed in-house systems with capabilities that are specific to the chemist's needs. Today, the costs of developing from scratch and maintaining an in-house system are prohibitive, especially because commercial chemical information systems are highly efficient and customizable. Personal chemical information software is still being developed and reported in the literature. Examples include a relational database patterned after the Upjohn Cousin system (108), and CheD, which is a SQL-based system with a Web client (109).

Commercial personal database systems are available from several vendors, as described above. These products extend the productivity of an individual chemist or a small workgroup, but are not designed for corporate or enterprise applications. Other personal chemical
database programs that are available include ChemFinder from CambridgeSoft, ChemFolder from ACDLabs, ChemWindow from Softshell, and Aura-Mol from Cybula (110).

4 CHEMICAL PROPERTY ESTIMATION SYSTEMS

The design and screening of drug candidates is increasingly being conducted in silico. This is made possible by improvements in programs for property calculation and estimation. Here, the term property calculation refers to the generation of some topological (depending only on the 2D structure), topographical (depending on the 3D conformation), or physicochemical property of a molecule directly from the structure. The term property estimation refers to the generation of some property as a function of other properties, either through a regression equation, a formula, neural network calculation, or some other indirect means.

The distinction between calculation and estimation is important because some properties, like molecular weight, polar surface area, molecular connectivity values, counts of chemical functional groups, partial charges, and other quantum mechanical descriptors, can be calculated precisely and de novo from the structure alone. Most of these properties have some fixed definition or algorithm that enables their calculation to be performed unambiguously, with little or no error. What error is present is usually systematic or deterministic. A second class of properties, including LogP and other additive-constitutive properties, may be calculated by fragment additivity with various correction terms. These properties differ from de novo properties because they are approximations to the true (sometimes measured) values. Often, there are multiple approaches to their calculation. The errors in the calculation of these properties are statistical or stochastic. A third class of properties includes those that can only be estimated from other properties, using a regression analysis, neural network, or other linear or nonlinear function of variables. The errors in these properties can be complex and difficult to determine. For all these reasons, it is important to carefully consider the use of any given property for drug discovery purposes. Too often, properties are calculated simply because they are available, then used in a QSAR analysis, and possibly applied to future predictions, all without proper consideration of their precision, accuracy, and relevance to the chemical problem.

Given this caveat, it must be noted that there are a multitude of programs available for the calculation of properties of structures. Some programs compute only a single property, like LogP. Others calculate a series of values in a given genre of property, like molecular connectivity (111) or BCUT descriptors (112). Still others compute a vast range of properties that include topological, topographical, and physicochemical descriptors alike. It is beyond the scope of this chapter to detail all the programs and vendors that provide property calculation and estimation software. Many of the calculations are provided as part of molecular modeling and QSAR program systems. Some programs and vendors whose products are solely for property calculation are described below.

4.1 Topological Descriptors

Descriptors based on the 2D structure or simply on the connectivity matrix of a structure have long been used for chemical similarity and for property correlations. Because they often lack any relationship to mechanism, these descriptors are best used within a congeneric series or at least a set of similar structures. They may be empirically useful for cluster analysis and chemical library design, because they are effective at representing structure differences and similarities. A few programs and providers of topological descriptors include the following:

• Barnard Chemical Information - provides chemical Fingerprint Generation Pack - to compute fragment-based fingerprints for cluster and diversity analysis (113)
• DRAGON - implementation of about 1400 descriptors of Todeschini and Consonni (114) including constitutional, topological, autocorrelation, geometrical and functional
groups, and including simple molar refractivity, polar surface area, and Moriguchi LogP
• Molconnz - EduSoft LC - provides MOLCONNZ molecular connectivity and electrotopological state descriptors of Kier and Hall (115)

4.2 Physicochemical Descriptors

As a complement to topological descriptors, physicochemical descriptors often have a strong relationship to mechanism, and are widely used in lead optimization and QSAR. The classic triad of steric, electronic, and lipophilic descriptors is considered the foundation of QSAR, and adequate coverage of the space of these factors is still a major goal in drug discovery. The most common physicochemical descriptor is LogP, the 1-octanol/water partition coefficient. Because it is so important, a number of programs and vendors provide LogP calculations based on a variety of methods. Many of these programs also compute other physicochemical properties, such as pKa and solubility.

• BioByte, Inc. - developers of CLOGP, premier LogP, and molar refractivity calculator (116)
• Syracuse Research Corporation - provides KOWWIN and 11 other structure-based property calculations (117)
• CompuDrug Ltd. - the PALLAS System - including programs for pKa, logP, logD predictions, metabolism and toxicity, and high pressure liquid chromatography (HPLC) development (118)
• ACDLabs - physicochemical laboratory program calculates pKa, LogP, LogD, aqueous solubility, boiling point and vapor pressure, Hammett electronic constants, and a variety of liquid properties (119)
• XLOGP - The Peking University LogP calculator - a similar version for proteins is available as PLOGP (120)
• EduSoft LC - provider of Hint!-LogP - to accompany the HINT! Hydropathic interaction modeling program (121)
• SciVision - provider of software for chemical property calculation, and to estimate QSAR, toxicology, oncology, and other biological properties (122)
• Sirius-Analytical - provider of instruments for LogP and pKa determination, and the Absolv program to predict physicochemical properties (123)

Most of the commercial molecular modeling systems also provide some property calculations, which range from simply calculating the polar surface area of a structure to a full range of topological and physicochemical descriptors. These may be based on fragment additivity, like most of the programs mentioned above, or they may involve correlations with quantum mechanical or even molecular dynamics-based calculations.

4.3 Absorption, Distribution, Metabolism, and Excretion Properties

Perhaps the most critical aspect of drug development, the behavior of the drug in vivo, is also one of the least predictable. Each year, many drug candidates reach the very expensive stage of clinical trials, only to be discontinued because of problems with absorption, distribution, metabolism, or excretion (ADME). Toxicity is often added to this acronym (ADMET), because we increasingly find critical differences in the way children respond versus adults, males versus females, etc. Among the hopes that accompany the deciphering of the human genome is that drug selection can someday be tailored to an individual's genotype, to lessen the possibility of untoward drug response. For the present, drug designers are focusing increased attention on the prediction of ADME properties, pharmacokinetics, and in vivo behavior. Compared with topological and physicochemical predictions, ADME calculations are still rather crude and approximate. They are usually based on correlations with other properties. If the method for obtaining the correlation is a neural network, the predictions may be superior to simpler regression-based approaches, but the interpretability of the model is missing. Some of the programs that are used to predict ADME descriptors include the following:
• LION Bioscience - provider of iDEA, a modular ADME predictive system. The absorption module predicts Caco-2 cell permeation, and performs dose-response modeling of the oral absorption. The metabolism module predicts first-pass effects and models metabolic parameters. Future modules are planned for distribution and elimination (124).
• PASS (prediction of biological activity spectra) - compares a test structure with those in a database of about 45,000 structures with known activity/toxicity, using topological descriptors and probability calculations (125).

4.4 Property Calculations Online

Many of the providers of software and databases of chemical properties also provide online calculation services. These include Daylight Chemical Information (126), ACDLabs (127), and Syracuse Research (128). In addition, the following sites provide online calculations of a variety of properties:

• Molinspiration - calculates LogP, polar surface area, Lipinski Rule-of-5, and a drug-likeness index (129)
• Alogp - VCCLab online LogP calculation (130)
• PETRA - The University of Erlangen property calculation routines (131)
• USEPA Suite - including implementations of the Syracuse Research software (132)

5 DATA WAREHOUSES AND DATA MARTS

Even relational databases have their limitations when dealing with huge amounts of data and high user traffic. The burden of continual data updating and registration, activities known as OnLine Transaction Processing (OLTP), can considerably slow down searching and report generation, activities known as OnLine Analytical Processing (OLAP). For this reason, it is becoming common in the database field to build special large databases designed primarily for searching purposes, so-called data warehouses (133). These warehouses have pre-computed indexes and tables that facilitate repeated searching. A special database architecture, known as the star schema, facilitates OLAP activity. In this design, one or more large fact tables contain records of frequently searched data for each object (e.g., structure or reaction) in the database. The fact table is joined to smaller dimension tables that contain the relational information. The schema is known as a star schema because the architecture resembles a many-pointed star, with the fact table at the center, and dimension tables at the ends of the arms. The design of the fact and dimension tables in the warehouse should reflect the searching habits of the users to get the best performance. Probably the first mention of data warehousing in the pharmaceutical area was that of Axel and Song in 1997 (134).

5.1 Data Warehouses of Chemical Information

A data warehouse is designed to consolidate structures and data from many diverse sources, including relational databases, flat databases, and structure and data files. It is considered to be multidimensional. A true chemical data warehouse might contain sequences, 2D structures, 3D models, Markush structures, and reactions, all in the same database. No such commercial database currently exists, but databases presently being developed at MDL and other vendors are examples of chemical data warehouses of structures and their reactions. The MDL data warehouse framework is termed the concordance. The fact table for the concordance is the source table, which brings together structure and reaction identifiers from all the various data sources and links them to the unique structures in the warehouse (Fig. 9.16). Using the concordance, a substructure search can retrieve a set of unique, unduplicated structures, along with pointers to all the relevant identifiers and reactions in the various data sources. Similar pointers exist to the original citations and stored data.

Physicochemical properties that are based solely on the structure are stored in the data warehouse, but properties that are data-source dependent, such as citation or biological activity, are only referenced. A typical use of a chemical warehouse is to search for a set of
Figure 9.16. Star schema design of a chemical data warehouse. The central source table allows access to the External-ID of every molecule, arranged by source database. These External-ID values can be used to build multidimensional views of the data. For example, to see all the reactions with products that can be found in source database ACD, one would combine data from the source dictionary table (Source-ID for database ACD), the reactions table (Struct-ID and Role), and the moltable table (Struct-ID), using identifiers (External-ID) from the central source table. [Diagram: Master dictionary, Moltable dictionary, Source table, Reactions, and Property dictionary tables, with fields including Tablename, Struct-id, Structure, Formula, Source-id, Location, External-id, Role, Prop-id, Prop-value, and Method.]
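The multidimensional view described in the caption of Figure 9.16 (all reactions whose products can be found in source database ACD) can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the table and column names loosely follow the labels in the figure, but the simplified schema, the second source name RXNDB, and all rows are hypothetical and do not reproduce the MDL concordance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_dictionary (source_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE moltable (struct_id INTEGER PRIMARY KEY, formula TEXT);
    CREATE TABLE source (source_id INTEGER, external_id TEXT, struct_id INTEGER);
    CREATE TABLE reactions (external_id TEXT, struct_id INTEGER, role TEXT);

    INSERT INTO source_dictionary VALUES (1, 'ACD'), (2, 'RXNDB');
    INSERT INTO moltable VALUES (1, 'C2H4O2'), (2, 'C4H8O2'), (3, 'C2H6O');
    INSERT INTO source VALUES
        (1, 'ACD-0001', 1), (1, 'ACD-0002', 3), (2, 'RXNDB-M7', 2);
    INSERT INTO reactions VALUES
        ('RXN-1', 1, 'reactant'), ('RXN-1', 3, 'reactant'), ('RXN-1', 2, 'product'),
        ('RXN-2', 2, 'reactant'), ('RXN-2', 1, 'product'), ('RXN-2', 3, 'product');
""")

# The source table sits at the center of the star: it links each unique
# structure to its identifiers in the individual source databases.
query = """
    SELECT DISTINCT r.external_id, m.formula
    FROM reactions AS r
    JOIN moltable AS m ON m.struct_id = r.struct_id
    JOIN source AS s ON s.struct_id = r.struct_id
    JOIN source_dictionary AS d ON d.source_id = s.source_id
    WHERE r.role = 'product' AND d.name = 'ACD'
    ORDER BY r.external_id, m.formula
"""
rows = conn.execute(query).fetchall()
for reaction_id, formula in rows:
    print(reaction_id, formula)
```

With these sample rows, only the second reaction is returned, because the product of the first reaction is registered only in the hypothetical RXNDB source.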
structures that satisfy a search query, then drill-down using a web browser to access the original data sources. In the case of reactions, the user might retrieve and browse a list of reactions that contain the structures that were found in the search. In addition to drill-down, a "hop-into" facility allows passing a set of structures into a search program or web browser that is native to the source database being accessed.

5.2 Data Marts of Chemical Information

For certain purposes, like reagent selection, a data warehouse is too large and comprehensive, and a much smaller database is sufficient. Such a data mart has the same architecture as a data warehouse, but it has only a single dimension of structural data, for example, synthetic reagents. The MDL Reagent Selector program is one example of a data mart of reagent structures, with information on their price and availability from various suppliers (Fig. 9.17). It has a fact table that links structures to their identifiers in the various source databases. It stores properties that can be used to filter reagents, such as the molecular weight and LogP, and it has pointers to supplier information stored in the MDL Chemical
Figure 9.17. Reagent Selector, an example of a chemical data mart. Various components of the system are shown, including the data sources, the daemon program that automatically updates the mart, the concordance database, and the client/server architecture, which is implemented in a three-tier system.
Products Index (CPI) database. To aid in reducing the size of a hit list, a Reagent Selector user can filter reagents and sort on properties, availability, presence or absence of functional groups, etc. (Fig. 9.18). Further list reduction can be achieved by clustering the structures by means of a cluster analysis using substructure keys as descriptors.

An important feature of Reagent Selector is the daemon program, which runs in the background. This agent-like program "awakens" on a fixed schedule and checks the various source databases for new or deleted structures or for changes in the structures and data. If any changes or additions are found, the daemon updates the data mart accordingly, so users will see the latest information when they run searches. Another aspect of chemical warehouses and data marts concerns their physical architecture. It is increasingly common to see so-called "multitier" architectures in which the client program (the "application tier") may be a very "thin" Web client that communicates to a more extensive "middle tier" of programs that serve the immediate needs of the client (see Glossary). Requests for searching and registration, which demand database server resources, are passed from the middle tier to a "database tier" that corresponds mostly to the server part of former client-server architectures. There are many advantages to this arrangement. The programs can be distributed onto different computers to optimize performance of the system. The middle tier can be modified independently to accommodate changes in the client and server. From a development point of view, the various tiers in the architecture can be developed and maintained on their own schedule, with minimal dependence on other components.
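The update cycle of the daemon described above (check the sources for new, deleted, or changed structures, then bring the data mart in line) can be sketched as a single synchronization pass. This is a minimal sketch, not the Reagent Selector implementation: the function name, the dictionary layout, and the sample records are all hypothetical, and the fixed schedule on which a real daemon would repeat the pass is omitted.

```python
def sync_mart(mart, source):
    """Make `mart` match `source` in place and report what changed.

    Both arguments map a reagent identifier to a record (a dict).
    """
    added = [key for key in source if key not in mart]
    deleted = [key for key in mart if key not in source]
    changed = [key for key in source if key in mart and mart[key] != source[key]]

    for key in added + changed:
        mart[key] = dict(source[key])  # copy so the mart owns its records
    for key in deleted:
        del mart[key]

    return {"added": added, "deleted": deleted, "changed": changed}


# Hypothetical state of the mart and of one source database.
mart = {
    "R1": {"smiles": "CCO", "price": 10.0},
    "R2": {"smiles": "CC(=O)O", "price": 12.0},
}
source = {
    "R1": {"smiles": "CCO", "price": 11.5},      # price changed
    "R3": {"smiles": "c1ccccc1", "price": 8.0},  # new reagent
}

summary = sync_mart(mart, source)
print(summary)
```

A second pass over an unchanged source reports no additions, deletions, or changes, which is the property a scheduled daemon relies on.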
Figure 9.18. Filtering structures as part of the reagent selection process. The filter criteria include criteria for structure complexity, logP, H donor/acceptor, molecular weight, formula, and substructures.
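The kind of property filtering and sorting summarized in Figure 9.18 can be sketched as follows. This is an illustrative sketch only: the Reagent class, the default thresholds, and the sample records are hypothetical, and the property values are rough illustrative numbers rather than measured or vendor-supplied data.

```python
from dataclasses import dataclass


@dataclass
class Reagent:
    name: str
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int
    available: bool


def filter_reagents(reagents, max_mw=300.0, logp_range=(-1.0, 3.0),
                    max_donors=3, max_acceptors=6):
    """Keep available reagents inside the property windows, lightest first."""
    low, high = logp_range
    hits = [
        r for r in reagents
        if r.available
        and r.mol_weight <= max_mw
        and low <= r.logp <= high
        and r.h_donors <= max_donors
        and r.h_acceptors <= max_acceptors
    ]
    return sorted(hits, key=lambda r: r.mol_weight)


# Hypothetical hit list with illustrative property values.
hit_list = [
    Reagent("benzoic acid", 122.12, 1.9, 1, 2, True),
    Reagent("large hypothetical reagent", 450.0, 4.5, 2, 8, True),
    Reagent("ethanol", 46.07, -0.3, 1, 1, True),
    Reagent("aniline", 93.13, 0.9, 1, 1, False),  # not currently available
]

names = [r.name for r in filter_reagents(hit_list)]
print(names)
```

Substructure and formula criteria, which the figure also lists, would need a chemistry toolkit and are left out of this sketch.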
6 FUTURE PROSPECTS

It is always difficult to predict the direction of advances in information management. Much of the improvement in chemical structure management and searching has come from advances in hardware and computer systems. Moore's Law, which states that computer power roughly doubles every 18 months, has held since the 1970s, but threatens to break soon (135). As computer manufacturers hasten to avert the leveling off of computer performance gains, new technologies will surely affect the way chemical information is stored and searched. Based on current trends, it is possible to make some short-term predictions about directions in this field.

• Information: Integration of chemical structure data with other types of data. There is a welcome tendency to treat structures, models, and reactions like other relational data. The biggest advantage of this approach is being able to run integrated searches using structures and data together in search queries. An interesting approach pioneered by the Merck group involves generating fingerprints for key words in documents, then searching for combined structure/document similarity (136). This approach and similar ones will be simplified by increasing integration with relational database systems, as described below.
• Knowledge Discovery in Databases: In the past, dating back to the DENDRAL project (137), attempts to apply artificial intelligence and machine learning to problems in chemistry and drug discovery have gained only moderate acceptance. One problem with expert system approaches has been the small base of information to access. In some cases, this has been a single expert chemist or a handful of example structures. This situation is changing as data accumulates in databases. As Fig. 9.19 shows, data can be organized, indexed, and stored in databases to produce information. This information, depending on how relevant, unique, and complete it is, can be analyzed and modeled
Figure 9.19. Turning chemical data into knowledge. Data becomes organized and indexed to produce information. Mining and analyzing information yields knowledge. [Diagram: chemical structures and data; data warehousing; chemical/biological information; mining; chemical/biological knowledge. Steps shown: organize, reduce, transform, filter, analyze, display, model, validate.]
to generate knowledge that might not otherwise have been evident. Once this knowledge is materialized, it can be managed, shared, and deployed for future applications. This process is termed Knowledge Discovery in Databases (KDD), and it is becoming more widely practiced (138).

Data mining is the mechanism by which knowledge is derived from databases. It is generally defined as the extraction of predictive models and associations from large volumes of data using statistical and pattern recognition techniques, usually for some competitive advantage. Data mining is already well established in the marketing, sales, and telecommunications fields (139). Data mining is being used increasingly by scientists, especially in genomics and proteomics. Example applications include the clustering of DNA array data and using database information for protein secondary structure prediction (140). Examples are starting to appear in the field of drug design. Depending on the stage of a drug discovery project, one can mine chemical structure data for diversity, similarity, or specificity, as shown in Fig. 9.20. This figure shows that the lead discovery, refinement, and optimization phases of drug discovery proceed through mini-cycles, each with their own data
Figure 9.20. Mining chemical information in the drug discovery cycle. Each mini-cycle proceeds until a sufficient number of suitable compounds become available. (Diagram labels, by mini-cycle goal: mine for diversity, mine for similarity, mine for specificity.)
mining requirements. So far, data mining has mostly been applied to library design, QSAR, and ADME prediction (141). Whether the techniques become more widely used will depend on the accuracy of predictions made using them, on the availability of convenient software, and most of all, on clean and relevant data.

Software: Integration with relational systems. A consequence of treating structures as relational data is a tighter integration of once-specialized structure management software techniques with relational database systems. In Oracle, so-called "data cartridges" are being increasingly used to allow a chemist to treat structures like other relational data in a search. Structures, models, and reactions can all be input, registered, and searched using standard SQL to which special operators have been added. SQL stands for Structured Query Language, the standard language for querying relational systems (see Glossary). For example, in the Daylight relational data cartridge, substructure and similarity searches in a reaction database can be conducted directly in SQL as follows:

Substructure: to find reactions containing benzoic acid as a product:

    SELECT * FROM RXN WHERE CONTAINS (SMILES, '>>O=C(O)c1ccccc1') = 1;

This statement translates to "Select everything from the table named RXN, where the SMILES field contains the substructure string for benzoic acid as a product". The "= 1" clause is an artifact of the data cartridge implementation; it does not necessarily mean that only a single occurrence of the benzoic acid substructure should be found.

Similarity: to find how many reactions have a solvent that is 80% or more similar to acetic acid:

    SELECT COUNT (*) FROM MEDIUM WHERE SIMILAR (SMILES, 'OC(=O)C', 0.8) = 1;

This translates to "Tell me the number of rows in the table MEDIUM where the value of the SMILES column in that table has an 80% similarity to acetic acid." Again, the "= 1" parameter is an artifact.

This approach greatly simplifies the development of applications. Also, the searches can take advantage of optimization that is built into the relational database system. Fig. 9.21 shows a Web browser that uses the MDL reaction data cartridge to perform structure and reaction searches. The use of a direct searching approach with an object-relational database for combined retrieval of chemical and biological information was reported by Cargill and MacCuish (142). In the field of data mining, the generation, storage, and deployment of predictive models is fully integrated into SQL Server 2000 and Oracle 9i, and this trend will soon extend to other relational database systems (143).

Another advance in chemical information software that promises to have considerable impact on drug discovery is "meta-layer" searching, as described by Hoctor (144). In this approach, queries entered by the chemist are submitted first to a middle-tier search engine, the meta-layer, which automatically and transparently generalizes and transforms the query into several queries. These are then submitted to various databases to retrieve "more of the same" kinds of information. The results are automatically formatted and presented to the chemist in the context of a Web browser (Fig. 9.22). Thus, a name search might get converted automatically to a structure, for substructure or reaction searching, a literature citation search, or a patent search, etc. The linking of searches across indirectly related literature can also be used to generate new knowledge (145).

Hardware and Operating Systems: The value of parallel and distributed processing was reported early in the development of structure search systems (146). Since then, some commercial products have adopted parallel processing. These mostly involve CPU-intensive searching like conformationally flexible 3D searching and docking. With the exception of such tasks, the speed of most chemical information searching is determined by data input and output (i.e., "I/O
Figure 9.21. Web client for an application that searches a relational reaction database. SQL statements are used to select structures and reactions that satisfy the search query. (Screen labels: Reaction Query; Reaction Substructure Search; Reaction Flexmatch (Exact); Reaction Similarity, with similarity thresholds; options for RSS query highlighting, atom-atom numbering, and bond-change marks.)
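
The idea of adding a custom search operator to SQL, as in the similarity query quoted in this section, can be imitated on a very small scale. The following is only a minimal sketch and not the Daylight or MDL cartridge interface: it assumes SQLite (through Python's standard sqlite3 module) in place of Oracle, registers a hypothetical user-defined SIMILAR function, and compares made-up integer fingerprints rather than SMILES strings, because real structure comparison would require a chemistry toolkit.

```python
import sqlite3


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two fingerprints held as integer bitsets."""
    common = bin(fp_a & fp_b).count("1")   # bits set in both fingerprints
    total = bin(fp_a | fp_b).count("1")    # bits set in either fingerprint
    return common / total if total else 0.0


def similar(fp_a: int, fp_b: int, threshold: float) -> int:
    """Return 1 or 0 so that the SQL reads '... SIMILAR(...) = 1'."""
    return 1 if tanimoto(fp_a, fp_b) >= threshold else 0


conn = sqlite3.connect(":memory:")
# Register the Python function as a three-argument SQL operator (hypothetical name).
conn.create_function("SIMILAR", 3, similar)

# Hypothetical table of solvents with invented six- or seven-bit fingerprints.
conn.execute("CREATE TABLE MEDIUM (NAME TEXT, FP INTEGER)")
conn.executemany(
    "INSERT INTO MEDIUM VALUES (?, ?)",
    [
        ("solvent_a", 0b111111),    # identical to the query fingerprint
        ("solvent_b", 0b1111111),   # one extra bit: similarity 6/7
        ("solvent_c", 0b000111),    # half of the query bits: similarity 3/6
    ],
)

query_fp = 0b111111
(count,) = conn.execute(
    "SELECT COUNT(*) FROM MEDIUM WHERE SIMILAR(FP, ?, 0.8) = 1", (query_fp,)
).fetchone()
print(count)  # 2 rows are at least 80% similar to the query
```

Because the operator is evaluated inside the database engine, it can be combined with ordinary relational conditions in the same WHERE clause, which is the main attraction of the cartridge approach described in the text.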
bound") because chemical structures are a highly "verbose" type of data. As chemical information systems integrate more with relational systems, they can take advantage of the parallel and distributed processing capability of the relational system. An important development is the "24-7" availability of data in chemical databases (24 h/d, 7 d/wk). This can only be accomplished by distributing and replicating databases across a network.

It would be pure speculation to estimate the impact of changes in hardware and operating systems on chemical information management. Presently, Sun Microsystems is probably the dominant Unix system in chemical database management, largely because of their network presence and their support of Java. Microsoft has released their Windows XP operating system, which merges the Windows 98 and Windows 2000 software streams and will give them continuing dominance in the PC market for a while. Linux is quickly catching on as an inexpensive alternative, and it has a strong foothold in the molecular modeling area, but it requires operating system expertise and lacks a business software base. Small handheld personal data assistants are becoming more capable, and wireless computing is on the rise. The standard desktop computer has at least a 2.0 GHz processor with 512 Mb or more of RAM, about a 30-72 [...] hard drive, and a combination read/write [...] with DVD. A relational database of 1 million structures consumes about 3-4 Gb of disk space and can be substructure searched in a few seconds, returning a hit list containing thousands of structures. It takes much longer for a chemist to wade through the results of such a search or to process analytical or bioassay results from a single combinatorial chemistry experiment than to run most data searches. In light of this, it seems evident that the tools that will succeed are those that will best assist the chemist in extracting relevant, implicit knowledge from the data and deploying that knowledge for future benefit.

Figure 9.22. Using meta-layer searching to retrieve implicit information. A name search query is converted to a structure, which is then transparently searched to add structure-based search results to the literature citation.

7 GLOSSARY OF TERMS

2D Query Feature. A structural feature added to a 2D substructure search query to generalize the query or make it more specific. An example atom query feature would be specifying a list of allowed atoms (Cl, Br, I) or limiting the number of attachments. A bond query feature would be allowing a single or double bond (S/D) or forcing the bond to have a particular stereochemistry. More complex query features can be used to specify which functional groups or substituents are allowed at a given position (Rgroups), specifying a range of chain length size (link nodes) and specifying atom, bond, or molecular data queries (Sgroups).

2D Structure. In terms of chemical information, a collection of information about atoms and bonds that can be displayed in a manner such that a chemist would recognize it as a chemical structure. The atom and bond types and connections are usually explicit. The layout of the atoms in the display may be explicit (x,y coordinates) or implicit, that is, determined at the time of display. Hydrogen atoms may be fully or partially suppressed to save storage space.

3D Model. In terms of chemical information, all the information in a 2D structure plus at least one set of 3D atomic coordinates. This is a single conformation of the structure, which is typically a low energy conformation
or even a crystal or spectrometrically determined 3D structure. A 3D model may also contain information about multiple low-energy conformations and atom, bond, and molecular properties such as partial atomic charge, HOMO/LUMO energy, etc.

3D Query Features. Topographical features that relate atoms, bonds, and other 3D features to each other in a pharmacophore or 3D substructure search query. Typical features include (1) objects such as atoms, centers of rings, electron lone pairs, and regions of exclusion and (2) measurements of distance, angle, dihedral angle, radius of exclusion, etc. Measurements often have a range associated with them (e.g., distance between a carbonyl oxygen and a secondary amino nitrogen is 3.4-5.0 Å).

Agent. A computer program that can run autonomously and on a schedule to perform database searches, maintenance, and reporting activities that a chemist would otherwise have to do manually. An example would be an Internet notification service that sends the chemist an e-mail notification whenever a particular database has been updated.

Application Tier. In a multi-tier architecture, the collection of programs that run on the chemist's client or workstation machine. It is the tier of programs "closest" to the chemist in the architecture. Typically this may be a Web client program or other program with a GUI that allows the chemist to interact with the architecture.

Artificial Intelligence. A branch of information science that attempts to use computer programs to perform or simulate human mental activity. Applications in chemistry include perceiving chemical structures, designing structures to fit topological or topographical criteria, designing 'novel' structures, etc. Many of the activities of AI overlap with, or contain elements of, pattern recognition and data mining.

ASCII. American Standard Code for Information Interchange, a widely used system of encoding alphanumeric information into eight-bit bitsets (bytes). The expansion of information to include non-English characters requires the use of larger (16- or 32-bit) character sets such as Unicode.

Atom List. In a substructure search query, a list of allowed (or perhaps disallowed) atom types. Often represented within brackets: [Cl,Br,I].

Atom Stereochemistry. Usually refers to tetrahedral stereochemistry at a given atom, which must be a chiral or prochiral center. The stereochemistry may be local (or relative) or global (based on CIP conventions). If it is local, it usually is termed "parity" or some other nonspecific term, to distinguish it from true global stereochemistry (R,S). Local atom stereochemistry is a property of the atom and its nearest attached atoms. Global atom stereochemistry depends on the entire molecule and the stereochemistry at other chiral atoms. In some systems, the atom stereochemistry is perceived from the drawing of the structure, using "up" (wedged) and "down" (dashed) bond marks as cues. In linear notations, characters in the string can be used to specify the counterclockwise (@) or clockwise (@@) orientation of attachments at a given center.

Atom-Atom Mapping. The procedure of assigning each atom in a substructure query to a given atom in a candidate structure. The assigned structure atoms must match the query atom in all characteristics, including atom type, stereochemistry, charge, attachments, etc. In some structures, a query may map onto the structure in many ways (multiple mappings). Additionally, these mappings may overlap each other in terms of atoms and bonds, or they may be non-overlapping. Some search systems stop after the first mapping, whereas others perform exhaustive mapping, until no further mapping can be found.

Automap. A feature implemented in reaction indexing programs like REACCS, which attempts to automatically "discover" which atoms and bonds are involved in a reaction transformation (the reacting center atoms and bonds). The chemist draws the reaction or the reaction query as a set of reactants leading to a set of products, then invokes the Automap feature, which causes the reacting atoms and bonds to be marked and identified. When reacting center atoms and bonds are specified in both the query and the reactions in the database, reaction substructure searching is faster and gives fewer false hits.
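
The Atom-Atom Mapping entry can be illustrated with a small program. What follows is a minimal sketch, not the algorithm of any particular search system: it assumes a hypothetical Mol class that holds only element symbols and bond orders (no hydrogens, charges, or stereochemistry), and it uses a simple backtracking search that either stops at the first mapping or enumerates all of them.

```python
from typing import Dict, List, Tuple


class Mol:
    """Hypothetical toy structure: element symbols plus bond orders."""

    def __init__(self, atoms: List[str], bonds: Dict[Tuple[int, int], int]):
        self.atoms = atoms
        self.bonds: Dict[Tuple[int, int], int] = {}
        for (i, j), order in bonds.items():
            # Store each bond under both atom orders for easy lookup.
            self.bonds[(i, j)] = order
            self.bonds[(j, i)] = order


def find_mappings(query: Mol, target: Mol, first_only: bool = False) -> List[Dict[int, int]]:
    """Return mappings {query atom index: target atom index}."""
    mappings: List[Dict[int, int]] = []

    def extend(mapping: Dict[int, int]) -> None:
        q = len(mapping)                      # next query atom to place
        if q == len(query.atoms):
            mappings.append(dict(mapping))    # every query atom is assigned
            return
        for t in range(len(target.atoms)):
            if t in mapping.values() or target.atoms[t] != query.atoms[q]:
                continue
            # Each query bond from q to an already mapped atom must exist
            # in the target with the same bond order.
            bonds_ok = all(
                target.bonds.get((t, mapping[p])) == order
                for (a, p), order in query.bonds.items()
                if a == q and p in mapping
            )
            if bonds_ok:
                mapping[q] = t
                extend(mapping)
                del mapping[q]                # backtrack and try another atom
                if first_only and mappings:
                    return

    extend({})
    return mappings


# Amide query C(=O)N against acetamide CC(=O)N and methyl acetate CC(=O)OC.
amide = Mol(["C", "O", "N"], {(0, 1): 2, (0, 2): 1})
acetamide = Mol(["C", "C", "O", "N"], {(0, 1): 1, (1, 2): 2, (1, 3): 1})
ester = Mol(["C", "C", "O", "O", "C"], {(0, 1): 1, (1, 2): 2, (1, 3): 1, (3, 4): 1})

print(find_mappings(amide, acetamide))  # [{0: 1, 1: 2, 2: 3}]
print(find_mappings(amide, ester))      # [] because the ester has no nitrogen

# A symmetric query maps in more than one way (multiple mappings).
ethane = Mol(["C", "C"], {(0, 1): 1})
print(len(find_mappings(ethane, ethane)))                   # 2
print(len(find_mappings(ethane, ethane, first_only=True)))  # 1
```

The first_only switch mirrors the distinction drawn in the entry between systems that stop after the first mapping and those that map exhaustively.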
Backtracking. One process that is used in mapping substructure atoms and bonds to the corresponding atoms and bonds in a candidate structure. Given a certain query, say an amide group [-C(=O)N-], a backtracking algorithm searches first for a carbon atom, then for an oxygen atom, then checks to see if they are doubly-bonded, and finally checks to see if a nitrogen is singly-attached. At any step in the process, if the check fails (e.g., with an ester, the final check would fail), the program "backtracks" to the last successful step and examines another eligible atom or bond. If no eligible atoms or bonds are found at that step, it backtracks to the next previous step in turn. This procedure is guaranteed to find a mapping if one exists, but it can be slow, especially with large or highly symmetric queries or structures, where a multitude of similar paths must be examined. Alternative approaches that use an indexed tree can be faster, especially for large databases.

BCUT Descriptors. Descriptors of chemical structure that are derived from an eigen analysis of the connection table of the structure. The class of BCUT descriptor depends on the quantities that are stored in the table (simple connection information versus electronic or steric interaction values). BCUT descriptors have found value in molecular diversity and chemical library design.

Binary Data. Data stored in a file or database that is not chemist-readable, and usually cannot be converted to printable characters. Examples include connection table storage in a database, substructure search keys, and a graphics image of a structure. Note that some other data that is also not chemist-readable, like certain linear notations (e.g., a Chime string), may be made up of printable characters and is not strictly binary data.

Bioinformatics. The application of statistical and mathematical techniques to turn sequence data into useful biological information. The general goal of bioinformatics is to define the structure, location, and function of the proteins and nucleic acids that are the products of the processing of a genome. The application of bioinformatics in drug discovery is primarily the identification of new therapeutic targets.

Biological Data. This includes the results of in vitro and in vivo assays, toxicology and metabolism studies, DNA and protein array data, etc. It complements the chemical data, and increasingly, both chemical and biological data are being stored in large corporate relational databases. At any given stage in the drug discovery process, obtaining and analyzing the biological data has traditionally been considered the more complex and rate-limiting step in the process. The application of high-throughput methods to screening and pharmacokinetic analysis is yielding considerable benefit in the collection and processing of biological data.

Bitset. A contiguous set of binary digits (bits, 0/1) in computer memory. Bitsets are often used in chemical information to store collections of yes/no, presence/absence, and active/inactive responses in a compact form. Bitsets are used to store substructure search keys for each structure (fingerprints), which are used in similarity calculations. Bitset indexes are common features of relational databases, where a collection of bits, one for each structure in the database, can store the presence or absence of a given piece of data for each structure, or, in the case of a substructure search, a compact representation of the result set from the search. The advantage of bitset representation is that computers can perform very fast logical operations (union and intersection) on bitsets, which enables filtering and subsetting of large lists of structures and data.

BLOB. A Binary Large Object data type. This data type is used in Oracle, for instance, to store large amounts (e.g., up to several Gigabytes) of binary data. Storage of the connection table and all the perceived structural information for a registered structure is one example. Another example is the storage of the entire fastsearch index for a database, which can be accessed as a single object by the Oracle data storage and retrieval routines.

Bond Stereochemistry. This complements the atom and molecule stereochemistry of a structure. A given double bond can be assigned Z or E, or cis or trans stereochemistry based on the attachments. If the stereochemistry is unknown or is a mixture, it can be assigned a value of "either." In a substructure
search, the bond stereochemistry can be specified in the query to limit the scope of the search. In some registration systems, the bond stereochemistry of a given structure is perceived from the input drawing of the structure. In the case of linear notations, it can be specified by characters in the string (e.g., Cl\C=C\Cl specifies trans-dichloroethene).

BRN. The Beilstein Registry Number, which can be used to access structures in the Beilstein database.

Business Rule. An established convention for the representation of data in a given company or laboratory. In the case of chemical structures, an example of a chemical business rule would be "all nitro groups should be drawn as -N(=O)(=O) and not the charge-separated form -N+(O-)(=O)." In the case of biological data, a business rule might enforce the units in which a given piece of test data is reported (e.g., dosage in mmol/kg). Business rules can be enforced by preprocessing data before it enters the database, or in the case of multiple, diverse data sources feeding into a data warehouse or data mart, the data can be transformed to the correct form before storage in the warehouse.

Canonical Numbering. Reordering the numbering of atoms in a structure to a unique order, based on the extended counting of the number of attachments at each center, the atom and bond types, etc.

CAS Number. Chemical Abstracts Registry identification number, very widely used to identify chemical structures.

Chem(o)informatics. By analogy with bioinformatics, this is the application of statistical and mathematical techniques to turn chemical structure data into useful chemical and biological information. It makes use of techniques from statistics, pattern recognition, artificial intelligence, and data mining to derive useful predictive relationships between structures and their biological or physicochemical properties. Broadly considered, cheminformatics also includes the input, storage, management, and searching of chemical structure information.

Chemical Library. A collection of structures, real or virtual, that is the current starting point for high-throughput screening or analysis. A library may be all the structures in a database, or more commonly, a subset of these. It might consist of diverse structure types or it might represent the enumeration of one or a few generic structures. Libraries can be classified according to the stage of discovery, i.e., diverse libraries for lead discovery, focused libraries for lead development, and optimized libraries for lead optimization.

Chemical Space. A loosely defined concept that all the known or possible chemical structures define some multidimensional space in which the structures are points. Structures that are topologically or topographically similar to each other (i.e., look similar) cluster in chemical space, and by the principle of chemical similarity, should show similar physicochemical and biological properties. This is the basis for diversity analysis of chemical libraries. The challenge is to select or discover properties of the structures that define the chemical space and can be used.

CIP Stereochemistry. Cahn-Ingold-Prelog stereochemistry conventions. An IUPAC approved and widely used set of rules for assigning stereoisomers based on atom and group priorities (see http://www.chem.qmw.ac.uk/iupac/stere~~).

Cleaning and Transforming Data. When importing data from diverse data sources (files, databases, spreadsheets, LIMS systems, etc.) into a database or data warehouse, the data usually needs to be standardized, checked, and sometimes transformed to some common format and content. This allows faster search and retrieval, and serves as a check of data integrity. The rules that define the cleaning/transformation process are often termed "business rules," and in the case of chemical data, they may include checking and modification of chemical structures.

Client-Server Architecture. A computer architecture in which a "server" computer (usually a larger and faster machine at a central location) runs programs that communicate over a network with numerous workstations or "client" machines that reside in offices and laboratories. The server computer performs heavy duty computing tasks such as database searching and molecular and data modeling, in response to commands from the users of the client computers. It then communicates the results back to the client machines. There, depending on whether the client is "thick" (a
relatively large and capable application that can display and manipulate data and structures), or "thin" (a small program, possibly running in an Internet browser), the data is displayed, manipulated, and reported. Client-server architecture is two-tier, and is being supplanted by more versatile multi-tier approaches.

Clipping. The computer application of a chemical transformation to a set of structures. One example would be the conversion of a set of o-substituted phenols to a generic representation with the ortho substituents collected into an Rgroup attached to the parent phenol structure. The reverse process, going from the generic structure to all the specific non-generic structures, is termed enumeration. Clipping also includes functional group transformations, such as converting a ketone to an alcohol. In the process of cleaning and transforming chemical structure data, clipping may be involved when chemical business rules are "enforced."

CLOB. A Character Large Object data type. This data type is used in Oracle, for instance, to store large amounts (e.g., up to several Gigabytes) of character data. Storage of structures in a relational database in molecule-file format is an example.

Cluster Analysis. The process of discovering "natural" groupings of points in the space of some measurements or descriptors. In chemical information management, one often clusters chemical structures for diversity analysis or to subset the results of a search. Structures are most often clustered using functional group fingerprints as the descriptors. Clustering methods usually consist of either partitioning methods like k-means and Jarvis-Patrick, or hierarchical methods, which may work by successively dividing the points (divisive clustering) or by successively aggregating points (agglomerative clustering). Cluster analysis is an important part of unsupervised data mining and pattern recognition.

CML. The Chemical Markup Language. Based on XML and HTML, it provides a standard self-documenting molecule file and information interchange format. Information is described by tags and values. A CML document can be "parsed" by a freely available computer program that can return the structural information on demand.

Combinatorial Chemistry. The application of high-throughput, parallel methods to the synthesis, analysis, screening, and testing of materials. This approach relies on robotics and computer-assisted methods to generate and analyze the results. Synthesis, analysis, and testing of samples occur in the wells of microtiter plates, which may contain as few as 96 samples or as many as a few thousand. Solid-phase and solution methods are used, and samples may be "one-bead-one-compound" or they may contain mixtures, which require "deconvolution" to determine which component is responsible for observed activity.

CONCORD. Rapid 2D to 3D conversion program introduced by Robert Pearlman's group in 1987. It generates low-energy approximate 3D models from 2D connection tables. It can also do stereo "multiplexing," where multiple configurations of stereochemically ambiguous structures are generated. Marketed by Tripos, Inc.

Concordance. A data warehouse architecture used in MDL relational chemical and reaction databases. The central "fact" table of a concordance has a record for each unique structure in the database, with pointers to the instances of the structure in various "source" databases.

Connection Table. A table or matrix containing topological information about a chemical structure. A structure can be considered a "graph" in 2D space, with atoms as "nodes" and bonds as "edges." The atom connection table has one row and one column for each atom. The diagonal elements of the table are usually the atomic number, and the off-diagonal elements have a zero or null if two atoms are not connected; otherwise they contain the order (1, 2, 3, aromatic, etc.) of the bond connecting the row and column atom. A less common connection table is the bond connection table, in which the rows and columns are the bonds in the structure, the diagonal elements are the bond order, and the off-diagonal elements contain information about the atoms at the ends of the bonds.
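
The atom connection table described under Connection Table is easy to build by hand for a small molecule. The sketch below is illustrative only and assumes a hypothetical input format (a list of element symbols plus a list of bonds given as atom index pairs with an integer bond order); hydrogens are suppressed and aromatic bonds are not handled.

```python
from typing import List, Tuple

# Atomic numbers for the few elements used in this illustration.
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}


def atom_connection_table(
    atoms: List[str], bonds: List[Tuple[int, int, int]]
) -> List[List[int]]:
    """Build a square table: atomic numbers on the diagonal,
    bond orders off the diagonal, and 0 where atoms are not connected."""
    n = len(atoms)
    table = [[0] * n for _ in range(n)]
    for i, symbol in enumerate(atoms):
        table[i][i] = ATOMIC_NUMBER[symbol]
    for i, j, order in bonds:
        table[i][j] = order   # the table is symmetric, so fill both halves
        table[j][i] = order
    return table


# Acetic acid, CC(=O)O, heavy atoms only: C0-C1, C1=O2, C1-O3.
atoms = ["C", "C", "O", "O"]
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]

table = atom_connection_table(atoms, bonds)
for row in table:
    print(row)
# [6, 1, 0, 0]
# [1, 6, 2, 1]
# [0, 2, 8, 0]
# [0, 1, 0, 8]
```

Most of the entries are zero even for this four-atom example, which is one reason practical systems usually store the same information as compact atom and bond lists rather than as a full matrix.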
CONVERTER. A rapid 2D to 3D conversion program marketed by Accelrys. It uses a distance geometry approach to modeling, which covers a wider range of conformations than other methods.

CORINA. A rapid 2D to 3D conversion program developed by the Gasteiger research group at the University of Erlangen. It can handle macrocyclic ring structures, which can be problematic in other conversion programs. Chemists can access CORINA online (http://www2.chemie.uni-erlangen.de/software/corina/free-struct.html).

Daemon. From Unix, a program that runs continually as a background process to perform routine functions on demand or on a schedule. In the context of a chemical data warehouse, an example would be a registration program that periodically checks input databases to see if there are any new structures that need to be added to the warehouse. If there are, the daemon extracts the structures from the source databases, transforms and "cleans" them if needed, and registers them to the warehouse.

Data Cartridge. A popular term for user-customizable search "operators" that can be added to the SQL language of a relational database system. An example in chemical information is the addition of a substructure search (SSS) operator to integrate this type of search directly into a relational database search. One advantage of this approach is that the search "strategy" that the relational search program applies can take the complexity of the custom operator into account (the "cost") when performing the various search operations.

Data Compression. The process of transforming a large amount of data into a smaller dataset, in such a way that reversing the transformation results in no loss of information in the original data. A simple example of a compression operation is to replace a string of blanks with a count plus a number that designates the character to be repeated. Compression programs include personal computer programs like PKZIP and WINZIP, and Unix utilities like gzip. Compression methods are often used before storing data in a database and before transmitting data over a network. When the data is retrieved from the database or received on the network, it must then be "decompressed" by reversing the steps in the compression process. A chemical information example of compression is the conversion of an MDL molfile to a Chime string, which uses ZIP file compression methods.

Data Mart. Typically, a one-dimensional data warehouse: collecting data from multiple sources, extracting, cleaning, transforming, and loading (ECTL) the data, and then indexing it for analytical (OLAP) and data mining purposes, to be used by a given group or department. In chemistry, an example would be an inventory database, with structures, location, and purchasing information from many vendors for use by synthetic chemists. Like a data warehouse, a data mart often has a central fact table with each record containing [...] into other dimension tables that contain relational data about the items in the fact table. The fact and dimension tables are connected in an organization called the star schema, which is a common design for data marts and warehouses. Data marts are often subsets of a data warehouse.

Data Mining. The extraction of previously unknown predictive relationships from a large data set or database. Data mining makes use of descriptive unsupervised methods such as association and cluster analysis, as well as predictive supervised methods such as decision trees, curve fitting, neural networks, and Bayesian methods. Data mining was once considered "data snooping" and had a poor reputation. The need to analyze huge volumes of data and the success of these methods in marketing and finance have prompted scientists and statisticians to reconsider its use.

Data Warehouse. A large relational database that collects data from multiple diverse sources and organizes it for optimal analytical searching and reporting (OLAP). A data warehouse is a superset of a data mart, containing archival and unchanging data that is important to several groups of researchers (i.e., multidimensional). Data that enters a data warehouse does not usually come from original sources (i.e., chemists, instruments, or assays). It usually comes from intermediate data sources and undergoes cleaning and transformation (ECTL) before registry into the data warehouse. Typically, data is not deleted from
a data warehouse, because historical trends drill-down. The opposite process, which aggre-
are important. For this reason, the warehouse gates data, is termed roll-up.
grows very large over a long period of time, ECTL. The process of Extracting, Cleaning,
and thus its organization and indexing are Transforming, and Loading data into a data
crucial considerations. An example in chemis- mart or data warehouse. The data in a mart or
try would be a single database containing warehouse should be standardized, complete,
structures, models, reactions, and data, all unambiguous, etc. Raw data from files, instru-
cross-referenced, and used by chemists, biolo- ments, databases, the Internet, etc., must usu-
gists, and modelers. Typically, each group ally be preprocessed before it is "clean"
enough to be used in decision making. Struc-
would extract their own data mart from the
tures present special problems because tau-
warehouse, containing information relevant
tomers, isomers, salts, etc. may all represent
to their needs. Data warehouses are often used valid forms. The use of chemical processing
in decision support systems (DSS) to provide languages, which can search for substructures
data on which to base important corporate de- and make modifications of specific atoms and
cisions. bonds, enables the enforcement of chemical
Database Tier. In a three-tier programming business rules during the ECTL process.
architecture, the database tier resides on a Encryption. The conversion of data in a
server computer with access to the databases readable or decipherable code into another,
and the programs that manage them. possibly undecipherable, code. The most com-
Deduplication. When registering into a mon encryption involves sensitive pieces of
chemical structure database, the process of data like passwords and identification num-
finding whether the given structure already bers. In chemistry, it is sometimes necessary
exists in the database. This usually involves to encrypt larger pieces of information, such
performing an exact match search with the as chemical structures and the results of as-
given structure as the search query. Note that says-at least during passage of such informa-
the definition of exact match may vary with tion over networks or the Internet. Decryption
the database, and it may even be configurable. of the information typically requires one or
For example, some databases may consider more keys, which are often built into the en-
tautomers to be acceptable as exact matches, cryption and decryption software. .
whereas others may require a more strict def- Enumeration. The systematic substitution
inition. of all the Rgroup members in a generic struc-
Dimension Tables. In a data mart or ware- ture, giving each possible specific structure
house, the dimension tables store non-redun- the generic structure represents. If some of
dant information about the entries in the fact the Rgroups are not converted in the process,
table of the database. For the chemical exam- it is termedpartial enumeration.
ple of an inventory data mart, the fact table Equivalence Class. In the canonicalization
stores the various source database identifiers of structures that have some element of sym-
of each unique structure in the data mart. A metry, certain atoms that are topologically
dimension table of molecular formulas would equivalent may yield the same canonical num-
store the formula for the unique structure in ber. These atoms are considered to be in the
the mart, rather than storing the same for- same equivalence class. The concept of equiv-
mula for each occurrence of that structure in alence class is used, for example, in the Day-
the various source databases. light Chemical Information Systems handling
Drill-Down. Accessing data with increas- of reactions, to examine equivalent atoms
ing amounts of detail. When examining and when mapping reactant and product atoms.
browsing the results of a database search, a Exact Match Search. One type of structure
chemist can often request further information searching in which a query molecule is
about a structure, even though that informa- searched for in a database of structures. To
tion was not included in the search. The pro- exactly match the query, the target structure
cess of accessing further information, often must be topologically identical and not be a
stored in a hierarchical manner, is termed substructure or superstructure of the query.
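As an illustration of the Enumeration entry above, the following minimal Python sketch enumerates a generic structure by taking the Cartesian product of its Rgroup members. The text template for the root structure, the placeholder names R1 and R2, the member labels, and the function name `enumerate_generic` are all hypothetical and chosen only for illustration; real systems operate on connection tables rather than text templates. Leaving an Rgroup out of the substitution gives a partial enumeration.

```python
from itertools import product


def enumerate_generic(root, rgroups, expand=None):
    """Return every specific structure obtained by substituting Rgroup members.

    root    -- template string with placeholders such as "{R1}" (hypothetical format)
    rgroups -- dict mapping an Rgroup name to a list of member labels
    expand  -- names of the Rgroups to substitute; None means all of them.
               Rgroups left out stay as placeholders (partial enumeration).
    """
    names = list(rgroups) if expand is None else list(expand)
    results = []
    # One combination per choice of a single member from each expanded Rgroup.
    for combo in product(*(rgroups[name] for name in names)):
        structure = root
        for name, member in zip(names, combo):
            structure = structure.replace("{" + name + "}", member)
        results.append(structure)
    return results


# Hypothetical two-position library: 4 members at R1 and 2 members at R2.
library = {"R1": ["Me", "Et", "Pr", "Ph"], "R2": ["Cl", "Br"]}
full = enumerate_generic("{R1}-C6H4-{R2}", library)
partial = enumerate_generic("{R1}-C6H4-{R2}", library, expand=["R2"])
```

Here `full` holds the 4 x 2 = 8 specific structures, while `partial` holds two structures in which R1 is still generic.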
Extended Stereochemistry. A type of tetrahedral or higher level stereochemistry that appears, for example, in allenes, where the stereochemical center is not a single atom but a system of atoms and bonds that can be conceptually collapsed to a single atom to yield a stereo center.
External Registry Number. A unique "external" identifier assigned to a structure, reaction, 3D model, assay, etc. The external registry number is usually unique across databases, and it can be used as a key to link data from one database or table to another. Any given database may have its own "internal" identifiers that may not be unique across databases.
Fact Table. A central table in a data warehouse whose rows each represent one unit of primary importance in the warehouse. In a chemical warehouse, the rows of the fact table might correspond to unique structures in the database. In a biology data warehouse, each row might correspond to a single experiment. The fields in the fact table are mainly pointers to information stored in other tables, or they contain data that may be repeated in other tables but is stored in the fact table (i.e., denormalized) for rapid access. The fact table connects to other "dimension" tables in the warehouse that contain specific information that is not duplicated.
Fastsearch Index. Term used in MDL databases for a tree-structured index of all the structures in the database. The nodes in the tree represent increasingly complex substructures or properties that all the structures at or below a given node have in common. The fastsearch index can be very large, but it makes possible very rapid substructure searching.
Field. In database terminology, a column of data in a table. Fields are commonly selected in searches of the database, such as "SELECT MOLSTRUCTURE, MOLWEIGHT FROM MOLTABLE WHERE SSS(MOLSTRUCTURE, 'query.mol') = 1 AND MOLWEIGHT < 500.0". Here, MOLSTRUCTURE and MOLWEIGHT are fields in the MOLTABLE table. SSS is a function that operates on the MOLSTRUCTURE field to find molecules that contain the structure contained in the file query.mol, as a substructure.
Filter. A query or set of criteria designed to select a subset of a given set of data or results. Filters are usually applied to limit the number of hits from a search, or to limit the input to some analysis. Sometimes filters are designed to remove invalid rows of data, to randomly select a subset, or to remove rows based on the values of certain fields. A common application of filtering is in reagent selection, where reactive groups, multi-functional structures, or cost criteria may be applied to the selection of compounds for reactions. A filter can usually be expressed as a SELECT statement in a database search.
Fingerprint. A set of measurements or descriptors, usually binary, that can be used to characterize and identify an object. In the case of a chemical structure, a common fingerprint is a set of substructure keys that represent the presence or absence of specific functional groups. Such keys can be used to compute the topological similarity of the structure to other structures, and can be used as filters in database searching. Other common fingerprints include IR and mass spectral fingerprints, and fingerprints of how a structure behaves in a set of biological assays.
Flat Database or File. Essentially a spreadsheet of data, in which a given row contains all the data about a structure. There are no hierarchical relationships in a flat database. Many older and proprietary structure databases were flat in structure. These are in contrast to relational databases that are more commonly used at present.
Flexmatch Search. Term used in MDL structure searching to allow "relaxed" exact match searching of structures. One can specify, for instance, that everything must match except bond orders, or stereochemistry, or valence at atom centers, etc. By turning on or off various flags, one can, for a given structure query, retrieve isomers of various types, salts of the structure, or instances of the structure that may contain different values of certain types of attached data.
Generic Structure. A structure convention that allows representation of, say, a combinatorial library, as a single, generalized structure. The fixed parts of the structure are represented by the "root" or "parent" structure, and variable parts are represented by "Rgroups" or "substituent groups" that can each contain multiple substituents or fragments.
Gigabyte. One thousand megabytes, or 10^9 bytes of data. The largest chemical structure databases presently contain a few tens to hundreds of gigabytes of data. A typical structure in a database may require a few thousand bytes of data to store the connection table, coordinates, and other structure-specific data.
Graph and Subgraph Isomorphism. In chemistry, the mapping of a structure or substructure query to a target structure. All the atoms and bonds in the query (the nodes or vertices and edges of the "graph" of the query structure) must be mapped to corresponding atoms and bonds in the target structure to generate a hit.
Hash Code. Converting a set of numeric or character properties into a single, mostly unique, number, for the purpose of rapid lookup and retrieval. For example, in the case of chemical structures, it is common to generate and store a hash of the molecular formula, so that when a user requests a formula search, the search query typed by the user is converted to the same hash number, and a single lookup in the index gives all the structures that correspond to the given formula. A hash code is often generated as a linear combination of the possible values of each of the properties (e.g., n1P1 + n2P2 + ..., where the n's are selected such that the products never overlap). If several structures have the same hash code, they are termed "collisions," and typically require further processing, like substructure searching, to differentiate them.
Hierarchical Clustering. One of three main types of clustering applied to chemical structures (hierarchical, partitioning, and fuzzy clustering). In hierarchical clustering, a tree or dendrogram is constructed, with one structure at each of the leaves of the tree. By "trimming" the tree at a given level, one can collect structures into a given number of clusters, such that all the structures in a single cluster have some level of similarity to each other.
High-Throughput Chemistry. Application of parallel processing to the synthesis, analysis, and screening of structures. A subset of high-throughput chemistry is combinatorial chemistry.
Hit List. Older term for a list of identifiers of structures or other objects obtained from a database search. A more modern term is "result set."
HTML. HyperText Markup Language. The most commonly used specification language of the Internet. Other markup languages of interest in chemistry include XML (eXtensible Markup Language, for information in general), CML (Chemical Markup Language, for chemical structures), VRML (Virtual Reality Markup Language, for 3D visualization), and PMML (Predictive Model Markup Language, for data mining).
Index. A secondary data field generated from one or more primary data fields, to enhance the searching and retrieval of the primary data. An index in a chemical database may be a characteristic of the database, such as Oracle indexes, or it may be a chemistry-specific index such as a tree index for substructure searching. Indexes require extra space, and they typically must be created and maintained by some administrative process in the database.
Inventory Data. Typically, information about the availability of reagents for chemical synthesis. This includes the suppliers, package sizes, purity, and cost of commercial reagents, and the location, owner, and availability of in-house reagents. Increasingly, this information is being integrated with chemical structure databases and warehouses and with automated ordering and procurement programs.
Inverted Keys. When substructure search keys are generated for a structure, they may be stored in normal order (where each record represents a structure, and the bits or fields for that structure represent the keys). Alternatively, they may be stored in inverted or pivoted order, where each record represents a given substructure key, and the bits represent structures that have that particular key set. This type of storage benefits key searching, where a user wants all the structures that have a particular key set.
Isomer and Tautomer Search. A search type where bond order, hydrogen counts, certain atom valences, and bond or atom stereochemistry may be allowed to vary from those specified in the query. Such searching allows retrieval of keto-enol tautomers, cis-trans isomers, etc. The generalization of the search may be a function of the query, of the search process (see Flexmatch search), or both.
Java. Currently the most popular computer language for Internet and middle-tier programming. Java is an object-oriented language developed by Sun Microsystems that runs on multiple platforms and contains built-in features for networking, database access and security, graphics, etc. Other languages widely used in chemical information programming include C, C++, and Perl. Java and C++ are called "object oriented" languages that focus on the "business objects" of the application, like molecules and reactions. C and Perl are more "procedure oriented" languages that focus on the things the objects do and the process that manages them.
Join. The retrieval of data from more than one table in a relational database, into a single result set. Depending on the structure of the data in the various tables, and the nature of the search query, extra data may need to be added to the result set to fill in certain fields, or the fields may be unpopulated in the result set.
KDD. Knowledge Discovery in Databases. The application of analysis and data mining techniques to discover "knowledge" that may be implicit but undiscovered in large amounts of data.
Key Field. Field in a table that uniquely identifies rows in the table (primary key) or contributes to uniquely identifying the rows (secondary or composite key), or that connects the given row to data in another table (foreign key). Key fields are usually indexed for rapid lookup and retrieval.
Kmeans Clustering. Type of partitioning cluster analysis in which an object, such as a chemical structure, is placed into one of K clusters, based on how similar the structure is to the average value (or centroid) of each cluster. The average of the cluster may be an actual structure itself, in which case the technique is referred to as K-medoids clustering.
Linear Notation. Representation of a chemical structure using a linear string of numbers and letters. A linear notation is designed to be interpreted by a chemist. Thus, a SMILES string represents a linear notation, while a Chime string represents a compressed notation.
Linux. Microprocessor version of Unix developed by Linus Torvalds. Linux is presently used mostly in network servers and in clusters of microcomputers used for large-scale parallel computation. It is gaining status as an alternative to Microsoft and to Unix, for database applications, because Oracle and other vendors provide Linux versions.
Logic in Query Features. Using AND, OR, and NOT as modifiers on the application of query features. For example, one could run a search to select structures that contain "halogen and not primary or secondary amine, or not halogen and any amine." The logic can be a part of the query substructure, as with Markush queries, or it can be part of the SELECT statement.
Markush Structure. Essentially a generic structure, in which a root or parent structure plus Rgroups and their members can represent an entire combinatorial library. Markush structures were developed for patent purposes, and the specification of substituents is often more general than in the case of generic structures or database queries. Markush structures are also used to represent generic reactions, in which the reactants and products are represented by generic structures.
Member. A single substituent or moiety in an Rgroup collection. If R1 consists of the substituents (Me, Et, Pr, and Ph), these are termed the members of R1.
Metadata ("Data About Data"). In a database, metadata describes the structure, format, storage, access, and various properties of other stored data, or it describes the characteristics of the database itself. Metadata is sometimes stored in tables termed "dictionaries." As an example, chemical metadata might include the tablename, fieldname, and format of the field containing the chemical structures. This metadata might be stored in a master table in the database and be stored as property-name and property-value pairs, which the application that accesses the database can understand.
Middle Tier. Large-scale database applications often consist of three "tiers" of programs: (1) the application tier, which interacts
with the user, (2) the database tier, which interacts with the database management system, and (3) the middle tier, which sits between the user application and the database management. The functions of the middle tier include (1) receiving, transforming, and passing queries and commands from the application tier to the database tier, (2) receiving, consolidating, and formatting data from the database tier and passing it to the user, and (3) performing tasks that may be required by the application but are not available in the database tier, such as managing hit lists and queries. Middle tier programs are often written in Java, a language that runs on many platforms and contains Internet features to communicate with the application tier and database features (JDBC, Java Database Connectivity) to connect to the database tier.
Molecular Connectivity. A class of molecular descriptors derived from the connection table of a structure. For increasing path lengths (1-, 2-, 3-bonds, etc.), the molecular connectivity values are computed as the sum of functions of the connectivity values (number of attachments) of the atoms in the path. Molecular connectivity descriptors can be used to distinguish structures. As such, they can be correlated with physicochemical properties that are functions of structure size, linearity, and degree of branching.
Morgan Algorithm. A procedure for finding a mostly unique (canonical) ordering of atoms in a structure. It involves an iterative process that begins by assigning each atom a score, which is initially computed by counting the number of neighboring atoms attached. In successive iterations, the score at a given atom is computed by summing the previous iteration scores of all the atoms to which it is attached. Eventually, the order of the scores becomes invariant (i.e., does not change with further iteration). At that point, atoms that are topologically equivalent have the same score. Atoms in the structure are then renumbered by the order of their Morgan number. One advantage of canonical numbering is that a given structure, drawn two different ways (i.e., different ordering of the atoms) can be reduced to the same Morgan numbering, and thus be matched quickly.
Multidimensional Database. A relational database in which multiple general types of data are stored, indexed, and cross-referenced, for use by several different groups. In chemistry, an example would be a database containing reactions, 2D structures, perhaps generic structures or libraries, and 3D models. Such a database would be used by synthetic, chemical informatic, and molecular modeling scientists. A data warehouse is often a multidimensional database, whereas a data mart is usually single-dimensional.
Multi-tier Architecture. An expansion of a client-server architecture to include a middle layer of software. The middle tier may run on a computer different from either the client or server computers. The middle tier isolates the client and server programs, so that changes in either of them do not require corresponding changes in the other. The middle tier acts to receive, authenticate, and transform data as it passes between client and server computers. To make middle tier software easy to change and maintain, it is often written in Java, a modern object-oriented computer language that is available free on most of today's computer platforms.
Object-Oriented Language. Procedure-oriented languages like Fortran, Basic, and C operate by calling functions or subroutines with "arguments" that tell the program what to do with certain variables in memory, such as calculate the molecular weight of a collection of atoms. In object-oriented (OO) languages like C++ and Java, an object, such as a molecule, has a molecular weight "method" that is specific to that object and is stored with the object in memory. In this way, a slightly different object, like a mixture, could have its own molecular weight method, perhaps a weighted average.
Object Relational Database. A relational database in which data can be collected and combined with methods to fit an object oriented model. Searching in the database is conducted in the context of the objects and their methods, rather than the raw data fields and stored procedures. There is usually considerable overhead in building, maintaining, and using an object relational database, so this type of organization has not so far been widely used in chemical or biological databases.
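The iteration described in the Morgan Algorithm entry above can be sketched in a few lines of Python. This is a simplified illustration, not a production canonicalizer: the adjacency-dictionary input and the function names `morgan_scores`, `equivalence_classes`, and `canonical_order` are hypothetical. The stopping rule used here (stop once the number of distinct scores no longer increases, and keep the last set of scores that did increase it) is an assumption drawn from the commonly described form of the algorithm, since the entry only says that the order of the scores becomes invariant. Equal scores suggest, but in this simple scheme do not guarantee, topological equivalence, and ties in the final ordering need additional rules in practice.

```python
def morgan_scores(neighbors):
    """Iterate Morgan-style connectivity scores for a molecular graph.

    neighbors -- dict mapping each atom index to the list of atom indices
                 attached to it (a hypothetical hydrogen-suppressed
                 connection table).
    """
    # Initial score: the number of attached neighbors.
    scores = {atom: len(adj) for atom, adj in neighbors.items()}
    n_classes = len(set(scores.values()))
    while True:
        # New score: sum of the previous-iteration scores of the neighbors.
        new_scores = {atom: sum(scores[nbr] for nbr in adj)
                      for atom, adj in neighbors.items()}
        new_classes = len(set(new_scores.values()))
        if new_classes <= n_classes:
            # No further discrimination gained: keep the previous scores.
            return scores
        scores, n_classes = new_scores, new_classes


def equivalence_classes(scores):
    """Group atoms that share a score into candidate equivalence classes."""
    classes = {}
    for atom, score in scores.items():
        classes.setdefault(score, []).append(atom)
    return sorted(sorted(members) for members in classes.values())


def canonical_order(scores):
    """Order atoms by descending score; ties fall back to the input index,
    which is an arbitrary simplification made only for determinism."""
    return sorted(scores, key=lambda atom: (-scores[atom], atom))


# n-Pentane drawn as a simple chain: C0-C1-C2-C3-C4.
pentane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
scores = morgan_scores(pentane)

# The same chain with the atoms numbered differently: C3-C1-C0-C4-C2.
relabeled = {3: [1], 1: [3, 0], 0: [1, 4], 4: [0, 2], 2: [4]}
relabeled_scores = morgan_scores(relabeled)
```

For the pentane chain, the two terminal carbons share one score, the two carbons next to them share another, and the central carbon stands alone, which matches the symmetry of the molecule. The relabeled drawing yields the same multiset of scores, which is the property that lets two drawings of one structure be matched quickly.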
OLAP. OnLine Analytical Processing. An activity that involves routine searching, analysis, and reporting on data stored in a large database. The database, which may have a data mart or data warehouse organization, is optimized for the kinds of searches and reports that it supports. It may not be optimal in organization for transaction processing (OLTP), which may involve registration of small amounts of data on an irregular schedule, or for data mining, which involves the retrieval and analysis of large volumes of data. In chemistry, an example of OLAP might be an inventory application in which the chemist draws in a structure or substructure, runs one or more filters on the resulting result set, and retrieves and prints a report of structures and inventory data.
OLTP. OnLine Transaction Processing. An activity that involves registration, update, or simple searching in a database of transactions. In chemistry, this might be the routine registration of a new structure and analytical data into a chemical database. Such a database is optimized for registration and may not be suitable either for more analytical types of searching and reporting (OLAP) or for data mining.
Parallel Processing. A technique whereby a given computer task is distributed among several central processing units (CPUs). The CPUs may be part of the same computer (e.g., a multiprocessing computer in which several CPUs share common memory and physical devices), or they may consist of several single-processor computers (a "cluster") that are networked to rapidly share information and disk space. In database management, it is becoming increasingly common to have parallel copies (replications) of a given database at several sites, perhaps worldwide. Special database and networking software provides rapid updates of certain information like data and periodic updates of other information like search indexes.
Pattern Recognition. The application of computers to build descriptive or predictive models (i.e., find patterns) of information from input datasets. The techniques of pattern recognition overlap those used in statistics, chemometrics, and data mining, and include data display, description, and reduction, unsupervised methods such as cluster analysis, and supervised methods such as curve fitting and classification. Engineering applications of pattern recognition include recognizing objects in pictures, and character and voice recognition. Chemical applications are found in the fields of drug discovery, analytical chemistry, and chem/bioinformatics.
Petabyte. One thousand terabytes of data (10^15 bytes). At present, the largest databases of chemical and biological information are gigabytes (10^9 bytes) in size.
Pharmacophore. The minimum amount of chemical functionality needed in a drug to elicit a given biological response. This functionality is defined in terms of atoms and functional groups and their geometric relationships to each other, including distances, angles, etc. A pharmacophore query is the representation of a pharmacophore in a format that can be used to search a chemical database for structures that can satisfy the pharmacophore and may elicit the desired response. Pharmacophore searching is usually conducted on a 3D structural database using search software that combines 2D searching with conformational analysis to find structures that can, by rotating about single bonds, adopt a conformation that satisfies the pharmacophore.
Pharmacophore Keys. Originally designed to speed pharmacophore query searching, pharmacophore keys are bitset fingerprints that indicate the presence or absence of given 3- or 4-point pharmacophores in a structure stored in a database. The 3-point pharmacophore keys represent triangular arrangements of atoms and functional groups separated by given distances or distance ranges. The 4-point pharmacophore keys represent tetrahedral arrangements of atoms and functional groups. When a structure is registered into a 3D database, a rapid conformational analysis is performed involving key single bonds in the structure. From the interatomic distance ranges between given atoms and functional groups in the structure, the various bits in the pharmacophore keys are set. These keys can be used as filters in pharmacophore searching, or increasingly, as filters before docking the structures into a known receptor (virtual screening). Pharmacophore keys have also
been used less commonly as descriptors in QSAR and data mining.
Physicochemical Properties. Originally these were just measured properties like melting point, pKa, solubility, and octanol/water LogP. Increasingly, they are obtained from programs that can calculate them from the 2D or 3D structure. In QSAR, the classical triad of steric, electronic, and lipophilic properties is still widely used, but it has been enhanced to include energy-based descriptors, measures of binding interaction, 3D-QSAR multivariate descriptors (CoMFA), and others. Quantum mechanical calculations are being used increasingly to estimate physicochemical properties. Once calculated, the properties are used to filter structures which may have undesirable ADMET criteria (e.g., the "rule of five"), or they may be used directly in models to estimate the type or level of biological activity (QSAR).
Pivoting Data. Changing data from row to column values or vice versa. This technique can be a very useful tool for summarizing data. One example of pivoting is to convert substructure keys that are stored by structure (with a bit turned on for each key the structure contains), to storage by key (with one bit turned on for each structure that has the given key). Another example is to convert assay data that is stored by structure, to data that is stored by assay. In the process of pivoting data, it is common to consolidate values, for example, converting raw assay results to ED50 values, or taking the average of some physicochemical property.
Proteomics. The conversion of protein sequence data into useful biological information. In general, the goal of proteomics is to characterize a gene product (i.e., a protein) as to its structure, subcellular location, and function. Additional information includes how a protein interacts ("networks") with other reactions and cell processes.
QSAR. Quantitative structure-activity relationships: the science of deriving quantitative linear or nonlinear mathematical relationships between physicochemical and topological/topographical properties of chemical structures and their biological activity. Originally, regression analysis was the only tool used to derive QSAR equations. More recently, tools such as partial least squares (PLS), neural networks, and a variety of data mining methods like decision trees and support vector machines have come into use.
Reacting Center. An atom in a reactant which is modified during the course of a reaction. Specifying reacting center information when searching for reactions can speed the search and reduce the number of incorrect hits.
Reaction Scheme. A series of one-step reactions that lead from a given reactant to a given product, by way of intermediate steps. A reaction search system should be able to find reactant/product combinations that span several intermediate reactions.
Refine a Search Query. The process of adding or modifying constraints of a search query to reduce or increase the number of hits. Constraints may be added, removed, relaxed, or tightened to achieve the desired search results.
Registry Number. A unique identifier assigned to a chemical structure or other piece of data when it is registered into a database. The registry number may be internal, primarily for use by the database search system, or external, to be used by chemists and to link the data to other databases and files.
Relational Database. A common database architecture in which related data items are stored in separate tables, accessed by key fields, and indexed for rapid search and retrieval. The dominant relational database systems used in pharmaceutical discovery include Oracle, Microsoft Access and SQL Server, and IBM DB2.
Result Set. A list of records resulting from a database search. The result set commonly consists of a list of record identifiers (sometimes called a cursor), which can be navigated to select records. In some systems, a result set may also contain related data for each record.
Retrosynthetic Analysis. An approach to computer-assisted synthesis design that starts with the products of a reaction or sequence of reactions and works backwards toward the reactants. An example program that implements retrosynthetic analysis is the LHASA program of E. J. Corey's group.
Rgroup. In a generic or Markush structure, generalized substituents or moieties are given
the representation R1, R2, etc. These Rgroups represent collections of specific substituents or moieties (members) that can be replaced at the given position.
Roll-up. The agglomeration, summarization, or consolidation of data into a summary presentation. Roll-up often involves summarizing data at a given level in a data hierarchy. Examples would include the average of several ED50 values, or a simple yes/no indication that toxicity data for a given structure exists somewhere in the database.
Root Structure. The invariant portion of a Markush or generic structure. The attached Rgroups contain the substituents that vary from one specific structure to the next. Sometimes termed a parent structure.
ROSDAL. Linear notation scheme devised by the Beilstein Institute. It can contain just connection table information, or it may also contain atom coordinates. Several chemical information systems can convert ROSDAL strings to other structure file formats.
Sgroup Data. In MDL structure storage, the attachment of structure-differentiating data directly to the structure. Such data may relate to the structure as a whole, or to atoms, bonds, fragments, or collections of atoms and bonds. Examples would include atomic partial charges on 3D models or percent composition attached to components of a formulation.
Similarity Search. A type of "fuzzy" structure searching in which molecules are compared with respect to the degree of overlap they share in terms of topological and/or physicochemical properties. Topological descriptors usually consist of substructure keys or fingerprints, in which case a similarity coefficient like the Tanimoto coefficient is computed. In the case of calculated properties, a simple correlation coefficient may be used. The similarity coefficient used in a similarity search can also be used in various types of
light Chemical Information Systems software and widely supported by other systems.
SQL (Structured Query Language). The standard query specification language for searching relational databases. Most database systems support the SQL standard but then add extensions particular to their implementation.
Star Schema. A standard data warehouse architecture, characterized by Ralph Kimball, in which a central "fact" table is connected to various "dimension" tables.
Structural Data Mining. Application of data mining methodology to chemical structure and reaction databases. The field is currently in its infancy, and it remains to be seen whether a "data snooping" approach to information and knowledge discovery can be as useful in drug discovery as it has proven to be in finance, marketing, and merchandising.
Substance. In some structure databases, an entry that lacks a structure completely (a "nostructure"), is only partially characterized, or is an unspecified mixture of known structures. Substances pose obvious problems in database searching.
Substructure Search (SSS) Keys. Originally developed to facilitate substructure searching, these consist of a string of bits that represent a fingerprint of the structure with respect to either (1) a set of known and defined functional groups (e.g., MDL), or (2) a set of discovered atom-bond paths that the structure contains (e.g., Daylight). In MDL systems, the substructure keys are currently either 166 or 960 bits in length. In Daylight systems, the substructure keys are of varying length and can be "folded" to achieve a higher density of bits turned on. Although SSS keys were originally developed to screen candidates for substructure searching, they are currently used more for similarity calculations.
Substructure Search. Application of "subgraph isomorphism" search to chemical struc-
cluster analysis to group similar structures. tures. This consists of finding a particular ar-
SLN-Sybyl Line Notation. Linear notation rangement of atoms and bonds as they are
used in conjunction with Tripos SPL (Sybyl embedded in a chemical structure. The ar-
Programming Language) to manipulate rangement being searched for is termed the
chemical structures. It is similar in syntax to query substructure, the structures being
SMILES notation. searched are termed the candidates, and any
SMILES. Simplified Molecular Input Line particular structure in that set is termed a
Entry System-linear notation used in Day- target structure. If the query substructure is
found in the target, the target structure is added to the result set (or hit list). In displaying the results of the search, the atoms and bonds in the substructure as mapped onto the hit may be highlighted (shown darkened or in a different color) in the structure display. In general, more than one occurrence of a substructure may be found in a given structure, and substructure mappings may be overlapping or non-overlapping.
Superstructure Search. Modification of substructure search in which the substructure query becomes the target structure, and the target structure in the database becomes the substructure search query. The search finds structures in the database that are substructures of the query. A similar extension to structure similarity searching yields superstructure similarity searching.
Supervised Data Mining. Searching large volumes of data for hidden predictive relationships. Supervised analysis requires one or more "dependent" or response variables, to be predicted from a set of "independent" or predictor variables. The techniques used include various classification methods (decision tree, support vector, Bayesian) and various estimation methods (regression, neural nets).
Tanimoto Coefficient. Standard coefficient for computing the similarity of chemical structures. If structure A has 20 bits turned on in a fingerprint, and structure B has 30 bits turned on, and the two structures have 10 bits in common, the Tanimoto coefficient is 10/(20 + 30 - 10), or 0.25. Its value can range from 0 (no similarity) to 1.0 (perfect match). Other similarity coefficients are also used, and in some systems (such as MDL) the various bits are weighted inversely according to their occurrence in the database, so that very common substructures do not contribute much to the similarity.
Terabyte. One thousand gigabytes, or 10^12 bytes of data. The largest relational databases of any kind are currently a few tens of terabytes in size. At present, the largest databases of chemical and biological information are gigabytes (10^9 bytes) in size.
Thick or Thin Client. A thick client architecture is one in which a significant amount of computing is done on the user's workstation. This is appropriate for user-interface-intensive activity like graphics and calculations on individual molecules. The alternative thin-client architecture either does not require much local computing, or it uses a built-in resource like an Internet browser as a client.
Toolkit. A collection of computer routines that each perform one or a small number of information management tasks. The routines are provided as a library and they can be incorporated into custom user-written application programs to carry out tasks that ordinary application programs may not perform. The interface between the toolkit routines and the user-written program is referred to as the Application Programming Interface, or API.
Topographical. Structure data that is based on the connection table and the 3D structure of a molecule. Examples include surface area and volume and pharmacophore distances between atoms.
Topological. Structure data that is based only on the connection table of the structure, without regard to 2D or 3D coordinates of the atoms. Examples include molecular weight and formula, counts of substructures, and indices like molecular connectivity.
Tree. A data structure that is widely used in chemical information storage. Commonly viewed with the root of the tree at the top (or to the left), successive levels of branching lead to the "nodes" of the tree and ultimately to its "leaves" (terminal nodes). Depending on how they split at a node, trees may be binary or n-ary, and depending on how their nodes are distributed, they may be balanced or unbalanced. A tree is usually traversed from the root to the leaves, and this traversal can be depth-first (following a single path until a leaf node is reached), or breadth-first (looking at all the nodes at a given level). An example of a tree data structure is the fastsearch index used in MDL substructure searching.
Unicode. A 32-bit successor to the ASCII character set. With Unicode, foreign alphabets and special characters can be encoded.
Unix. Widely used operating system for workstations and server computers. Various computer vendors supply their version of Unix, which typically descends from either the Bell Labs or Berkeley versions. A microcomputer version of Unix is Linux, which is rapidly growing in acceptance.
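As a worked illustration of the Tanimoto Coefficient and Substructure Search (SSS) Keys entries, the following minimal Python sketch represents a fingerprint as the set of its "on" bit positions. The function names, and the modulo-based folding rule, are illustrative assumptions and are not the MDL or Daylight implementations. With a bits on in structure A, b bits on in structure B, and c bits in common, the coefficient is c/(a + b - c); for 20, 30, and 10 bits this gives 10/40 = 0.25.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    common = len(bits_a & bits_b)
    union = len(bits_a) + len(bits_b) - common
    # Two empty fingerprints have no defined similarity; 0.0 is an arbitrary convention here.
    return common / union if union else 0.0


def fold(bits, length):
    """Fold a fingerprint to `length` bits by mapping each position modulo `length` (assumed rule)."""
    return {position % length for position in bits}


def density(bits, length):
    """Fraction of the `length` available bits that are turned on."""
    return len(bits) / length


# Glossary example: A has 20 bits on, B has 30 bits on, 10 bits in common.
fingerprint_a = set(range(0, 20))    # positions 0..19
fingerprint_b = set(range(10, 40))   # positions 10..39
similarity = tanimoto(fingerprint_a, fingerprint_b)
print(similarity)  # 0.25

# Folding a sparse 64-bit fingerprint to 32 bits raises the bit density.
sparse = {1, 5, 9, 33, 40, 63}
folded = fold(sparse, 32)
print(density(sparse, 64), density(folded, 32))
```

Folding can make two different bits collide on the same position, so a folded fingerprint trades some discrimination for compactness.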
Unsupervised Data Mining. Searching large volumes of data for hidden descriptive relationships. Unlike supervised data mining, no response variables are used. The techniques used include various display and data reduction methods, as well as cluster analysis and association analysis.
VARCHAR, VARCHAR2. SQL data types used to store character data in a relational database system. Storage is limited to about 4000 characters, so larger pieces of data must be stored as CLOB data.
Virtual Screening. Using computer modeling to screen leads for activity. The screening may be through some QSAR or data mining model, which typically requires only 2D structures and data, or it may involve 3D molecular modeling and docking with a known or putative receptor. The speed and increasing accuracy of virtual screening make it a vital step in the drug discovery process.
XML. Extensible Markup Language, a widely used standard for producing self-documenting text. Documents that subscribe to the XML standard can be freely exchanged over networks and between applications, using standard parsing programs to interpret the document. CML is an extension of XML that can be used to transport structures, reactions, and chemical and biological data. A query language, XMLQuery, is being developed to allow searching of XML documents in a manner similar to the use of SQL to search relational databases.
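The SQL, Star Schema, and Roll-up entries of this glossary can be tied together in one small sketch. The example below uses Python's built-in sqlite3 module with an in-memory database. The table names, column names, and ED50 values are invented for illustration and do not come from any particular chemical information system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compound (            -- dimension table (hypothetical)
    compound_id INTEGER PRIMARY KEY,
    name        VARCHAR(40)
);
CREATE TABLE assay (               -- dimension table (hypothetical)
    assay_id INTEGER PRIMARY KEY,
    name     VARCHAR(40)
);
CREATE TABLE result (              -- central fact table
    compound_id INTEGER REFERENCES compound(compound_id),
    assay_id    INTEGER REFERENCES assay(assay_id),
    ed50        REAL
);
INSERT INTO compound VALUES (1, 'cmpd-A'), (2, 'cmpd-B');
INSERT INTO assay VALUES (1, 'assay-X');
INSERT INTO result VALUES (1, 1, 2.0), (1, 1, 4.0), (2, 1, 10.0);
""")

# Roll-up: summarize the individual ED50 measurements at the compound level.
rollup = conn.execute("""
    SELECT c.name, AVG(r.ed50), COUNT(*)
    FROM result AS r
    JOIN compound AS c ON c.compound_id = r.compound_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rollup)  # [('cmpd-A', 3.0, 2), ('cmpd-B', 10.0, 1)]
conn.close()
```

The fact table holds one row per measurement and carries only keys and values; descriptive attributes live in the dimension tables, so the same roll-up could be grouped by assay instead of by compound by joining the other dimension.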
8 ACKNOWLEDGMENTS

Grateful acknowledgement is made to Guenter Grethe, Stephen Heller, Lingran Chen, and Tim Hoctor of MDL Information Systems, Inc. for useful discussions during the preparation of this chapter.
CHAPTER TEN
Structure-Based Drug Design

LARRY W. HARDY
Aurigene Discovery Technologies
Lexington, Massachusetts
DONALD J. ABRAHAM
Richmond, Virginia
MARTIN K. SAFO
Richmond, Virginia
Contents
1 Introduction
2 Structure-Based Drug Design
2.1 Theory and Methods
2.2 Hemoglobin, One of the First Drug-Design Targets
2.2.1 History
2.2.2 Sickle-Cell Anemia
2.2.3 Allosteric Effectors
2.2.4 Crosslinking Agents
2.3 Antifolate Targets
2.3.1 Dihydrofolate Reductase
2.3.2 Thymidylate Synthase
2.3.2.1 Structure-Guided Optimization: AG85 and AG337
2.3.2.2 De Novo Lead Generation: AG331
2.3.3 Glycinamide Ribonucleotide Formyltransferase
2.4 Proteases
2.4.1 Angiotensin-Converting Enzyme and the Discovery of Captopril
2.4.2 HIV Protease
2.4.3 Thrombin
2.4.4 Caspase-1
2.4.5 Matrix Metalloproteases
2.5 Oxidoreductases
2.5.1 Inosine Monophosphate Dehydrogenase
2.5.2 Aldose Reductase
2.6 Hydrolases
2.6.1 Acetylcholinesterase
2.6.2 Neuraminidase
2.6.3 Phospholipase A2 (Nonpancreatic, Secretory)
2.7 Picornavirus Uncoating
2.8 Phosphoryl Transferases
2.8.1 Mitogen-Activated Protein Kinase p38α
2.8.2 Purine Nucleoside Phosphorylase
2.9 Conclusions and Lessons Learned

Burger's Medicinal Chemistry and Drug Discovery, Sixth Edition, Volume 1: Drug Discovery. Edited by Donald J. Abraham. ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
1 INTRODUCTION

Structure-based drug design by use of structural biology remains one of the most logical and aesthetically pleasing approaches in drug discovery paradigms. The first paper on the potential use of crystallography in medicinal chemistry was written in 1974 (1) and was presented at Professor Alfred Burger's retirement symposium in 1972. The excerpted last paragraph in the paper, reproduced below, foresaw the integration of X-ray crystallography into the field of medicinal chemistry.

It is reasonable to assume then that the future of large molecule crystallography in medical chemistry may well be of monumental proportions. The reactivity of the receptor certainly lies in the nature of the environment and position of various amino acid residues. When the structured knowledge of the binding capabilities of the active site residues to specify groups on the agonist or antagonists becomes known, it should lead to proposals for syntheses of very specific agents with a high probability of biological action. Combined with what is known about transport of drugs through a Hansch-type analysis, etc., it is feasible that the drugs of the future will be tailor-made in this fashion. Certainly, and unfortunately, however, this day is not as close as one would like. The X-ray technique for large molecules, crystallization techniques, isolation techniques of biological systems, mechanism studies of active sites and synthetic talents have not been extensively intertwined because of the existing barriers (1).

Since that time there have been numerous successes in advancing new agents into clinical trials by combining crystallography with associated fields in drug discovery. Currently, more structures are solved every year than were in the entire Protein Data Bank in 1972. Although almost every major pharmaceutical company has an X-ray diffraction group, Agouron (now Pfizer) was the first biotechnology startup company to make drug discovery based on X-ray crystallography a central and primary theme (2). Other startup companies (such as BioCryst and Vertex) were soon founded to apply similar approaches. More recent companies, such as Structural Genomix (3) and Astex (4), and the High Throughput Crystallography Consortium, organized by Accelrys (5), have emerged to carry on structure-based drug discovery in a high throughput environment. Third-generation synchrotron sources, such as the Advanced Photon Source (APS) at Argonne National Laboratory outside Chicago, and new detectors, have enormously increased the speed of data collection. It is now possible to collect high resolution data from protein crystals, solve, and refine the structure in days to a few months. This information is covered in an adjacent chapter. Simultaneous advances in computing have added to the speed of obtaining three-dimensional structural information on interesting drug design targets. These developments, coupled with the sequencing of the human genome and the advent of bioinformatics, provide workers in structure-based drug design with a plethora of opportunities for success.

The utility of any drug discovery tool is measured, in the final analysis, by the output of the tool's use. New tools are burdened with unrealistically high expectations. As their application begins, the impact is sometimes more limited than was originally envisaged. Structure-based design methods have had great utility for the design of enzyme inhibitors, tight-binding receptor ligands, and novel proteins. The utility of these methods for the design of drugs is somewhat more limited, simply because there are so many factors that must be balanced in the successful design of a drug. Nonetheless, structure-based drug design (SBDD), distinct from the (far easier) structure-based inhibitor design, is now a reality and has had significant impact. Aspects of the methods and utility of SBDD have been described in several excellent review articles and monographs (6-12). This chapter focuses on the utility of SBDD in the cases of drugs
that have been launched as products, or that the biological action with precise structural
have at least entered human clinical trials. In information. It makes good sense at the early
some cases, SBDD has been a remarkable sue- stages of design to use lead molecule struc-
cess. In others, it has failed in the sense that, tural scaffolds that retain low toxicity profiles,
despite its use, the candidate produced did not given that the latter most often derails suc-
gain approval to become a marketed drug. In cessful drug discovery. The most active deriv-
the latter cases, this was usually not truly a ative(~)from this cyclic process can be for-
failure of SBDD, but rather attributed to the warded for in vivo evaluation in animals.
complex criteria that drugs must meet, and to
the complex regulatory hurdles that candi-
dates and companies face. 2.2 Hemoglobin, One of the First
In addition to providing a measure of the Drug-Design Targets
impact of SBDD on the creation of actual
drugs, these examples will also provide lessons 2.2.1 History. Perutz and colleagues de-
about how to apply SBDD in drug discovery. termined the first three-dimensional structures
The chapter is not completely encyclopedic, of proteins. Through use of X-ray crystallogra-
and some significant instances of SBDD will phy Kendrew determined the structure of myo-
be missed by the informed reader. However, globin (13), whereas Perutz determined the
the discovery programs with drugs and drug structure of hemoglobin (Hb)(14-16). At the
candidates that are discussed will provide suf- present time, new protein and nucleic acid
ficient diversity that general trends can structures and complexes are published
emerge. In a few cases, relevant results for weekly. However, for a long period after the
preclinical compounds that seem likely to en- first protein structures were solved, progress
ter human trials are described. A growing was slower. Hb was of interest for drug discov-
number of the drugs to which structural de- ery purposes because of the early identifica-
sign methods are applied are themselves pro- tion of the mutant 6 Glu -,Val, which causes
teins (e.g., cytokines, immunomodulators, sickle-cell anemia. The crystal structure of
monoclonal antibodies). However, this chap- sickle Hb (Hbs) was published by Wishner et
ter is restricted to small organic molecules al. (17) and was later solved at a higher reso-
that are designed by use of the three-dimen- lution by Harrington et al. (18).
sional structure of a target protein.
2.2.2 Sickle-Cell Anemia. In 1975, through
use of the three-dimensional Hb coordinates,
2 STRUCTURE-BASED DRUG DESIGN
two groups initiated SBDD studies to discover
an agent to treat sickle-cell anemia: Goodford
2.1 Theory and Methods
et al. in England and Abraham et al. in the
The concept of structure-based drug discovery United States. Goodford's group was the first
combines information from several fields: X- to develop an antisickling agent (BW12C),
ray crystallography and/or NMR, molecular based on structure-based drug design, which
modeling, synthetic organic chemistry, quali- reached clinical trials (19, 20). However,
tative structure-activity relationships (QSAR), Wireko and coworkers were unable to confirm
and biological evaluation. Figure 10.1 repre- the BW12C binding site proposed by Goodford
sents a general road map where a cyclic pro- (21). The second antisickling agent proposed
cess refines each stage of discovery. Initial by Abraham et al. to advance to clinical trials
binding site information is gained from the was the food additive vanillin (compound la)
three-dimensional solution of a complex of the (22). The crystallographic binding site of
target with a lead compound(s). Molecular BW12C (lb)was found to be at the N-terminal
modeling is usually next applied with the in- amino groups of the a-chains (21), whereas
tent of designing a more specific ligandk) with that of vanillin shows binding close to aHisl03
higher affinity. Synthesis and subsequent in and also at a minor site between PHis117 and
vitro biological evaluation of the new agents PHis117 (22). A recently redetermined bind-
produces more candidates for crystallographic ing site of vanillin at a higher resolution shows
or NMR analysis, with the hope of correlating weak binding to the N-terminal amino group
of the α-chain (23). A derivative of vanillin has been patented and is a candidate for clinical trials.

Figure 10.1. Schematic of the structure-based drug discovery/design process. The figure maps out the iterative steps that make use of X-ray crystallography, molecular modeling, organic synthesis, and biological testing to identify and optimize ligand-protein interactions.

[Structures: (1a) vanillin; (1b) BW12C]

Two marketed medicines, ethacrynic acid (2), a diuretic agent, and clofibric acid (3), an antilipidemic agent, were reported to have strong antigelling activity (24, 25), and through X-ray analyses of cocrystals, the binding sites of these agents to Hb were elucidated (26). Unfortunately, it was found that high
concentrations of ethacrynic acid were needed to interact with Hb in deformed red cells (27). Clofibric acid, when administered in a 2 g/day dose (as the ethyl ester clofibrate), appeared to be an ideal potential treatment for sickle-cell anemia, but was subsequently found to be highly bound to serum proteins and not transported in quantities sufficient to interact with sickle Hb. Furthermore, structure-based derivatives were not found to be effective (28, 29).

[Structures: (2) ethacrynic acid; (3) clofibric acid]

The major problem with designing a small molecule to treat sickle-cell anemia is not so much an issue of specificity, but arises from the treatment of a chronic disease. The potential cumulative toxicity from the amount of drug needed to interact with approximately two pounds of hemoglobin S over a homozygous patient's lifetime is the major concern (22) (for a review, see Vol. 3, Chapter 10, Sickle Cell Anemia, by Alan Schecter et al.).

2.2.3 Allosteric Effectors. 2,3-Diphosphoglycerate (2,3-DPG, compound 4), found in most mammalian red cells, is the naturally occurring allosteric effector for Hb. Its physiological role is to right shift the Hb oxygen-binding curve to release more oxygen. The binding site of 2,3-DPG, determined by Arnone (30), lies on the dyad axis at the mouth of the β-cleft (Fig. 10.2), interacting with the N-terminal βVal1, βLys82, and βHis143 of deoxy Hb. A more recent study at a higher resolution, by Richard et al. (31), found DPG to interact with the residues βHis2 and βLys82. Goodford and colleagues were the first to design agents that would bind to the 2,3-DPG site (32-34). An effective allosteric effector that can enter red cells might be used to treat hypoxic diseases such as angina and stroke, to enhance radiation treatment of hypoxic tumors, or to extend the shelf life of stored blood.

Figure 10.2. View of (4) (2,3-DPG) binding site at the mouth of the β-cleft of deoxy hemoglobin. See color insert.

Many antigelling agents left shift the oxygen-binding curve, producing higher concentrations of oxy-HbS. Given to patients with sickle-cell anemia, this should result in less polymerization, and therefore less red blood cell sickling. It was a surprise therefore when clofibric acid, which blocks sickle-cell Hb polymerization, was found to shift the Hb oxygen-binding curve to the right, in a manner similar to that of 2,3-DPG (25). The clofibric acid binding site was found to be far removed from the 2,3-DPG site (25, 35). The determination of the clofibric acid binding site on Hb was the first report of a tense state (deoxy state) allosteric binding site different from that of 2,3-DPG (compound 4). Perutz and Poyart tested another antilipidemic agent, bezafibrate (compound 5), and found that it was an even more potent right-shifting agent than clofibrate (36). Perutz et al. (26) and Abraham (35) determined the binding site of bezafibrate and found it to link a high occupancy clofibrate site with a low occupancy site. Lalezari and Lalezari synthesized urea derivatives of bezafibrate (37), and with Perutz et al. determined the binding site of the most potent derivatives (38). Although these compounds were extremely potent, they were hampered by serum albumin binding (39, 40).

[Structure: (5) bezafibrate]

Abraham and coworkers synthesized a series of bezafibrate analogs (39-42). One of these agents, efaproxiral (RSR 13, compound 6a), is currently in Phase III clinical trials for radiation treatment of metastatic brain tumors (see Vol. 4, Chapter 4, Radiosensitizers and Radioprotective Agents, by Edward Bump et al.). The binding constants and binding sites of a large number of these bezafibrate analogs were measured and agreed with the number of crystallographic binding sites found (42). The degree of right shift in the oxygen-binding curve produced by these compounds was not solely related to their binding constant, providing a structural basis for E. J. Ariens' theory of intrinsic activity (42).

By use of X-ray crystallographic analyses, the key elements linking allosteric potency with structure were uncovered. In addition, the computational program HINT, which quantitates atom-atom interactions, was used to determine the strongest contacts between various bezafibrate analogs and Hb residues. These analyses revealed that the amide linkage between the two aromatic rings of the compounds must be orientated so that the carbonyl oxygen forms a hydrogen bond with the side-chain amine of αLys99 (41, 43). Three other important interactions were found. The first are the water-mediated hydrogen bonds between the effector molecule and the protein, the most important occurring between the effector's terminal carboxylate and the side-chain guanidinium moiety of residue αArg141. Second, a hydrophobic interaction involves a methyl or halogen substituent on the effector's terminal aromatic ring and a hydrophobic groove created by Hb residues αPhe36, αLys99, αLeu100, αHis103, and βAsn108. Third, a hydrogen bond is formed between the side-chain amide nitrogen of Asn108 and the electron cloud of the effector's terminal aromatic ring (40, 41, 43). Abraham first observed this last interaction while elucidating the Hb binding site of bezafibrate (35). Burley and Petsko had previously pointed out this type of hydrogen bond in a number of proteins, indicating that this contact is involved in a number of other receptor interactions (44, 45). Perutz and Levitt estimated this bond to be about 3 kcal/mol (46). Figure 10.3 shows the overlap of four allosteric effectors (6a, 6b, 7a, and 7b) that bind at the same site in deoxy Hb but differ in their allosteric potency.
Figure 10.3. Stereoview of allosteric binding site in deoxy hemoglobin. A similar compound environment is observed at the symmetry-related site, not shown here. (a) Overlap of four right-shifting allosteric effectors of hemoglobin: (6a) (RSR13, yellow), (6b) (RSR56, black), (7a) (MM30, red), and (7b) (MM25, cyan). The four effectors bind at the same site in deoxy hemoglobin. The stronger acting RSR compounds differ from the much weaker MM compounds by reversal of the amide bond located between the two phenyl rings. As a result, in both RSR13 and RSR56, the carbonyl oxygen faces and makes a key hydrogen bonding interaction with the amine of αLys99. In contrast, the carbonyl oxygen of the MM compounds is oriented away from the αLys99 amine. The αLys99 interaction with the RSR compounds appears to be critical in the allosteric differences. (b) Detailed interactions between RSR13 (6a) and hemoglobin, showing key hydrogen bonding interactions that help constrain the T-state and explain the allosteric nature of this compound and those of other related compounds. See color insert.
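The terms "right shift" and "left shift" used throughout this section can be made concrete with the Hill equation, a standard empirical description of the Hb oxygen-binding curve. The sketch below is only an illustration and is not taken from the studies cited: the Hill coefficient, the "normal" P50, the shifted P50, and the arterial and venous oxygen pressures are all assumed textbook-style values, and the function names are hypothetical.

```python
"""Illustrative Hill-equation sketch of a right-shifted Hb oxygen-binding curve.

All numerical values below are assumptions chosen for illustration; none of
them come from the chapter or from the allosteric effector studies it cites.
"""

HILL_N = 2.7          # assumed Hill coefficient for Hb
NORMAL_P50 = 26.6     # assumed P50 (mmHg) for unmodified Hb
SHIFTED_P50 = 36.0    # hypothetical P50 (mmHg) after a right shift


def hill_saturation(po2_mmhg: float, p50_mmhg: float, hill_n: float = HILL_N) -> float:
    """Fractional O2 saturation, Y = pO2^n / (P50^n + pO2^n)."""
    return po2_mmhg ** hill_n / (p50_mmhg ** hill_n + po2_mmhg ** hill_n)


def oxygen_released(p50_mmhg: float,
                    arterial_po2: float = 100.0,
                    venous_po2: float = 40.0) -> float:
    """Fraction of Hb sites unloaded between assumed arterial and venous pO2."""
    return (hill_saturation(arterial_po2, p50_mmhg)
            - hill_saturation(venous_po2, p50_mmhg))


if __name__ == "__main__":
    for label, p50 in (("normal", NORMAL_P50), ("right-shifted", SHIFTED_P50)):
        print(f"{label:>13}: P50 = {p50:5.1f} mmHg, "
              f"fraction released = {oxygen_released(p50):.3f}")
```

Under these assumptions a higher P50 (a right shift) lowers saturation at tissue oxygen pressure and so increases the fraction of oxygen unloaded, which is the behavior sought from effectors such as RSR13; a left shift (lower P50) does the opposite and favors oxy-HbS.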
Over the course of these studies, an interesting anomaly was solved. Allosteric effectors (such as 8a and 8b) can bind to a similar site and yet effect opposite shifts in the oxygen-binding curve. Agents such as 5-FSA, which bind to the N-terminal Val and provide groups for hydrogen bonding with the opposite dimer (across the twofold axis), right shift the oxygen-binding curve. In contrast, agents that disrupt the water-mediated linkage between the N-terminal αVal and the C-terminal αArg141 left shift the curve (47) (Fig. 10.4). Structure-based stereospecific allosteric effectors for Hb have also been developed and possess activities and profiles appropriate for clinical efficacy (48, 49).

[Structures: (8a) DMHB; (8b) 5-FSA]

Figure 10.4. Stereoview of superimposed binding sites for (8b) (5-FSA, yellow) and (8a) (DMHB, magenta) in deoxy hemoglobin. A similar compound environment is observed at the symmetry-related site and therefore not shown here. Both compounds form a Schiff base adduct with the α1Val1 N-terminal nitrogen. Whereas the m-carboxylate of 5-FSA forms a salt bridge with the α2Arg141 (opposite subunit), this intersubunit bond is missing in DMHB. The added constraint to the T-state by 5-FSA that ties two subunits together shifts the allosteric equilibrium to the right. On the other hand, the binding of DMHB does not add to the T-state constraint. Instead, it disrupts any T-state salt- or water-bridge interactions between the opposite α-subunits. The result is a left shift of the oxygen equilibrium curve by DMHB. See color insert.

2.2.4 Crosslinking Agents. The first crosslinking agent that possessed potential as a Hb-based blood substitute was described by Walder et al. (50). Bis(4-formylbenzyl)ethane and bisulfite adducts of similar symmetrical aromatic dialdehydes, previously studied by Goodford and colleagues, provided the starting points that led to these compounds. Chatterjee et al. identified the binding site to deoxy-Hb, and found that the two Lys99 side chains were crosslinked (51). One of the derivatives was proposed as a blood substitute (52), and has been explored commercially (see Vol. 3, Chapter 8, Oxygen Delivery and Blood Substitutes and Blood Products, by Andrea Mozzarelli et al.). Another crosslinked Hb engineered by Nagai and colleagues, at the MRC-LMB in Cambridge, was developed into a blood substitute that was clinically investigated at Somatogen, now Baxter (53). Boyiri et al. synthesized a number of crosslinking agents (molecular ratchets, such as 9) whose potency was directly related to the length of the crosslink: the shorter the crosslink (three atoms), the stronger the shift of the oxygen-binding curve to the right (54) (Fig. 10.5).

[Structure: (9)]

Perutz's hypothesis (55) and the MWC model (56) for allostery, that the more tension is added to the tense (deoxy) state of Hb, the greater the shift to the right of the oxygen-binding curve, are generally consistent with the behavior of the allosteric effectors and crosslinking agents.

Figure 10.5. Stereoview of the binding site for (9) (n = 3, TB36, yellow) in deoxy Hb. A similar compound environment is observed at the symmetry-related site, not shown here. One aldehyde is covalently attached to the N-terminal α1Val1, whereas the second aldehyde is bound to the opposite subunit, α2Lys99 ammonium ion. The carboxylate on the first aromatic ring forms a bidentate hydrogen bond and salt bridge with the guanidinium ion of α2Arg141 of the opposite subunit. The effector thus ties two subunits together and adds additional constraints to the T-state, resulting in a shift in the Hb allosteric equilibrium to the right. The magnitude of constraint placed on the T-state by the crosslinked αLys99 varies with the flexibility of the linker. Shorter bridging chains form tighter crosslinks and yield larger shifts in the allosteric equilibrium. See color insert.

2.3 Antifolate Targets

2.3.1 Dihydrofolate Reductase. The reduced form of folate (tetrahydrofolate) acts as a one-carbon donor in a wide variety of biosynthetic transformations. This includes essential steps in the synthesis of purine nucleotides and of thymidylate, essential precursors to DNA and RNA. For this reason, folate-dependent enzymes have been useful targets for the development of anticancer and anti-inflammatory drugs (e.g., methotrexate) and anti-infectives (trimethoprim, pyrimethamine). During the reaction catalyzed by thymidylate synthase (TS), tetrahydrofolate also acts as a reductant and is converted stoichiometrically to dihydrofolate. The regeneration of tetrahydrofolate, required for the continuous functioning of this cofactor, is catalyzed by dihydrofolate reductase (DHFR).

Scheme 10.1. [Scheme labels: tetrahydrofolate, C1-tetrahydrofolate, dihydrofolate, purines, thymidylate; enzymes DHFR and TS]

The first crystal structure of a drug bound to its molecular target was provided by the pioneering X-ray diffraction study of the complex between DHFR and methotrexate (57), albeit in this case the target was a bacterial surrogate for the actual target (the human enzyme). Once X-ray structures of DHFR from eukaryotic sources were also solved, comparisons of the bacterial and eukaryotic DHFR structures revealed the structural basis for the selectivity of the antibacterial drug trimethoprim for the bacterial enzyme. This understanding allowed Goodford and colleagues
to rationally design trimethoprim analogs with altered potencies (58). Retrospective studies such as those done by David Matthews and others on DHFR (see, for example, Ref. 59) set the stage for the iterative process of structure-based inhibitor design as it was later developed at Agouron Pharmaceuticals, targeted against another folate-dependent enzyme, TS (60, 61).

2.3.2 Thymidylate Synthase. There are two main modes in which structure-based methods for inhibitor design have been employed. The first mode is structure-guided optimization of the design of a previously known chemical scaffold. The scaffold could be a known drug or inhibitor, substrate analog, or a hit from screening of a random library. The property that is modified during the optimization may be, for example, potency, solubility, or target selectivity, or the more challenging aim may be to optimize several properties simultaneously. A second and potentially more powerful mode is the de novo design of inhibitory ligands, sometimes referred to as lead generation. This mode relies strictly on the structure of the target enzyme or receptor as a template. A substrate or an inhibitor may be bound to the crystalline target, and deleted to provide the template. This is advantageous when, as in the case of TS, a substantial conformational change occurs when ligands bind. After a de novo design process has provided a new inhibitor that is structurally unique, the properties of the new scaffold can be optimized by continued structural guidance. Both modes of SBDD have been used to generate TS inhibitors that have entered clinical trials.

When the design of inhibitors of human TS at Agouron Pharmaceuticals began, the amounts of the human enzyme required for crystallographic study were unavailable. Because the active site of the enzyme is so highly conserved, it was assumed that an acceptable surrogate for human TS would be the crystal structure of a bacterial TS (60, 62). Figure 10.6 shows the conformation of the quinazoline folate analog 10 (N10-propynyl-5,8-dideazafolate), bound within the active site of the Escherichia coli enzyme with the nucleotide substrate, 2'-deoxyuridine-5'-monophosphate (63, 64). This folate analog, designed by classical medicinal chemistry as an analog of the TS substrate, 5,10-methylenetetrahydrofolate (11), is a potent TS inhibitor. Nevertheless, (10) failed as an anticancer drug because of its insolubility and resulting nephrotoxicity (65).

[Structure: (10) N10-propynyl-5,8-dideazafolate (also known as PDDF or CB3717)]

2.3.2.1 Structure-Guided Optimization: AG85 and AG337. In the crystalline complex with E. coli TS, the quinazoline ring of compound (10) binds on top of the pyrimidine of the nucleotide, in a protein crevice surrounded by hydrophobic residues (Fig. 10.6). The bound molecule bends at right angles between the quinazoline and 4-aminobenzoyl rings (at N10), with the D-glutamate portion extending out to the surface of the enzyme. Hydrogen bonds are made with several enzyme side-chains, the terminal carboxylate, and several tightly bound waters. This compound, like folate and most folate analogs, gains entry into cells through a transport system that recognizes its D-glutamate moiety, and intracellular concentrations are elevated because of trapping of the compound as highly charged forms after addition of several additional glutamates by a cellular enzyme.

TS inhibitors were designed by Agouron scientists with the aim of providing a drug that could enter cells passively and thus avoid the need for transport or polyglutamylation. The first were designed by structure-guided modification of known antifolates, and others were designed de novo. Starting with (12), the glutamate moiety was deleted from the structure. [Compound (12), the 2-desamino-2-methyl analog of (10), had been found to be much more water soluble than (10). This eventually led (65) to AstraZeneca's Tomudex, which is now approved for treatment of colorectal cancer in European markets.] Removal of the glutamate reduced the potency by 2 to 3 orders of magnitude (Table 10.1, 12 versus 13). The crystal structure solved by use of (10) indicated potential interactions that were exploited by substituents such as the m-CF3 in compound (14). The phenyl moiety in (15) was added to interact with Phe176 and Ile79 (Fig. 10.6). Combining substituents does not necessarily produce the expected sum of binding free energy (compare 16 with 14 and 15). Structures of the complexes with several of these compounds revealed that ideal placement of one group does not always accommodate the best interaction for another. (This is a general problem for rigid scaffolds.) Compounds (15-17) had significant activity in in vitro cell-based assays, which could be reversed by exogenous thymidine. Compound (17) (AG85) was tested in human clinical trials for treatment of psoriasis (9).

Figure 10.6. Binding site for (10) (N10-propynyl-5,8-dideazafolate), within the active site of thymidylate synthase from Escherichia coli. The surface of the inhibitor is shown in the left view. The red spheres in the left view are tightly bound water molecules. See color insert.

Table 10.1 SAR for 2-Methyl-4-oxo-quinazoline Inhibitors of TS(a)

Compound  R                                       Kis, µM (E. coli TS)  Kis, µM (human TS)
(12)      para-CO-glutamate                       0.005                 0.009
(13)      H                                       4                     2.2
(14)      meta-trifluoromethyl                    0.45                  0.4
(15)      para-SO2-phenyl                         0.025                 0.013
(16)      meta-trifluoromethyl, para-SO2-phenyl   0.037                 0.05
(17)      para-SO2-(N-indolyl)                    0.15                  0.07

(a) From ref. 60.

The structure shown in Fig. 10.6 also suggested another approach to alter the structure of (12) to generate a lipophilic inhibitor of TS. The hydrophobic cavity filled by the aromatic ring of the para-aminobenzoyl group could be filled instead by a substituent attached to position 5 of the quinazoline nucleus. Four different 5-substituted 2-methyl-4-oxoquinazolines were made to test this idea, and one of these (18) was a 1 µM inhibitor of human TS (66).

The X-ray structure of the bacterial enzyme with (18) confirmed the hypothetical binding mode. Two dozen 5-substituted quinazolines were made to explore the SAR for this scaffold. However, the eventual clinical candidate (19) was only two steps away from (18). The methyl group at position 6 was incorporated for favorable interaction with Trp80. This also favorably restricted the torsional flexibility for the 5-substituent, and increased the inhibitory potency against human TS by 10-fold. The 2-methyl was replaced by an amino group, to create a hydrogen bond to a backbone carbonyl in the protein, and this increased potency another sixfold. Compound (19) (AG337, also known as nolatrexed, and as the hydrochloride, Thymitaq) advanced into human testing and had progressed into later-stage clinical trials as an antitumor agent by 1996 (67).

2.3.2.2 De Novo Lead Generation: AG331. The de novo design effort was initiated through the use of a computational method, Goodford's GRID algorithm (68, 69), to locate a site favorable to the binding of an aromatic system within the TS active site (70). Using computer graphics, naphthalene was visualized and manipulated within this favorable site (Fig. 10.7). This facilitated alterations of the naphthalene scaffold to a benz[cd]indole to provide hydrogen-bonding groups to interact with the enzyme and a tightly bound water. Elaboration from the opposite edge of the naphthalene core to extend into the top of the active site cavity, toward bulk solvent, resulted in (20). The use of an amine for the groups attached to position 6 of the benz[cd]indole improved the synthetic ease for variation of these groups. Compound (20) had a Kis value of 3 µM for inhibition of human TS and was about 10-fold less potent against the bacterial enzyme.

The X-ray structure of (20) bound to E. coli TS revealed that the compound actually binds more deeply into the active-site crevice than had been anticipated. Instead of interacting favorably with the enzyme-bound water indicated in Fig. 10.7, the oxygen at position 2 of the benz[cd]indole displaces it. This forced the Ala263 carbonyl oxygen to move by about 1 Å. Replacement of the oxygen at position 2 with nitrogen provided a significant increase in inhibitory potency. Structural studies revealed that this also resulted in recovery of the displaced water, and restoration of the original position of the Ala263 carbonyl oxygen. The substituents at position 5, on the tertiary amine nitrogen, and on the sulfonyl group were also varied during the iterative optimization process. The process yielded (21) (AG331), which has a Kis value of 12 nM for inhibition of human TS. Compound (21) entered clinical trials as an antitumor agent (71).

Figure 10.7. Conceptual design of compound (20), by use of the active site of E. coli TS as a template. W represents a tightly bound water molecule. [Adapted from Babine and Bender (9).]

2.3.3 Glycinamide Ribonucleotide Formyltransferase. Glycinamide ribonucleotide formyltransferase (GARFT) catalyzes the N-formylation of glycinamide ribonucleotide, through use of N-10-formyltetrahydrofolate as the one-carbon donor. Because this is an essential step in the synthesis of purine nucleotides, GARFT is a target for blocking the proliferation of malignant cells. Several potent GARFT inhibitors, such as pemetrexed (22, ALIMTA, LY231514) and lometrexol (23, 5,10-dideazatetrahydrofolate, LY-264618), have been shown to be effective antitumor agents in clinical trials (71, 72).

[Structures: (22) pemetrexed; (23) lometrexol]

These were designed through traditional medicinal chemistry approaches, in which analogs of folate were synthesized and then tested as inhibitors of tumor cell growth or of the activity of various folate-dependent enzymes (73-75). A recent paper reported the formation in situ of a potent bisubstrate analog inhibitor of GARFT, from glycinamide ribonucleotide and a folate analog, apparently catalyzed by the enzyme itself (76). The substrate analog was designed based on consideration of enzyme structure and the GARFT mechanism. This emphasizes the potential to exploit the interplay between binding and catalytic events in the design of new inhibitors.

The development of GARFT inhibitors at Agouron began with consideration of the structure of the complex between the E. coli enzyme and 5-deazatetrahydrofolate (77). An active and soluble fragment of a multifunctional human protein that contained the GARFT activity was provided by recombinant approaches (78), and its structure was also solved (79) in complex with novel inhibitors. Comparison of the two structures subsequently validated the use of the bacterial enzyme as a model for the human GARFT. The design of novel inhibitors also relied on previous studies of the structure-activity relationships (SAR) for substituents around the core
of (23), including some GARFT inhibitors in which the ring containing N5 was opened (80).

Inspection of the structure of the bacterial GARFT-inhibitor complex revealed several important features. The pyrimidine portion of the pteridine was fully buried within the GARFT active site, forming many hydrogen bonds with conserved enzymic groups. The D-glutamate moiety was largely solvent exposed, with no immediately obvious potential for building additional interactions. Retention of the D-glutamate unmodified was also desirable for pharmacodynamic reasons. A significant opportunity was presented by the fact that the active site might accommodate a bulkier hydrophobic atom than the methylene group in 5-deazatetrahydrofolate that replaces the naturally occurring N5 in tetrahydrofolate. To test this idea, a series of 5-thiapyrimidinones were synthesized, including compound (24). These analogs were more readily prepared than the corresponding cyclic derivatives. This compound had a potency of 30-40 nM in both a cell-based antiproliferation assay and a biochemical assay for human GARFT inhibition. A crystal structure of human GARFT, complexed with (24) and glycinamide ribonucleotide, confirmed the structural homology between E. coli and human enzymes.

Compounds with one fewer methylene in the linker connecting the thiophenyl moiety to the 5-thia position were much less active. Several other analogs, such as (26), were made in attempts to fill the active site more fully, and to restrict the conformational flexibility of the linker. Molecular mechanics calculations failed to correctly predict the conformation of the 5-thiamethylene group of (25) bound to GARFT because of unforeseen conformational flexibility of the enzyme revealed by an X-ray structure of this complex. This again emphasizes the importance of iterative experimental confirmation of molecular designs. Several functional criteria in addition to GARFT inhibition and cell-based assays were evaluated during the several cycles of optimization. These included the ability of exogenous purine to rescue cells (which indicates selective GARFT inhibition), and the ability of the inhibitors to function as substrates for enzymes involved in the transport and cellular accumulation of antifolate drugs. Balancing these criteria has resulted in the choice of compounds (26) and (27) (AG2034 and AG2037, respectively) for clinical development at Pfizer. (In 1999, Agouron Pharmaceuticals was acquired by Warner-Lambert, which was subsequently acquired by Pfizer.) It is as yet unclear whether the considerable toxicity of these and other GARFT inhibitors will allow these compounds to be acceptable as anticancer drugs.

[Structures: (26) X = H; (27) X = methyl]
2.4 Proteases

2.4.1 Angiotensin-Converting Enzyme and the Discovery of Captopril. The design of captopril was a landmark in the application of structural models for developing enzyme inhibitors (81, 82). This discovery rapidly led to the development of a family of therapeutically useful inhibitors of angiotensin-converting enzyme for the treatment of hypertension (83). The story has been reviewed thoroughly (for a historical perspective, see either Ref. 84 or Ref. 85), and is briefly summarized here. Angiotensin II, a circulating peptide with potent vasoconstriction activity, is generated by the C-terminal hydrolytic cleavage of a dipeptide from angiotensin I, catalyzed by angiotensin-converting enzyme. Therefore, inhibitors of angiotensin-converting enzyme are vasodilators. [An important aside: Angiotensin I is generated from a precursor by the action of renin, another exopeptidase that is an aspartyl protease. An orally available renin inhibitor remains an elusive goal, although there are still efforts under way that use SBDD methods (86). Renin inhibitors were early tools in the study of the essential aspartyl protease of human immunodeficiency virus (HIV), which is discussed later.]

Asp-Arg-Val-Tyr-Ile-His-Pro-Phe-His-Leu → Asp-Arg-Val-Tyr-Ile-His-Pro-Phe + His-Leu
(Angiotensin I → Angiotensin II)

A key tool in the discovery of captopril at Squibb was the use of a model for the active site of angiotensin-converting enzyme (Fig. 10.8). This model was based on the already known X-ray structure of bovine pancreatic carboxypeptidase A. Both enzymes are C-terminal exopeptidases that require zinc ion for activity, but differ in that carboxypeptidase A releases an amino acid, rather than a dipeptide. Hence, the binding site for the angiotensin-converting enzyme was postulated to be longer, and to contain groups to interact with the central peptide linkage. The suggestion had been made (87) that the inhibition of carboxypeptidase A by benzylsuccinate could be explained by viewing benzylsuccinate as a "by-product analog" (Fig. 10.8, top). The hypothesis was that one of the carboxylates bound into a cationic site, whereas the other interacted with the active site zinc. If this were true, then a similar model for angiotensin-converting enzyme predicted that slightly longer diacids, designed with some regard for the sequence preferences of the converting enzyme, should inhibit that enzyme. This hypothesis was quickly confirmed by the inhibitory activity of succinyl-proline (28a).

Peptide sequences related to those of snake venom peptides had already been used to define the structural requirements for peptide inhibitors of angiotensin-converting enzyme. Peptides are unstable in vivo and poorly absorbed intestinally, and thus are not good drug candidates. However, the best peptide inhibitor was 500-fold more potent than (28a). The
Figure 10.8. Active site models for carboxypeptidase A (top) and angiotensin-converting enzyme (bottom). The design of the dipeptidyl derivative that led to the discovery of captopril is shown bound to the latter enzyme.
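The difference in cut position between the two zinc exopeptidases discussed in this section can be illustrated with a small sketch. The Python below is an illustration only and is not part of the original active-site model: the helper name `c_terminal_cleave` is hypothetical, a peptide is treated simply as a list of residue names, and no real substrate specificity is modeled. It reproduces the angiotensin I to angiotensin II conversion (release of a C-terminal dipeptide) and contrasts it with release of a single C-terminal residue, the cut position characteristic of carboxypeptidase A.

```python
def c_terminal_cleave(residues, n_released):
    """Split a peptide (list of residue names) at a C-terminal cut site.

    Returns (remaining_chain, released_fragment). Hypothetical helper for
    illustration; it models only where the cut falls, not enzyme specificity.
    """
    if not 0 < n_released < len(residues):
        raise ValueError("n_released must be between 1 and len(residues) - 1")
    return residues[:-n_released], residues[-n_released:]


# Angiotensin I sequence as written in the reaction scheme of this section.
ANGIOTENSIN_I = "Asp-Arg-Val-Tyr-Ile-His-Pro-Phe-His-Leu".split("-")

# Dipeptidyl release (ACE-like): the remaining chain is angiotensin II.
angiotensin_ii, dipeptide = c_terminal_cleave(ANGIOTENSIN_I, 2)

# Single-residue release (carboxypeptidase A-like cut position), shown
# only to contrast the cut sites, not as a claim about this substrate.
remainder, amino_acid = c_terminal_cleave(ANGIOTENSIN_I, 1)

print("-".join(angiotensin_ii), "+", "-".join(dipeptide))
print("-".join(remainder), "+", "-".join(amino_acid))
```

The one-residue shift in the cut site is the reason the converting-enzyme binding site was postulated to be longer than that of carboxypeptidase A.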
information provided by the peptides, the structural model for the active site of angiotensin-converting enzyme, and biochemical and tissue-based pharmacological assays for the enzyme's function were used to guide an iterative design process to improve the potency, selectivity, and stability of small molecule inhibitors. The R1 and R2 substituents were optimized, and the zinc ligand was changed to a thiol, which significantly increased potency (Table 10.2, compare 28a with 28c). This process yielded the orally available and stable small molecule captopril (28d) within 18 months of the creation of the model.

The following quotation [from the original research report (81) on the design of captopril] predicted the great promise of SBDD: "The studies described above exemplify the great heuristic value of an active-site model in the design of inhibitors, even when such a model is a hypothetical one."

Table 10.2 Key Compounds in the Development of Captopril

Compound                    Structure    IC50 for inhibition of ACE (µM)
(28a) (succinyl-proline)                 330

2.4.2 HIV Protease. The aspartyl endoprotease encoded by human immunodeficiency virus (HIV-P) catalyzes essential events in the maturation of infective virus particles, the cleavage of polyprotein precursors to yield active products. After this was demonstrated in the mid to late 1980s, HIV-P became a target for the development of antiviral drugs to treat acquired immunodeficiency syndrome (AIDS). Several HIV-P inhibitors have been approved for human therapeutic use in the past 10 years, and the speed with which they were developed is attributed in part to the successful use of SBDD methods. There are excellent recent reviews of this area (88, 89). There are numerous reviews of the early work on HIV-P inhibitors (8, 9, 90, 91).

HIV-P is a symmetrical homodimer of identical 99 residue monomers, structurally and mechanistically similar to the pseudosymmetric pepsin family of proteases (92-94), whose members include renin. Because the protease is a minor component of the virion particle, intensive structural studies required overproduction through recombinant DNA methods. One of the first structures was determined with material synthesized nonbiologically (through peptide synthesis). As of June 2002, there were over 100 X-ray structures represented by coordinate sets in the Protein Data Bank, and many hundreds more have been determined in proprietary industrial studies.

The active site of the enzyme is C2 symmetric in the absence of substrates or inhibitors (Fig. 10.9a), and contains two essential aspartic acid residues (Asp25 and Asp25'). The entrance to the active site is partly occluded by "flaps" constructed of two beta strands (residues 43-49 and 52-58) from each monomer, connected by a turn. In the absence of substrate or inhibitor, the flaps seem to be rather flexible. Upon binding of inhibitors and presumably of substrates, the residues within the flaps undergo movements up to several angstroms to interact with the bound ligand (Fig. 10.10). A single tightly bound water is observed in the structures of most HIV-P-inhibitor complexes, accepting hydrogen bonds from the backbone amides of both flap residues Ile50 and Ile50' and donating to carbonyls of the bound inhibitors. This is referred to as the "flap" water. Despite the presence of this water and several tightly bound water molecules on the floor of the active site, the cavity also contains extensive hydrophobic surface area. The minor differences between the HIV proteases from two major strains of HIV (HIV-1 and HIV-2) are not addressed here. More significant are the HIV-P sequence variants with much reduced sensitivity to existing drugs that have evolved because of selective pressure and the rapid mutation rate of the virus. The reader interested in the differences between the proteases from HIV-1 and HIV-2, or in the issues surrounding drug-resistant variants, is referred to Ref. 91 and Ref. 89, respectively.

The early work on inhibition of HIV-P was much influenced by previous structural and mechanistic work on pepsin and its inhibitors. Both enzymes are thought to catalyze peptide hydrolysis through a tetrahedral transition state, shown below as (29). The previous work
Figure 10.9. (a) Residues in the active site of HIV protease. The C2 axis that relates the residues of the two monomers is indicated. The carboxylates of Asp25 and Asp25' are the catalytic groups. Not shown in this view are several flap residues (Ile47/Ile47', Ile50/Ile50'), which move in to interact with inhibitors. (b) Active site with bound (31) [saquinavir (PDB code 1HXB)]. Note the asymmetry of inhibitor binding. The flap water that is shown very close to saquinavir is labeled W. See color insert.
on transition state mimics as pepsin inhibitors and the sequence of some cleavage sites for HIV-P led to the discovery at Roche of the R and S versions of (30) as submicromolar inhibitors of HIV-P, with the R enantiomer being ?fold more potent (95). These inhibitors employ a hydroxyethylamine moiety to replace the P1-P1' linkage that is normally cleaved (the scissile bond) with a stable group. The lead molecules were optimized without knowledge of the HIV-P crystal structure, to produce (31) (Ro 31-8959, saquinavir, Fortovase).

(30)

Saquinavir (31) was the first HIV-P inhibitor approved for human use. Figure 10.9B
Figure 10.10. Comparison of the structures of HIV-P apoenzyme monomer (top, PDB code 3PHV) and the complex between HIV-P and (32) (U-85548; bottom, PDB code 8HVP). The inhibitor is shown as a ball and stick structure. Note the rearrangement of the flap residues; Ile50 is indicated for reference. The van der Waals surface of Asp25 is shown in both structures. The flap water (red ball) is also shown between Ile50 and U-85548. In the bottom structure, the locations of the N and C termini of HIV-P are noted. See color insert.
shows the asymmetrical binding mode of the molecule in the HIV-P active site. Because the metabolic and pharmacokinetic characteristics of this compound and several other early HIV-P inhibitor drugs are less than ideal, the search for better ones has continued. Many of the deficits arise from the large size and peptidic nature of the inhibitors. Another early inhibitor was the modified octapeptide (32, U-85548) developed at Upjohn (96).

(31) saquinavir, Ro 31-8959
Val-Ser-Gln-Asn-N Ile-Val

This subnanomolar inhibitor was used to define the extensive hydrophobic and hydrogen bonding interactions available in the HIV-P active site (97). A common feature in the binding of (31) and (32) to HIV-P is the interaction of the central hydroxyl group of the inhibitors with the carboxylates of both Asp25 and Asp25'. This hydroxyl group replaces a water molecule that likely binds between these aspartyl side chains during peptide hydrolysis by HIV-P. The inhibitors can therefore be seen as mimics of a "collected substrate." The liberation of this water to bulk solvent probably contributes about 5 kcal mol⁻¹ to the free energy of inhibitor binding, based on the studies by Rich and his colleagues on similar inhibitors of pepsin (98, 99). An interesting difference between (31) and (32) is that (31) has R stereochemistry at the hydroxymethyl center, whereas in (32) this is an S center. Part of the reason for this is that when (31) binds to HIV-P, the decahydroisoquinoline ring system induces a conformational change in the protein, affecting primarily site S1'. The optimal stereochemistry at the hydroxymethyl center appears to be whichever one will allow the interaction of the hydroxyl with both catalytic aspartates while accommodating the placement of inhibitor moieties in the S1, S2, S1', and S2' sites with minimal conformational strain on the inhibitor (9).

Both (31) (Fig. 10.9b) and (32) (Fig. 10.11) bind to the HIV-P active site asymmetrically. However, after the X-ray studies of crystalline HIV-P apoenzyme revealed it to be a symmetrical dimer, C2 symmetric inhibitors were designed to take advantage of this structural feature (Fig. 10.12). Both alcohol diamines and diol diamines were examined. For example, the C2 symmetric compound (33) (A-77003) was synthesized at Abbott and entered clinical trials as an antiviral agent for intravenous treatment of AIDS (100).

The X-ray structures of complexes between HIV-P and diol diamine derivatives like (33) showed (101) that, although one of the hydroxyl groups bound between the catalytic aspartyl carboxylates and made contacts with both, the second hydroxyl made only one such
Figure 10.11. Orthogonal views of the complex between HIV-P and (32) (U-85548). The view in panel a is rotated approximately 90° (around the long axis of the protein) from the view in panel b. Van der Waals surfaces of Asp25, Asp25', and the flap water (W) are shown. In panel b, the solvent-accessible surface of the inhibitor is shown. See color insert.
diol diamine
hydroxyethylene diamine
Figure 10.12. Design principle for C2 symmetric inhibitors of HIV-P and the related hydroxyethylene diamine scaffold.
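Energetic estimates such as the roughly 5 kcal mol⁻¹ attributed to release of the bound water, or the 500-fold potency gap quoted for the captopril series, can be related to one another through ΔΔG = RT ln(ratio). The sketch below is a back-of-the-envelope aid, not a method from the studies cited: the function names are hypothetical, the temperature of 298.15 K is an assumption, and treating a ratio of IC50 values as a ratio of dissociation constants is itself an approximation.

```python
import math

# Molar gas constant in kcal mol^-1 K^-1.
R_KCAL = 1.987204e-3


def fold_from_ddg(ddg_kcal, temp_k=298.15):
    """Affinity ratio corresponding to a free energy difference (kcal/mol).

    temp_k defaults to 298.15 K, which is an assumption of this sketch.
    """
    return math.exp(ddg_kcal / (R_KCAL * temp_k))


def ddg_from_fold(fold, temp_k=298.15):
    """Free energy difference (kcal/mol) corresponding to an affinity ratio."""
    return R_KCAL * temp_k * math.log(fold)


# About 5 kcal/mol, the estimate quoted for liberating the bound water.
water_release_fold = fold_from_ddg(5.0)

# A 500-fold potency difference, as quoted for the best peptide ACE
# inhibitor relative to succinyl-proline (28a).
peptide_gap_kcal = ddg_from_fold(500.0)

print(f"5 kcal/mol  -> {water_release_fold:.0f}-fold in affinity")
print(f"500-fold    -> {peptide_gap_kcal:.2f} kcal/mol")
```

Under these assumptions, 5 kcal mol⁻¹ corresponds to a difference in affinity of several thousand-fold, which shows why displacement of a single ordered water can dominate the design of an inhibitor series.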
contact. Thus the cost of desolvating the second inhibitor hydroxyl upon binding is not compensated by strongly favorable interactions in the complex (8). This led to the deletion of the second hydroxyl, as seen in compound (34), another compound in this program at Abbott. Further structural modifications, to enhance solubility and metabolic stability, were guided by the fact that the "ends" of the protease-bound inhibitors were relatively solvent exposed and made fewer contacts with the enzyme (102). Deletion of a valine residue (33 → 34) gave a smaller compound, presumably aiding solubility and absorption. The eventual product of this program was ritonavir (35, A-84538, ABT-538, or Norvir), which has been successfully launched.

Another C2 symmetric HIV-P inhibitor, discovered at Dupont Merck, is compound (36) (DMP-450). This was one of a series of cyclic ureas designed to interact with both the aspartyl carboxylates and the Ile50 and Ile50' backbone amides that hydrogen bond with the flap water (103). The compounds interacted with HIV-P in a highly symmetrical fashion, as they had been designed to do, with the urea oxygen replacing the flap water. Compound (36) was licensed to Triangle Pharmaceuticals, and the mesylate advanced into Phase I clinical trials. Its future is uncertain after the trials were put on hold because of animal toxicity (http://www.tripharm.com/dmp45O.html).

One of the problems common to many of the HIV-P inhibitors already discussed is their
(35) ritonavir
(37) indinavir
low solubility, which translates to low bioavailability. The discovery of (37) (indinavir, L-735,524) was the result of the successful application of SBDD at Merck to directly address this problem. During an iterative optimization process, the physicochemical properties of HIV-P inhibitors were modified within constraints that were established structurally (104). Crixivan (the sulfate of 37) was successfully launched for use as an antiviral drug.

The process leading to indinavir (Fig. 10.13) began with (38), a hydroxyethylene-containing heptapeptide mimic, originally designed as a renin inhibitor (105). The inhibition of HIV-P by (38) was discovered by screening. Classical medicinal chemistry methods allowed a reduction in size, and the discovery of an amino-2-hydroxyindan moiety to replace the terminal dipeptide (corresponding to P2', thought to bind into the S2' site). This approach (105, 106) resulted in the generation of (39) (L-685,434). Although (39) had a subnanomolar IC50 for inhibition of HIV-P, it also had very low aqueous solubility, like most peptidomimetics. One way to improve solubility is to insert a charged functional group into the molecule. The tertiary amino group in the HIV-P inhibitor saquinavir (31)
Figure 10.13. Structures of HIV-P protease inhibitors during the optimization process leading to the discovery of (37) (indinavir). (boc = tert-butyloxycarbonyl; cbz = benzyloxycarbonyl.)
was already identified. Piracy of the decahydroisoquinoline tert-butylamide from (31) provided the idea for the hybrid molecule (40). In addition to the charged group, use of this ring system would partly "preorder" the inhibitor's structure, lessening the entropic cost of binding. Molecular modeling was used with known structures of HIV-P-inhibitor complexes to evaluate this idea, and it was judged to be reasonable enough to justify the synthesis of (40) (104). This compound was subsequently shown to have much better pharmacokinetic behavior than its antecedents, consistent with improved solubility and dissolution.

A convergent synthetic route was devised to generate (40) to improve the accessibility of important analogs. Although (40) was an 8 nM inhibitor of the isolated enzyme, better potency was needed for acceptable cell-based activity, and still better solubility characteristics were needed. A method for structure-based computational estimation of the interaction energy for HIV protease inhibitors with the enzyme was developed and used to help estimate inhibitor potency before synthesis (107). Variation of the group contributing the tertiary amine led to the discovery of the piperazine derivative (41) (L-732,747), which had subnanomolar potency against HIV-P. The X-ray structure of the HIV-P complex with (41) confirmed the binding mode predicted by molecular modeling, with the molecule filling the S1, S2, S1', and S2' pockets, and the S3 pocket occupied by the terminal benzyloxycarbonyl moiety. Replacement of the benzyloxycarbonyl with more polar heterocycles, chosen to be accommodated by the S3 pocket and to further improve aqueous solubility, yielded (37).

Several other approved AIDS drugs that act by inhibition of HIV-P have also been developed through use of SBDD methods. Compound (42) (amprenavir, Agenerase, also known as VX-478) is the most recent addition to the HIV-P inhibitors approved for human antiviral treatment, and differs significantly from earlier inhibitors. Compound (42) was specifically designed by Vertex scientists to minimize molecular weight to increase oral bioavailability (108). Compound (43) (nelfinavir, AG-1343, also known as LY312857), like the precursors to the earlier drug (37) (indinavir), copied the decahydroisoquinoline tert-butylamide group from the first marketed HIV-P inhibitor (31) (saquinavir). Compound (43) was developed in a collaboration between scientists at Lilly and Agouron (109), and is marketed by Pfizer as Viracept, the mesylate salt of nelfinavir. In both (42) and (43), the scientists involved used iterative SBDD methods to alter the physicochemical properties of the drug molecule while maintaining potency by optimizing interactions with the active site of the enzyme. An important feature shared by these compounds is the fact that the bound inhibitors appear to be in low energy conformers, so that minimal conformational energy costs must be paid upon binding to the enzyme.

(42) amprenavir

2.4.3 Thrombin. Thromboembolic diseases such as stroke and heart attack are major health problems, especially in many Western countries. This has led to searches for drugs that are effective inhibitors of various serine endoproteases in the blood-clotting cascade, such as factor Xa and thrombin. Existing therapeutic agents such as the coumarins (like warfarin), heparin, and hirudin have problems related to their absorption or unpredictable metabolism and clearance. Recently, new small molecule inhibitors of thrombin have become available for human use in the United States, including (44) (argatroban, MD-805, developed by Mitsubishi) and (45) (melagatran, developed by AstraZeneca) (110, 111). These nanomolar inhibitors of human thrombin were optimized by classical medicinal chemistry, starting with peptidomimetics similar to the thrombin cleavage site in fibrinogen (see Fig. 10.14a). Poor absorption by an oral route requires that they be administered intravenously or at best subcutaneously. At present, the only direct inhibitor of thrombin suitable for oral administration is ximelagatran, a prodrug form of melagatran in late development for various cardiovascular indications by AstraZeneca as of mid-2002. The therapeutic need and the availability of high quality crystal structures for human thrombin bound to inhibitors such as (44) make this an attractive target for SBDD (112). The significant efforts at Merck to use SBDD approaches to develop orally available inhibitors of thrombin, which have yielded compounds that have entered clinical trials, have been reviewed (113, 114). For a good overview of this area, see the review by Babine and Bender (9).

Compound (46) [NAPAP, N-alpha-(2-naphthylsulfonylglycyl)-4-amidinophenylalanine piperidide] is a moderately potent inhibitor of human thrombin, but was found to have an unacceptably short plasma half-life in animals (115). However, (46) has been a useful experimental tool and a variety of analogs have been made. The structures of (44) and (46) bound to human thrombin show that they bind somewhat differently, as shown in Figure 10.14b (112, 116). However, both form hydrogen bonds with the backbone at Gly216 (part of the oxyanion hole), and both fill the S1 specificity pocket with a permanent cation attached to an extended hydrophobic group. Compound (46) was the starting point at Boehringer Ingelheim for the development of the orally bioavailable prodrug (47) (BIBR-1048) that generates in vivo a potent inhibitor of human thrombin (117). Compound (47) is currently in human clinical trials.

Scientists at Boehringer Ingelheim used the crystal structure of the complex between (46) and human thrombin to design a replacement for the central bridging glycine moiety. The hypothesis that a trisubstituted benzimidazole could correctly place groups into the S1, S2, and S4 pockets was confirmed. The first such compound made was (48). The IC50 for thrombin inhibition by (48) was only 1.5 µM, but the compound had an improved serum half-life in rats. Determination of the crystal structure of the thrombin-(48) complex showed that (48) binds in a similar fashion to (46). The N-methyl on the benzimidazole fit into the P2 pocket, and the phenylsulfonyl group extended into S4. The low affinity is likely attributable to the fact that (48) forms no hydrogen bonds with the backbone of Gly216. An iterative optimization process (Fig. 10.15) was used to regain the lost affinity, eventually surpassing the thrombin affinity of the starting point (46) (0.2 µM).

Surprisingly, the N-methyl group could not be replaced with larger alkyl substituents, despite what appeared to be room for them in the P2 pocket. However, replacing phenyl with larger aryl groups such as naphthyl or quinolinyl on the sulfonamide provided favorable interactions in the P4 pocket. The crystal structure suggested that the increased lipophilicity of such aryl groups could be balanced by appending charged substituents to
(44) argatroban
(45) melagatran
(46) NAPAP
Figure 10.14. (a) Sequence in fibrinogen at the thrombin cleavage site (top), and structures of several inhibitors of human thrombin.
the sulfonamide nitrogen. Such substituents appeared likely to extend into solvent and therefore to be tolerated without compromising affinity. This was confirmed (i.e., compound 49), and this decreased the undesirable affinity for serum-binding proteins. X-ray studies with some of the inhibitors at this point indicated that a longer linker between the central benzimidazole and the benzamidine moiety in the S1 pocket might provide some advantage. This was confirmed with several analogs, with the methylamino linker providing a 10-fold increase in potency (compound 50). By this point, the structural basis for interaction of this compound series with thrombin was understood sufficiently to suggest that the amidosulfonyl group could be replaced by a carboxamide. This was confirmed by use of several compounds, such as (51). Compound (51) (BIBR 953) was quite active as an anticoagulant in animals dosed intravenously, but required conversion to prodrug (compound 47) to mask its charge and allow oral dosing.
Figure 10.14. (b) Schematic comparison of the binding interactions for (44) and (46) in X-ray structural models of crystalline thrombin.
2.4.4 Caspase-1. Caspase-1 (interleukin 1β-converting enzyme, or ICE) is a member of a family of cysteine proteases that catalyze the cleavage of key signaling proteins in such processes as inflammatory response and apoptosis. Genetic methods have provided evidence supporting a role for caspase-1 in diseases such as stroke (118) and inflammatory bowel disease (119). The X-ray structure of crystalline human caspase-1 was solved in 1994 by several groups (120, 121), and has been a valuable tool in intensive efforts to design potent and bioavailable inhibitors of the enzyme. Compound (52) (pralnacasan, VX-740) was developed as a caspase-1 inhibitory therapeutic agent through use of SBDD in a collaboration between Vertex and Aventis. Although the details of the discovery process have not been published, (52) probably functions as a prodrug. The cleavage of the lactone of (52) would yield a hemiacetal that could hydrolyze to release ethanol and the aldehyde form of the drug, which then can form a covalent thioacetal with the active site thiol of caspase-1, leading to pseudoirreversible inhibition. Clinical trials of compound (52) as an anti-inflammatory agent for treatment of rheumatoid arthritis began in 1999 (122). In April 2002, the
(47) BIBR 1048
(51) (BIBR 953), IC50 = 0.01 µM
Figure 10.15. Optimization of structure leading to the discovery of (51) (BIBR 953).
companies announced that these trials would continue and would be expanded to include treatment of osteoarthritis.

2.4.5 Matrix Metalloproteases. Matrix metalloproteases (MMPs) are a large and diverse family of zinc endoproteases. Several members of this family (such as the collagenases and the stromelysins) are thought to have important roles in proliferative diseases, including arthritis, retinopathy, and metastatic invasiveness of tumor cells. There are publicly available X-ray structures of enzyme-inhibitor complexes for at least seven different MMPs, as of this writing. Several detailed reviews of the SAR and binding modes for inhibitors of matrix metalloproteases are available (9, 123). All MMP inhibitors contain a moiety that binds to the active site zinc, such as the hydroxamates of (53) (prinomastat, AG3340) and (54) (CGS-27023) and the carboxylic acid of (55) (tanomastat, BAY 12-9566). These
(55) tanomastat, Bay 12-9566

(52) pralnacasan
compounds each have affinities in the nanomolar to picomolar range for several MMPs. The inhibitory profiles and ongoing clinical trials of a variety of drug candidates that inhibit MMPs were reviewed in 2000 (124). Compound (53) was developed at Agouron through use of SBDD (125) and is under clinical investigation by Pfizer as an anticancer drug and as a treatment of proliferative retinopathy. Compound (54) is a stromelysin inhibitor discovered at Novartis (126), without explicit structural guidance. However, the lead molecule from which (54) was developed was originally obtained by X-ray structure-based inhibitor design targeted against the bacterial zinc-protease thermolysin. Compound (55), with particularly high affinity for the gelatinases, was also developed with consideration of the structures of other MMP-inhibitor complexes, but not through use of iterative SBDD (127). The clinical trials of compounds (54) and (55) have been suspended because of their disappointing efficacy (124). It remains somewhat uncertain which MMP is responsible for specific diseases, and the possibility for biological redundancy suggests that inhibition of several MMPs may be required for treatment of some diseases. SBDD clearly could have a major impact on the discovery of selective MMP inhibitors. These could be useful tools in dissecting the disease relevancy of these targets, as well as providing the selectivity and bioavailability required of effective drugs.

(53) prinomastat, AG3340
(54) CGS 27023

2.5 Oxidoreductases

Oxidoreductases catalyze the oxidation or reduction of carbon-carbon, carbon-oxygen, or carbon-nitrogen bonds. Frequently, nicotinamide cofactors are involved, with the oxidized and reduced forms (respectively, NAD+ or NADP+ and NADH or NADPH) receiving or donating the equivalent of a hydride during this process. Nicotinamide-linked oxidoreductases that have been targeted for the discovery of new therapeutic agents include aromatase, dihydrofolate reductase (mentioned above), aldose reductase, and inosine monophosphate dehydrogenase. SBDD methods have been successfully applied recently to the latter two enzymes to discover agents that are currently
in human testing. The efforts with these two targets are described briefly below.

2.5.1 Inosine Monophosphate Dehydrogenase. Proliferative cells such as lymphocytes have high demands for the rapid supply of nucleotides to support DNA and RNA synthesis, as do viruses during their proliferative phase. The first dedicated step in the de novo biosynthesis of guanine nucleotides is conversion of inosinate to XMP, catalyzed by inosine monophosphate dehydrogenase (IMPDH).

IMP + NAD+ → XMP + NADH

A prodrug form of (56) (mycophenolic acid), a noncompetitive inhibitor of IMPDH, is approved for human therapeutic use as an immunosuppressant (mycophenolate mofetil, CellCept). The use of this drug is hampered by gastrointestinal side effects probably related to the metabolism of the drug. A second class of IMPDH inhibitors is represented by the nucleoside analog mizoribine (also known as bredinin), a prodrug approved for human use in Japan. Such compounds competitively inhibit IMPDH in vivo after phosphorylation (128). These drugs validate the strategy of targeting IMPDH for the discovery of immunosuppressants. Other utilities that have been suggested for IMPDH inhibitors are antiviral and anticancer therapies.

(56) mycophenolic acid

The structure of hamster IMPDH in complex with IMP and (56) was solved at Vertex in the mid-1990s (129). This allowed the visualization of a covalent intermediate, in which a cysteine thiol from the enzyme adds to C2 of the purine ring of the nucleotide substrate. An analogous covalent adduct is postulated to be a key catalytic intermediate during normal turnover (130). The structure was a key tool in the discovery of (57) (VX-497, merimepodib), a novel potent inhibitor of human IMPDH suitable for oral administration (131).

An experimental screen of a diverse library of commercially available compounds for inhibitors of IMPDH identified molecules with the phenyl, phenyloxazole urea scaffold (58) as weak inhibitors. Through use of the computational program DOCK (132), the initial inhibitors were built as models into the experimental structure of the crystalline complex of IMPDH, IMP, and (56). Structural analogs were generated to improve potency in an iterative process, guided by the structural modeling and the observed changes in potency for inhibition of human IMPDH.

After this process yielded compound (59), with nanomolar potency, an X-ray structure
(57) merimepodib
was determined of (59) bound to the hamster

enzyme with IMP. This revealed both similar-
ities and differences between the binding (60) tolrestat
modes of (56) and (59). Aryl groups of both
compounds pack against the covalently teth- agents that target aldose reductase should not
ered purine of the nucleotide. Several hydro- inhibit the closely related aldehyde reductase,
gen bonding and hydrophobic interactions an essential hepatic enzyme.
with the enzyme are also common between the The structure of (60) and other inhibitors
two inhibitors. However, there are several hy- bound to porcine aldose reductase (136) pro-
drophobic and van der Waals interactions seen vided a rich lode of information on the require-
in the complex with (59) that are not present ments for potent and selective inhibition of
with (56). Importantly, the urea moiety of (59) aldose reductase. This was mined by scientists
forms a network of hydrogen bonds with an at the Institute for Diabetes Discovery, in a
aspartyl carboxylate that is not present in the project that began in 1996. The Institute for
complex with (56). Further modification of the Diabetes Discovery filed an IND application
structure was guided by the X-ray study by use for (61) (lidorestat, IDD 676), a potent aldose
of (59), to gain potency in a cell-based assay for
inhibition of lymphocyte proliferation. This
provided compound (57), which Vertex has ad-
vanced into clinical trials for treatment of hep-
atitis C infections.
2.5.2 Aldose Reductase. Aldose reductase

has been implicated in many of the pathologies
resulting from elevated tissue levels of glucose
in diabetes mellitus (133, 134). This nicotin-
amide-dependent enzyme catalyzes the con-
version of glucose to sorbitol, accumulation of
which ultimately results in damage to the
eyes, the nervous system, and the kidneys.
Given the enormous damage caused by this (61) lidorestat
disease and the difficulty in regulating blood
glucose, selective and potent inhibitors of hu- reductase inhibitor, for treatment of diabetic
man aldose reductase offer great potential complications, within 30 months of initiating
benefit. However, existing drugs that target the discovery project on this target. The speed
aldose reductase have unreliable efficacy with which this was achieved appears in large
(135). For example, compound (60) (tolrestat) part because of the use of SBDD methods.
was withdrawn by Wyeth in 1996 because of The X-ray structures showed the cofactor
poor clinical response. Hence, there is still a NADP+ buried within the enzyme, with its C4
need to provide an inhibitor of this enzyme redox center exposed at the bottom of a deep
that fulfills the potential in the clinic. To min- hydrophobic cleft. An anionic binding site is
imize the risk of undesired toxicities, clinical located near NADP+. Several potent inhibi-
tors bind within the hydrophobic cleft and in- geted in SBDD projects that have produced
teract with the anionic site. The binding of compounds that are either launched or in clin-
potent inhibitors induces a conformational ical trials.
change, opening an adjacent hydrophobic
pocket. The conformation induced by (60) dif- 2.6.1 Acetylcholinesterase. A pronounced
fers from that caused by other, less selective decrease in the level of the neurotransmitter
inhibitors. This "specificity" pocket was acetylcholine is one of the most pronounced
thought to offer an opportunity for selective changes in brain chemistry observed in the
inhibition of aldose reductase while sparing sufferers of Alzheimer's disease (139). Several
aldehyde reductase. Hence, this structural drugs that are approved for the treatment of
study provided an initial pharmacophore for the dementia thought to result from this neu-
both potency and selectivity. rotransmitter deficit act by inhibiting acetyl-
The SAR for this pharmacophore was de- cholinesterase. These include (63) (tacrine, or
veloped with a series of synthetically accessi-
ble salicylic acid derivatives that were scored
for potency and selectivity with the purified
enzymes, and efficacy in a diabetic rat model
(137). One of the most potent and selective of
the derivatives was (62), containing the benz-
(63) tacrine
(64) donepezil
thiazole heterocycle. The SAR was employed, Cognex, a Pfizer drug that was the first such
guided by the structures of selected inhibitor agent approved for this indication), (64) (donepezil),
complexes, to design a novel indole scaffold to and (65) (rivastigmine). Several other
present the pharmacophoric elements (M. Van agents are in clinical trials. Disappointing ef-
Zandt, personal communication). The optimi-
zation of this series provided the clinical can-
didate (61) (138).
2.6 Hydrolases
Some other hydrolytic enzymes, in addition to
proteases, that are important drug targets in-
clude protein phosphatases, phosphodiester-
ases, nucleoside hydrolases, acetylhydrolases, (65) rivastigmine
glycosylases, and phospholipases. Structure-
based inhibitor design is currently being ap- ficacy is observed with the existing drugs, aris-
plied to a number of these enzymes. The last ing from dose limitations that are likely attrib-
three mentioned have been successfully tar- utable to the inhibition of acetylcholinesterase
in peripheral tissues (140). This may be a con- ing site. The length of both alkyl linkers was
sequence of the high serum levels required to varied, and the effect of adding a third alkyl
get these highly cationic molecules to pene- substituent was examined. The phthalimide
trate the blood-brain barrier. portion of the structure was chosen to improve
In a discovery project that is reminiscent of the synthetic accessibility of the analogs
the discovery of captopril, scientists at Takeda needed for this exercise. The compounds were
created a hypothetical structure for the active tested not only for inhibitory potency toward
site of acetylcholinesterase, based on SAR rat cerebral acetylcholinesterase, but also for
from previous biochemical and medicinal peripheral response and toxicity in dosed in-
chemical work (141). The model consisted of tact rats. After the work was under way, Suss-
(in addition to the serine protease-like cata- man and coworkers solved the atomic struc-
lytic machinery) an anionic binding site sepa- ture of acetylcholinesterase from the electric
rating two discrete hydrophobic binding sites. eel, including complexes with several inhibi-
This model was then used to design inhibitors tors, by X-ray crystallography (143). The
of the enzyme (reviewed in ref. 142). One set of availability of this structure made it possible
analogs examined were based on the N-(ω- to retrospectively analyze the basis for the
phthalimidylalkyl)-N-(ω-phenylalkyl)-amine SAR in this series of compounds, by use of
(scaffold 66). An iterative process of testing, DOCK (144).
2.6.2 Neuraminidase. Influenza virus in-

fections cause severe human suffering
throughout the world and economic damage in
the billions of dollars annually, although some
years are worse than others. In 1918 a pan-
demic caused by this disease killed an esti-
mated 40 million people (145). An important
protein in the infectious process is the viral
neuraminidase, an integral membrane protein
whose catalytic domain is exposed on the viral
analysis, design, and synthesis, by use of this surface. Neuraminidase catalyzes the hydro-
and closely related scaffolds, resulted in the lytic cleavage of sialic acid (68, N-acetylneur-
production of (67) (TAK-147), which is cur- aminic acid) from glycoproteins and extracel-
lular mucin on the surface of the host cell. A
different viral surface protein tightly binds to
terminal sialic acid residues, which promotes
the initial infection, but prevents release of
viral progeny from the host cells, unless and
until the terminal sialic acids are hydrolyti-
cally cleaved by viral neuraminidase. Thus,
neuraminidase enables the infection to
propagate.
The first X-ray structure of influenza neur-
aminidase was determined in the early 1980s
(146). Ten years later, a landmark paper (147)
rently in clinical trials for treatment of the described a highly efficient drug design project
dementia resulting from Alzheimer's disease at Monash University in Australia. This
(142). project yielded antiviral compound (69) (zana-
The design of (66) was partially based on mivir, Relenza, or Flunet), which was devel-
the structures of previously known inhibitors. oped into one of the first drugs to be created
The two aryl substituents were intended to through use of SBDD. Previous structural
bind to the hydrophobic binding sites, placing work had revealed that the active site of neur-
the central amine cation into the anionic bind- aminidase has several rigid pockets and nu-
1 2 Structure-Based Drug Design
(68) sialic acid
(70) Neu5Ac2en, DANA
neuraminidase active site. In the case of the

merous charged groups. Electrostatic interac- guanidine substitution, the binding affinity
tions significantly affect the conformation of for neuraminidase was increased about 5000-
bound sialic acid, which is deformed into a fold and provided (69), which inhibits viral re-
high energy conformer, attributed in part to lease in cell cultures and decreases the sever-
the interactions between the 1-carboxylate ity of influenza virus infections in humans.
and arginine side-chains of the protein. This Subsequently, the X-ray structures of neur-
deformation may play a key role in catalysis. aminidases from several different influenza
Synthesis of a sialic acid analog that is dehy- subtypes complexed with (69) were analyzed
drated across the C2-C3 bond of (68) had pro- (149). Although the positions of protein resi-
vided the putative transition state mimic (70) dues were well conserved, the water structure
(sometimes referred to as Neu5Ac2en, or seen in these different complexes was quite
as 2-deoxy-2,3-dehydro-N-acetylneuraminic variable. This may explain the varying po-
acid, DANA). tency of (69) against different strains of virus.
Compound (70) inhibits neuraminidase One problem with (69) is that it is not well
with micromolar potency (148). Examination absorbed by an oral route, and so must be ad-
of the binding mode of (70) in the active site of ministered as an aerosolized powder inhaled
neuraminidase (Fig. 10.16) led to the replace- into the virus-infected lungs. Two other neur-
ment of the 4-hydroxyl by cationic groups, aminidase inhibitors with nanomolar affini-
first an amino and then a guanidino group ties (71 and 72) have been developed through
(147). These groups strongly interact with an- the use of SBDD methods to yield orally bio-
ionic amino acid side chains (corresponding to available drugs. The development of these
Glu120 and Glu229 shown in Fig. 10.16) in the agents was facilitated by the fortuitous discov-
ery by scientists at BioCryst, that analogs of

(69) in which the cyclic scaffold is a phenyl
moiety are much more potent inhibitors if
they lack the glycerol side chain! This was
subsequently discovered by X-ray structural
studies to be attributed to the creation of an
unanticipated hydrophobic pocket upon rear-
rangement of the Glu278 side chain carboxylic
acid, which forms several hydrogen bonds
with the glycerol portion of (69) (Fig. 10.16).
Figure 10.16. View from above: Polar amino acid side-chains
surrounding (70), bound into the active site of influenza virus
neuraminidase (Scheme 10.1 based on PDB code 1NNB, the coordinates
of an X-ray structure described in Ref. 148).

Replacement of the permanently cationic guanidine by an amine (71)
promoted better intestinal absorption, but also greatly decreased
the affinity for neuraminidase. Structure-guided modification of the
carbocycle's substituents was used to recover this lost potency.
Compound (71) (GS 4071) was developed by Gilead Sciences (150). The
ethyl ester of (71) is a prodrug (oseltamivir or GS 4104) that has
been approved for oral dosing to treat influenza infection. Another
amphiphilic carbocycle, compound (72) (peramivir, RWJ-270201, or
BCX 1812) was developed by BioCryst (151) through use of SBDD, and is
in clinical trials. The use of clever synthetic routes, biochemical
assays for neuraminidase inhibition, a mouse infection model, and
X-ray structural information were all valuable tools in the
development of both (71) and (72). Optimization of the affinity
required the examination of a variety of alkyl substituents in both
cases, to exploit the new hydrophobic pocket created by the
conformational change primarily involving Glu278. The ability of the
cyclopentyl ring in (72) to replace the six-membered ring illustrates
that differing central scaffolds can display the essential
interacting groups in an effective way.
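The roughly 5000-fold gain in affinity quoted in this section for the
guanidino substitution can be put on an energy scale with the standard
relation ΔΔG = -RT ln(fold change). The short Python sketch below does
that arithmetic. The temperature of 298.15 K and the function name are
assumptions made here for illustration; they are not taken from the
cited studies.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold_change(fold, temperature_k=298.15):
    """Change in binding free energy (kcal/mol) for a fold gain in affinity.

    `fold` is the ratio of the old to the new dissociation constant.
    A negative result means tighter binding. The default temperature is
    an assumed room-temperature value, not one reported in the text.
    """
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return -R_KCAL * temperature_k * math.log(fold)


if __name__ == "__main__":
    # 5000-fold gain quoted in the text for the guanidino analog (69)
    print(f"{ddg_from_fold_change(5000):.2f} kcal/mol")
```

Under these assumptions a 5000-fold gain corresponds to about 5
kcal/mol of additional binding free energy, which is on the scale
usually associated with a few well-placed charged hydrogen bonds.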
2.6.3 Phospholipase A2 (Nonpancreatic, Se-

cretory). Phospholipases A2 (PLA2s) are a di-
verse family of hydrolases that cleave the sn-2
ester bond of phospholipids. The fatty acid
produced is frequently arachidonate, the pre-
cursor to the proinflammatory eicosanoids. In
several human inflammatory pathologies
(e.g., septic shock, rheumatoid arthritis), a
nonpancreatic secretory form of PLA2 (hnps-
PLA2) is present in extracellular fluids at lev-
els many-fold higher than normal (152). The
design of bioavailable inhibitors of this Ca2+-
(72) BCX 1812 dependent isoform of PLA2 as inflammatory
drugs is therefore an attractive goal (153). To

be an effective drug, such an inhibitor would
also need to be selective for hnps-PLA2 vs. the
closely related pancreatic PLA2. Whether se-
lectivity is needed against the quite different
cytosolic PLA2 is unclear.

Investigators at an AstraZeneca laboratory
(previously Fisons) have used multidimen-
sional NMR and computational techniques to
develop an active site model for cytosolic PLA2
(154, 155). Synthesis of compounds based on (75) indomethacin
this model led to (73) (FPL-67047), reported
The crystal structures of recombinant
hnps-PLA2 bound to (74) and (75) were solved
(158), and compared with the previously
known structures of PLA2s complexed with
substrate mimics (159, 160), including the
phosphonate-containing transition state ana-
log (76). The earlier structures revealed several
to be a development candidate for treatment

of inflammation (156).
Investigators at Eli Lilly began a project to
develop PLA2 inhibitors by investing the ef-
fort to clone, overproduce, purify, crystallize,
and determine the structure of hnps-PLA2 (76) hnps-PLA2 transition-state analog
(157). This also provided the reagent needed
for a massive screening campaign to identify
hnps-PLA2 inhibitors. They were thus pre- key features. These were: (1) the filling of
pared to apply SBDD methods when the a significant hydrophobic crevice, (2) the dis-
screening of Lilly's small molecule collection placement (by the sn-2 alkyl moiety) of the
yielded a weak inhibitor. The hit (74) was sur- His6 side-chain into a solvent-exposed posi-
tion to create an adjacent cavity, (3) the coor-
dination of the active site calcium, and (4) for-
mation of hydrogen bonds to His48 and Lys69.
The polar contacts were provided by the non-
bridging phosphate and phosphonate oxygens
in the complex with (76).
The screening hit (74) bound in the hydro-
phobic crevice, similarly to the substrate mim-
ics, with the 1-benzyl moiety of (74) bound in
the adjacent cavity and displacing the enzyme's
His6 imidazole. However, there were
two surprising findings. First, despite the
presence of 10 mM calcium in the crystalliza-
tion liquor, there was no bound calcium, an
prisingly similar to indomethacin (75), a non- essential active-site component, although
steroidal anti-inflammatory drug that acts by weakly binding (Kd = 1.5 mM). Second, the
inhibiting cyclooxygenase. carboxylic acid of (74) formed a hydrogen
bond with another active-site acid, the side LY315920), which has 6500-fold greater affinity
chain of Asp49. The latter finding again em- for hnps-PLA2 than did the original hit
phasizes the importance of experimental molecule (74). LY315920 effectively inhibits
structures to guide improvements of inhibitor hnps-PLA2 in the serum of transgenic mice
potency, given that placing two presumed an- dosed with the compound orally or i.v., and is
ions so close together would likely never have undergoing clinical trials in the United States
been predicted by a computational model. and Japan (162,163).
Other slight conformational changes were ob-
2.7 Picornavirus Uncoating
served to accommodate the 5-methoxy group
of (74). Picornaviruses, which include the rhinovi-
The inhibitor's 3-acetate moiety was con- ruses and enteroviruses, are RNA viruses that
verted to an acetamide in a successful attempt cause several infectious human diseases.
to restore the active site calcium, form a hy- These diseases include common colds as well
drogen bond to His48, and increase potency. as life-threatening infections of the respira-
The crystal structure of the complex with the tory and central nervous systems. Effective
amide version of (74) also revealed a signifi- treatments of these diseases would relieve
cant reorientation of the indole core and 5-me- much human suffering, save many lives, and
thoxy substituent, resulting in an unantici- have great economic benefit. There are over
pated 5-Å movement of the terminal methyl. 100 serotypes of rhinoviruses alone, making it
Further changes in inhibitor structure were impossible to generate a vaccine effective
guided by iterative structural studies and against infections by all variants of the virus
functional assays of potency and selectivity. (164).
These changes involved the use of substitu- The Achilles heel of picornaviruses has
ents at positions 3 or 4 to optimize coordina- been suggested to be that part of the virus
tion of the metal ion, extension of the van der structure that interacts with the cell surface
Waals interaction by lengthening the receptor because those structural features
2-methyl to an ethyl, and conversion of the must be well conserved (165). The virus parti-
3-acetamide to glyoxamide (159,161). This re- cle consists of a positive-strand RNA coated by
sulted in the synthesis of (77) (compound an icosahedral shell, containing 60 copies of
four distinct β-barrel proteins (166). These
structural proteins contain the binding site
for the cellular receptor and undergo signifi-
cant conformational changes to liberate the
viral RNA genome during infection of the cell.
A series of isoxazoles that inhibit this picornavirus
"uncoating" process were discovered in
the early 1980s by scientists at Sterling
Winthrop, by use of an in vitro cellular assay
for antiviral activity (167-170). One of these,
compound (78) (WIN-51711, disoxaril), gave a
50% suppression of viral plaque formation in
this assay at 0.3 $. Compound (78) was also
effective in animal models (171) and entered
phase I clinical trials, but failed to advance
because of its toxicity. Compound (78) was
shown (172) to bind to viral capsid protein
(78) WIN-51711, disoxaril
Figure 10.17. Structure of rhinovirus capsid protein VP1 showing the
bound conformation of antiviral isoxazole compounds (78) [disoxaril,
WIN-51711: panel a, top], (79) [WIN-54954: panel b, middle], and (80)
[pleconaril, WIN-63843: panel c, bottom]. The PDB codes for the X-ray
structural model coordinates used to create these views are: 1PIV
(for 78), 2HWE (for 79), and 1C8M (for 80). On the left side of each
panel, the inhibitors are shown as van der Waals surfaces, and the
protein as a ribbon diagram. On the right side, the structures of the
inhibitor alone are shown, from the same view, as ball and stick
representations. See color insert.
within a hydrophobic pocket in the floor of the "canyon" that contains
the binding site for the cell surface receptor (Fig. 10.17A).
Structural changes induced in the canyon on binding of such molecules
may also [...] receptor binding directly (173). X-ray crystallographic
studies of (78) and analogs bound to the target protein VP1 were an
essential part of the iterative optimization process that led to safer
and more effective antiviral agents (174-176). The goal of the process
was to generate a compound that is potent, [...]dly and metabolically
stable, and effective against as many serotypes of the virus as
possible. There was therefore a need to balance potency and
selectivity, and the structural information helped to guide compound
design in pursuit of this balance.
A second-generation compound, (79) (WIN-54954) also advanced into
clinical tests, but had disappointing efficacy in Phase II trials,
probably because of extensive metabolism. Modification of the
phenylisoxazole, guided by both structural and metabolic
considerations (177), allowed the creation of a stable and potent
antiviral, the third-generation compound (80) (WIN-63843, pleconaril,
or Picovir) (178). This compound was evaluated in Phase III clinical
trials and showed efficacy in humans. Oral dosing of virally infected
patients with
(80) VP63843, pleconaril
(80) three times daily decreased the average oxidants. MAPK p38α has a central role in
time needed to become free of cold symptoms integrating the inputs from a complex signal-
from 10 days to between 8 and 9 days, and also ing network. Activation of MAPK p38α re-
reduced the duration of severe cold symptoms quires the dual phosphorylation of conserved
from 4.5 to 3.5 days (179). During the clinical threonine and tyrosine residues on a loop near
studies to support the new drug application the enzyme's active site (180). The unacti-
for (80), about a quarter of the clinical isolates vated (nonphosphorylated) enzyme has a very
(of rhinovirus present initially or during the low affinity for ATP, but can bind to pyridinyl-
treatment) were resistant to the compound. imidazole inhibitors (181, 182). The activated
The majority of these resistant viruses had a enzyme in turn phosphorylates numerous
single mutation at VP1 residue Ile98, which substrates, including several transcription
directly interacts with (80) bound to VP1 in factors. This leads to activation of the tran-
wild-type virus. The clinical data also showed scription of many genes and causes the release
the elevation in some patients of hepatic cyto- of proinflammatory cytokines, primarily in-
chrome P450 levels during treatment with terleukin-1β (IL-1β) and tumor necrosis fac-
(80), raising concerns about potentially haz- tor (TNFα). MAPK p38α was identified as a
ardous drug-drug interactions. ViroPharma central player in this inflammatory pathway
sought and failed in early 2002 to gain the in a key study by scientists at SmithKline
approval of the U.S. Food and Drug Adminis- Beecham (183). The study involved the molec-
tration for its new drug application for (80) for ular cloning of the genes encoding proteins
treatment of the common cold. that bind to anti-inflammatory pyridinyl-imi-
dazole compounds already known to block the
2.8 Phosphoryl Transferases
biosynthesis of IL-1β and TNFα. The binding
Protein kinases and phosphatases play vital proteins turned out to be members of a known
roles in intracellular signaling pathways and kinase family. Since this finding, the enzymes
in the integration and control of major cellular in the MAPK pathway, and especially MAPK
processes. Kinases and other phosphoryl p38α, have been attacked by many scientists
group transferases are essential in the metab- seeking to discover anti-inflammatory drugs
olism of lipids, nucleotides, and other small (184).
biomolecules. The use of SBDD methods on Compound (81) (SB 203580), a specific in-
such targets has expanded as more of their hibitor of MAPK p38α, is a prototype for the
X-ray structures have been solved, and will pyridinyl-imidazole compounds (185). This
continue to grow as more targets are validated compound is active in animal models of several
for their involvement in human diseases. inflammatory diseases (186),but was not itself
pursued as a clinical candidate because of its
inhibition of other enzymes, including hepatic
2.8.1 Mitogen-Activated Protein Kinase p38α
Mitogen-activated protein kinase (MAPK) cytochrome P450 reductases. The pyridinyl-
p38α is a member of a family of Ser/Thr-spe- imidazole compounds have dissociation con-
cific protein kinases that are activated upon stants for MAPK p38α in the nanomolar
exposure of cells to mitogens such as bacterial range, competing with ATP for binding to the
lipopolysaccharide or environmental stresses enzyme. Because these compounds bind
such as exposure to UV irradiation or chemical tightly to the unactivated enzyme, which has a
low affinity for ATP, they are able to compete effectively even in
vivo, where the ATP concentration is in the millimolar range. The
X-ray structures of (81) and several other pyridinyl-imidazole
compounds in complexes with human MAPK p38α were solved in a
collaborative effort between scientists at SmithKline Beecham and the
University of Texas (187). Several X-ray structures of human MAPK
p38α with and without bound inhibitors have also been solved by
scientists at Vertex (181, 188).
The structures of the inhibited enzyme were useful in understanding
what parts of the compounds were responsible for strong binding to
MAPK p38α. As shown in Fig. 10.18, both hydrophobic and hydrogen
bonding interactions are important components of the inhibitor
binding pocket. This structure suggested that Thr106 is an important
structural determinant of the selective inhibition of MAPK p38α by
the pyrimidyl-imidazoles, which have low affinity for other closely
related kinases. Mutation of Thr106 results in the loss of
sensitivity to these inhibitors, whereas the replacement of the
corresponding residue in another kinase (ERK2) by threonine caused
the mutated variant to become sensitive to these inhibitors (189,
190).
The X-ray structural models were also used at both SmithKline Beecham
(later GlaxoSmithKline) and Vertex to guide the design of new
inhibitors. For example, both N1 of the central imidazole and the
2-(para-methylsulfonyl)-phenyl substituent in enzyme-bound (81) face
a channel that opens to bulk solvent. This observation led to the
design of (82) (VK19911) at Vertex (181) and (83) (SB242253) at
GlaxoSmithKline (191). Compound (83) is fivefold more potent than
(81) in vivo, in a mouse disease model, and was advanced into human
clinical trials for treatment of rheumatoid arthritis (192). The
piperidine on N1 of (82) and (83) was designed to form a salt bridge
with Asp168. This interaction, and the preservation of other binding
interactions, was directly demonstrated (181) for compound (82).
Analysis of the structural information from the X-ray models allowed
the design at Vertex of a new scaffold for potent inhibition of MAPK
p38α, as shown for compound (84) (VX-745). This design process, SBDD
through
Figure 10.18. Binding of SB203580 (shown as a ball and stick
structure) in the active site of MAPK p38α. In addition to the side
chains of the labeled residues, the protein backbone between Leu104
and Met109 is shown, as well as several aliphatic side chains and a
water molecule (red sphere). Hydrogen bonds (dotted lines) are shown
between the backbone amide of Met109 and the inhibitor's pyrimidinyl
nitrogen, and between the ε-amino of Lys53 and the inhibitor's
imidazole N3. This figure is based on the PDB coordinate set 1A9U
(187). See color insert.
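The argument made in this section, that the pyridinyl-imidazole
inhibitors compete effectively against millimolar ATP because the
unactivated enzyme binds ATP only weakly, can be illustrated with a
simple competitive-binding model. The Python sketch below is only an
illustration: every numerical value in it is hypothetical and chosen
for convenience, the function names are invented here, and the model
assumes a single shared site at equilibrium with free concentrations
equal to total concentrations.

```python
def apparent_kd(kd_inhibitor, atp_conc, kd_atp):
    """Apparent inhibitor Kd when ATP competes for the same site.

    Simple competitive model: Kd,app = Kd * (1 + [ATP] / Kd_ATP).
    All quantities are in molar units.
    """
    return kd_inhibitor * (1.0 + atp_conc / kd_atp)


def inhibitor_occupancy(inhibitor_conc, kd_inhibitor, atp_conc, kd_atp):
    """Fraction of enzyme occupied by inhibitor at equilibrium."""
    kd_app = apparent_kd(kd_inhibitor, atp_conc, kd_atp)
    return inhibitor_conc / (inhibitor_conc + kd_app)


if __name__ == "__main__":
    # Hypothetical values, NOT taken from the text or the cited papers.
    inhibitor = 100e-9   # 100 nM free inhibitor
    kd_inh = 10e-9       # nanomolar inhibitor Kd
    atp = 2e-3           # millimolar cellular ATP
    weak_atp_kd = 1e-3   # enzyme form with low ATP affinity
    tight_atp_kd = 1e-5  # enzyme form with high ATP affinity

    for label, kd_atp in (("weak ATP binder", weak_atp_kd),
                          ("tight ATP binder", tight_atp_kd)):
        occ = inhibitor_occupancy(inhibitor, kd_inh, atp, kd_atp)
        print(f"{label}: inhibitor occupancy = {occ:.2f}")
```

With these made-up numbers the inhibitor occupies most of the enzyme
when ATP binds weakly, but only a few percent of it when ATP binds
tightly, which is the qualitative point made in the text.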
use of a crystal structure of MAPK p38α to design potent inhibitors
with potential utility as human therapeutics, is the subject of an
international patent application by Vertex, published in 2000 (193).
The binding mode for (84) has not been disclosed, but the compound
was advanced into clinical trials (194). Vertex has since
discontinued the clinical trials of (84) because of the potential for
toxicity, based on animal data, but in mid-2002 Vertex began a phase
I clinical study of a new compound targeted against MAPK p38α.
Scientists at Boehringer Ingelheim recently described (195, 196)
their discovery of an orally active inhibitor of MAPK p38α, compound
(85) (BIRB-796), that is very different from earlier inhibitors. This
compound, whose Kd for MAPK p38α is 100 picomolar, has entered phase
II clinical trials for treatment of rheumatoid arthritis. The lead
compound that led to compound (85) was a diaryl urea originally
identified by high throughput screening. X-ray structural studies
revealed novel modes of binding for both the lead compound and (85)
in the active site of MAPK p38α. Their binding sites are adjacent to
the active site but do not directly overlap with that of ATP; rather,
their binding mode changes the conformation of MAPK p38α such that
ATP cannot bind. The optimization of the lead compound to clinical
candidate (85) was an iterative process using clever synthetic
chemical design, biochemical assays for affinity, X-ray
crystallographic studies of key complexes, and cell-based and animal
models. The development of (85) as a MAPK p38α inhibitor with
efficacy in vivo makes it evident that there are Scientists at BioCryst, CIBA-Geigy, Southern
multiple ways to effectively inhibit this en- Research Institute, and the University of Ala-
zyme. bama collaborated to design inhibitors of hu-
man PNP (199,200). The project used an iter-
2.8.2 Purine Nucleoside Phosphorylase. ative process, in which new compound design
Purine nucleoside phosphorylase (PNP) cata- was guided by synthetic considerations, com-
lyzes the reversible phosphorolysis of purine puter graphics analysis of X-ray structural
nucleosides to the purine base and ribosyl or models, computational (Monte Carlo and en-
2-deoxyribosyl-α-1-phosphate. ergy minimization) methods, and the inhibi-
The vital role of PNP in the proliferation of tory potency of the compounds against PNP in
T-cells is evident from the fact that people vitro. Evaluation of the most potent inhibitors
with an inherited deficit in this activity have by use of cell-based assays, followed by phar-
30- to 100-fold lower numbers of T-lympho- macokinetic and pharmacological character-
cytes than normal (197). The accumulation of ization of several inhibitors in animal models,
dGTP and the resulting inhibition of ribonu- led to the choice of (86) for advancement into
cleotide reductase in PNP-deficient T-cells clinical trials. Compound (86) (BCX-34, pelde-
causes the suppression of T-cell proliferation. sine) is being evaluated for treatment of psori-
B-lymphocytes are unaffected. Hence, small asis and skin cancer (201,202).
molecule inhibitors of PNP could be used to
treat T-cell lymphomas and other T-cell-me-
diated diseases such as psoriasis. Adjunct
therapy with PNP inhibitors could also block
the catabolism of therapeutically useful nucle-
oside analogs.
Human PNP is a homotrimer of 32-kDa
subunits. The X-ray structures of the apoen-
zyme and some substrate analog complexes
were described in 1990. Each of the three iden-
tical active sites, located near the subunit in-
terfaces, is composed primarily of residues
from one subunit, with Phe159 participating
(86) peldesine
in the active site of the adjacent subunit (198).
In the SBDD project that produced (86),

the design work was initiated through use of
the X-ray structure of the PNP apoenzyme,
but was more successful when the structures
of the PNP-guanine complex and other com-
plexes were available (199). The PNP-gua-
nine crystal structure showed no important
interactions with N9, and indicated a poten-
tial for hydrophobic interaction in the vicinity
of the substrate ribose (Fig. 10.19). To test
this, the 9-deaza compound (87) was synthe-
sized. This was a weak PNP inhibitor (mea-

sured IC,, -l fl.
The X-ray structure of the complex be-
tween PNP and (87) showed that the hydro-
phobic interaction dominated the binding were made. These had affinities for PNP in the
mode, and resulted in the disruption of the nanomolar range. X-ray crystallographic anal-
hydrogen bonding interactions seen in the ysis indicated new hydrogen bonding interac-
guanine complex (i.e., Fig. 10.19). To increase tions with these 9-deaza compounds (shown
the spacing between the hydrophobe and the for 89 in Fig. 10.20), made possible because N7
purine mimetic, compounds (88) and (89) is protonated.
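To put the step from a weak inhibitor to nanomolar binders on an energy scale, the standard relation ΔG° = RT ln Kd can be applied. The sketch below is illustrative only. It treats a measured potency as if it were a dissociation constant, which is an assumption, because IC50 values depend on assay conditions. The concentrations of 1 µM and 10 nM are hypothetical round numbers, not values reported for compounds (87) to (89), and the function names are invented for this example.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def relative_binding_energy(kd_ref, kd_new, temperature_k=298.15):
    """Difference dG(new) - dG(ref); a negative value means tighter binding."""
    return (binding_free_energy(kd_new, temperature_k)
            - binding_free_energy(kd_ref, temperature_k))


if __name__ == "__main__":
    weak = 1e-6   # hypothetical micromolar binder (assumed value)
    tight = 1e-8  # hypothetical nanomolar-range binder (assumed value)
    ddg = relative_binding_energy(weak, tight)
    print(f"dG(weak)  = {binding_free_energy(weak):.2f} kcal/mol")
    print(f"dG(tight) = {binding_free_energy(tight):.2f} kcal/mol")
    print(f"ddG       = {ddg:.2f} kcal/mol")
```

Under these assumptions, each 10-fold gain in affinity is worth roughly 1.4 kcal/mol at room temperature, so a 100-fold improvement corresponds to about 2.7 kcal/mol, an amount comparable to one or two well-placed hydrogen bonds.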
Figure 10.19. Binding interactions in the active site for the complex between guanine and PNP.

Figure 10.20. Binding interactions in the active site for the complex between PNP and (89).
While this work was under way, a Phase I clinical trial was undertaken of PNP inhibitor (90) (PD-119229), which was developed by other workers. This led to an exploration of a series of 8-amino, 9-deaza derivatives, although the hydrogen bonding for the simpler 9-deaza compounds turned out to be superior (reviewed in Ref. 202). It may be that compounds such as (90) suffer from unfavorable steric interactions between the 8-amino group and the proximal methyl group of Thr242 of the enzyme, or that the energetic cost of dehydrating the 8-amino group cannot be fully repaid by interactions with the enzyme. Other chemical series were also explored, but compound (86) had an acceptable safety profile and superior solubility and pharmacokinetic properties, and so was advanced into human testing.

2.9 Conclusions and Lessons Learned

The projects in which SBDD has been applied to enable the discovery of new drugs and clinical candidates have provided significant lessons for future investigators. Some of these lessons learned are summarized here. Much of the credit for the summary presented here belongs to Michael Varney of Agouron, who provided a copy of a presentation that he made in 1998 to a medicinal chemistry symposium concerning the lessons learned in 10 years of use of SBDD methods.

Experience Matters. In every aspect of SBDD, as in all technical fields, there is no substitute for experience. Given the variety of different techniques that must be incorporated, this means that experience from several different people will be needed for optimal function of a discovery project team. Essential expertise is needed in X-ray crystallographic studies, graphical display of experimental results, initial and iterative design of compounds and synthetic tactics, the creation of databases and database queries, and the analyses of search outputs and of the results of computational simulation experiments.
Combine and Integrate Technologies. Dedicated molecular biology and protein chemistry personnel and equipment are essential for identifying the right constructs for crystallization and to the assurance of a steady supply of protein. Synthetic chemists trained in graphical analysis of protein structures tend to be excellent designers, and will be unlikely to design molecules that they cannot make. Early tactical integration of the synthetic approaches is even more important if combinatorial chemistry is part of the program. The structural information can be used to design combinatorial libraries as effectively as it can to design molecules one at a time. The use of libraries can compensate for the inaccuracies inherent in current computational scoring algorithms. More significantly, the integration of orthogonal technologies will stimulate creative thought and yield much more than the sum of the different technologies applied separately.

Go Big Early and Often. Filling active site space as much as possible will maximize the chance that a compound will be a potent inhibitor. During compound design, it should also be recognized that proteins are flexible, and that accessible conformations are hard to predict. Sometimes, larger functionality can be accommodated than the existing structural model permits. A few compounds should be included to probe this. These may give rise to an unexpected boon, such as access to a significantly altered new protein conformation with novel sites that can be exploited in new rounds of design and synthesis.

Aqueous Solubility is Critical to Success. Both early in SBDD and later on in clinical development, sufficient aqueous solubility is critical. Solubility is important early because the concentrations of compounds must be high during crystallization experiments to saturate the high levels of protein. The ratio of the solubility to the inhibition constant of a compound is also critical to the success of the crystallization experiment. Once some structural information becomes available, both parameters can be manipulated, but usually, soluble inhibitors must be available before the availability of structural information. Solubility matters during animal testing and later in development because compounds with very low solubility have limited or variable bioavailability.

Binding Sites Can Be Filled Many Ways. More than one small molecule scaffold can provide the necessary and sufficient hydrophobic and polar complementarity to generate potent inhibition. Sometimes, there are many scaffolds that will work. However, the structures of complexes with all the different scaffolds will likely have common features that are distinct from the structure of the apo enzyme, attributed to large-scale conformational changes that occur upon binding any ligand. The most useful X-ray models to use for the design of new compounds will be those that already have some substrate or inhibitor bound. There are several ways to design these compounds: modification of existing inhibitors, de novo creation of novel inhibitors, or some combination of these methods.

Not All Inhibitors Are Drugs. Having the X-ray structure of the target protein, or even having used the solved structure to design a potent inhibitor, is only the beginning of solving the difficult problems of drug design. The use of structure to create potent inhibitors can certainly shorten the time to get compounds into human testing, but use of SBDD methods does not guarantee that a potent compound will become a drug. This is an old lesson, actually, but is forgotten at great cost.

Structure of Free Inhibitor Is Important. Desolvation of the free ligand and of the protein's active-site groups upon complex formation are both significant. Both enthalpic and entropic contributions to the binding energy must be considered. Particular attention should be paid to the advantage that can be gained from "preorganization" of the inhibitor before binding, that is, low energy conformers bind with greater apparent avidity.

Bound Water Is Special, But Not All Hydrogen Bonds Are Created Equal. Each of the tightly bound waters present in an X-ray structural model has a unique environment and a unique function. In some cases, liberation of a bound water molecule by displacing it with an inhibitor's functionality can greatly increase inhibitor affinity, although this is not globally applicable. The entropic advantage of releasing a bound water into bulk solvent does
not always exceed the enthalpic cost of the displacement. In many situations, the preferred solution will be to retain a water molecule and use it to maximize inhibitor binding. For example, a water molecule that donates two hydrogen bonds and accepts one cannot be isosterically replaced. Electrostatic interactions that are more complex than hydrogen bonds and simple ion pairs are very difficult to model, anticipate, and exploit in inhibitor design.

Retain Potency While Addressing Other Issues. Structural information can be very useful in designing compounds that are not part of a competitor's intellectual property, or that cannot be patented because of information in the public domain. Redesign of a compound that is not itself proprietary, by use of structural information obtained with that compound, can yield valuable new proprietary molecules. Structural information can also guide the modification of physicochemical, metabolic, or pharmacological properties or target selectivity without compromising the potency against the primary therapeutic target.

All Models Are Wrong; Some Are Useful. At present, it is impossible to calculate an accurate value for a binding constant on an absolute scale. However, accurately estimating the relative binding of a series of closely related compounds is possible, and is much more likely to be successful if X-ray structures of target complexes with some of the compounds are available. Thus, although there is much room for improvement, local computational models can sometimes be quite useful. Even in the absence of an experimentally determined X-ray structure of the target, a hypothetical model can be a powerful tool for the design of useful compounds (e.g., captopril and TAK-147).

Iterative SBDD Cycles Are Optimal. Small alterations in ligand structure often cause major changes in binding mode, protein conformation, or both. These changes can go undetected if the structural effects are analyzed by X-ray analysis too infrequently or not iteratively. This can yield confusing or misleading structure-activity relationships, leading to a waste of precious time. Moreover, changes in compound structure seldom affect only one variable, so multiple orthogonal methods should be used to assess the effects of changes. It is also important during the rational design process to include room for serendipity. Do not reject an idea for a new compound that seems to make intuitive sense based on a single crystal structure or computational calculation.

REFERENCES

1. D. J. Abraham, Intra-Sci. Chem. Rep., 8, 4 (1974).
2. http://www.agouron.com/
3. http://www.stromix.com/
4. http://www.astex-technology.com
5. http://www.accelrys.com/consortia/htc/
6. P. J. Goodford, J. Med. Chem., 27, 557 (1984).
7. C. R. Beddell, Ed., The Design of Drugs to Macromolecular Targets, John Wiley & Sons, Chichester, UK, 1992.
8. J. Greer, J. W. Erickson, J. J. Baldwin, and M. D. Varney, J. Med. Chem., 37, 1035-1054 (1994).
9. R. E. Babine and S. L. Bender, Chem. Rev., 97, 1359 (1997).
10. P. Veerapandian, Ed., Structure-Based Drug Design, Marcel Dekker, New York, 1997.
11. R. T. Borchardt, R. M. Freidinger, T. K. Sawyer, and P. L. Smith, Eds., Integration of Pharmaceutical Discovery and Development. Case Histories (Pharmaceutical Biotechnology, Band 11), Plenum Press, New York, 1998.
12. K. Gubernator and H.-J. Bohm, Eds., Structure-Based Ligand Design, Wiley-VCH, New York/Weinheim, 1998.
13. C. L. Nobbs, H. C. Watson, and J. C. Kendrew, Nature, 209, 339 (1966).
14. M. F. Perutz, Nature, 228, 726 (1970).
15. R. C. Ladner, E. J. Heidner, and M. F. Perutz, J. Mol. Biol., 114, 385 (1977).
16. G. Fermi, M. F. Perutz, B. Shaanan, and R. Fourme, J. Mol. Biol., 175, 159 (1984).
17. B. C. Wishner, K. B. Ward, E. E. Lattman, and W. E. Love, J. Mol. Biol., 98, 179 (1975).
18. D. J. Harrington, K. Adachi, and W. E. Royer Jr., J. Mol. Biol., 272, 398 (1997).
19. C. R. Beddell, P. J. Goodford, G. Kneen, R. D. White, S. Wilkinson, and R. Wootton, Br. J. Pharmacol., 82, 397 (1984).
20. M. Merrett, D. K. Stammers, R. D. White, R. Wootton, and G. Kneen, Biochem. J., 239, 387 (1986).
21. F. C. Wireko and D. J. Abraham, Proc. Natl. Acad. Sci. USA, 88, 2209 (1991).
22. D. J. Abraham, A. S. Mehanna, F. C. Wireko, E. P. Orringer, J. Whitney, and R. P. Thomas, Blood, 77, 1334 (1991).
23. M. K. Safo, S. Nokuri, and D. J. Abraham, Unpublished results.
24. P. E. Kennedy, F. L. Williams, and D. J. Abraham, J. Med. Chem., 27, 103 (1984).
25. D. J. Abraham, M. F. Perutz, and S. E. V. Phillips, Proc. Natl. Acad. Sci. USA, 80, 324 (1983).
26. M. F. Perutz, G. Fermi, D. J. Abraham, C. Poyart, and E. Bursaux, J. Am. Chem. Soc., 108, 1064 (1986).
27. E. P. Orringer, D. S. Blythe, J. A. Whitney, S. Brockenbrough, and D. J. Abraham, Am. J. Hematol., 39, 39 (1992).
28. D. J. Abraham, A. S. Mehanna, F. Williams, E. J. Cragoe Jr., and O. W. Woltersdorf Jr., J. Med. Chem., 32, 2460 (1989).
29. D. J. Abraham, P. E. Kennedy, A. S. Mehanna, D. Patwa, and F. L. Williams, J. Med. Chem., 27, 967 (1984).
30. A. Arnone, Nature, 237, 146 (1972).
31. V. Richard, G. G. Dodson, and Y. Mauguen, J. Mol. Biol., 233, 270 (1993).
32. P. J. Goodford, J. St-Louis, and R. Wootton, Br. J. Pharmacol., 68, 741 (1980).
33. C. R. Beddell, P. J. Goodford, F. E. Norrington, S. Wilkinson, and R. Wootton, Br. J. Pharmacol., 57, 201 (1976).
34. F. F. Brown and P. J. Goodford, Br. J. Pharmacol., 60, 337 (1977).
35. A. S. Mehanna and D. J. Abraham, Biochemistry, 29, 3944 (1990).
36. M. F. Perutz and C. Poyart, Lancet, 2, 881 (1983).
37. I. Lalezari and P. Lalezari, J. Med. Chem., 32, 2352 (1989).
38. I. Lalezari, P. Lalezari, C. Poyart, M. Marden, J. Kister, B. Bohn, G. Fermi, and M. F. Perutz, Biochemistry, 29, 1515 (1990).
39. D. J. Abraham, R. S. Randad, M. A. Mahran, and A. S. Mehanna, J. Med. Chem., 34, 752 (1991).
40. D. J. Abraham, F. C. Wireko, R. S. Randad, C. Poyart, J. Kister, B. Bohn, J. F. Leard, and M. P. Kunert, Biochemistry, 31, 9141 (1992).
41. F. C. Wireko, G. E. Kellogg, and D. J. Abraham, J. Med. Chem., 34, 758 (1991).
42. D. J. Abraham, J. Kister, G. S. Joshi, M. C. Marden, and C. Poyart, J. Mol. Biol., 248, 845 (1995).
43. M. K. Safo, C. M. Moure, J. C. Burnett, G. S. Joshi, and D. J. Abraham, Protein Sci., 10, 951 (2001).
44. S. K. Burley and G. A. Petsko, FEBS Lett., 201, 751 (1986).
45. S. K. Burley and G. A. Petsko, Science, 229, 23 (1985).
46. M. Levitt and M. F. Perutz, J. Mol. Biol., 201, 751 (1988).
47. D. J. Abraham, M. K. Safo, T. Boyiri, R. E. Danso-Danquah, J. Kister, and C. Poyart, Biochemistry, 34, 15006 (1995).
48. M. P. Grella, R. Danso-Danquah, M. K. Safo, G. S. Joshi, J. Kister, S. J. Hoffman, M. Marden, and D. J. Abraham, J. Med. Chem., 25, 4726 (2001).
49. A. M. Youssef, M. K. Safo, R. Danso-Danquah, G. S. Joshi, J. Kister, M. Marden, and D. J. Abraham, J. Med. Chem., 45, 1184 (2002).
50. J. A. Walder, R. H. Zaugg, R. Y. Walder, J. M. Steele, and I. M. Klotz, Biochemistry, 18, 4265 (1979).
51. R. Chatterjee, E. V. Welty, R. Y. Walder, S. L. Pruitt, P. H. Rogers, A. Arnone, and J. A. Walder, J. Biol. Chem., 261, 9929 (1986).
52. S. R. Snyder, E. V. Welty, R. Y. Walder, L. A. Williams, and J. A. Walder, Proc. Natl. Acad. Sci. USA, 84, 7280 (1987).
53. N. Komiyama, J. Tame, and K. Nagai, Biol. Chem., 377, 543 (1996).
54. T. Boyiri, M. K. Safo, R. E. Danso-Danquah, J. Kister, C. Poyart, and D. J. Abraham, Biochemistry, 34, 15021 (1995).
55. M. F. Perutz, Br. Med. Bull., 32, 195 (1976).
56. J. Monod, J. Wyman, and J.-P. Changeux, J. Mol. Biol., 12, 88 (1965).
57. D. A. Matthews, R. A. Alden, J. T. Bolin, S. T. Freer, R. Hamlin, N. Xuong, J. Kraut, M. Poe, M. Williams, and K. Hoogsteen, Science, 197, 452 (1977).
58. L. F. Kuyper, B. Roth, D. P. Baccanari, R. Ferone, C. R. Beddell, J. N. Champness, D. K. Stammers, J. G. Dann, F. E. Norrington, D. J. Baker, and P. J. Goodford, J. Med. Chem., 25, 1120 (1982).
59. D. A. Matthews, J. T. Bolin, J. M. Burridge, D. J. Filman, K. W. Volz, and J. Kraut, J. Biol. Chem., 260, 392 (1985).
60. K. Appelt, R. J. Bacquet, C. A. Bartlett, C. L. J. Booth, S. T. Freer, M. A. Fuhry, M. R. Gehring, S. M. Herrmann, E. F. Howland, C. A. Janson, T. R. Jones, C. C. Kan, V. Kathardekar, K. K. Lewis, G. P. Marzoni, D. A. Matthews, C.
Mohr, E. W. Moomaw, C. A. Morse, S. J. Oatley, R. C. Ogden, M. R. Reddy, S. H. Reich, W. S. Schoettlin, W. W. Smith, M. D. Varney, J. E. Villafranca, R. W. Ward, S. Webber, S. E. Webber, K. M. Welsh, and J. White, J. Med. Chem., 34, 1925 (1991).
61. S. H. Reich and S. E. Webber, Perspect. Drug Discov. Des., 1, 371-390 (1993).
62. L. W. Hardy, J. S. Finer-Moore, W. R. Montfort, M. O. Jones, D. V. Santi, and R. M. Stroud, Science, 235, 448-455 (1987).
63. D. A. Matthews, K. Appelt, S. J. Oakley, and N. H. Xuong, J. Mol. Biol., 214, 923-936 (1990).
64. W. R. Montfort, K. M. Perry, E. B. Fauman, J. S. Finer-Moore, G. F. Maley, L. Hardy, F. Maley, and R. M. Stroud, Biochemistry, 29, 6964-6977 (1990).
65. Y. Takemura and A. L. Jackman, Anticancer Drugs, 8, 3-16 (1997).
66. S. E. Webber, T. M. Bleckman, J. Attard, J. G. Deal, V. Kathardekar, K. M. Welsh, S. Webber, C. A. Janson, D. A. Matthews, W. W. Smith, S. T. Freer, S. R. Jordan, R. J. Bacquet, E. F. Howland, C. L. J. Booth, R. W. Ward, S. M. Herrmann, J. White, C. A. Morse, J. A. Hilliard, and C. A. Bartlett, J. Med. Chem., 36, 733-746 (1993).
67. I. Niculescu-Duvaz, Curr. Opin. Invest. Drugs, 2, 693-705 (2001).
68. P. J. Goodford, J. Med. Chem., 28, 849 (1985).
69. P. Goodford, J. Chemom., 10, 107 (1996).
70. M. D. Varney, G. P. Marzoni, C. L. Palmer, J. G. Deal, S. Webber, K. M. Welsh, R. J. Bacquet, C. A. Bartlett, C. A. Morse, C. L. Booth, S. M. Herrmann, E. F. Howland, R. W. Ward, and J. White, J. Med. Chem., 35, 663-676 (1992).
71. D. R. Newell, Semin. Oncol., 26 (Suppl. 6), 74-81 (1999).
72. P. Norman, Curr. Opin. Invest. Drugs, 2, 1611-1622 (2001).
73. E. C. Taylor, Adv. Exp. Med. Biol., 338, 387-408 (1993).
74. G. P. Beardsley, B. A. Moroson, E. C. Taylor, and R. G. Moran, J. Biol. Chem., 264, 328-333 (1989).
75. J. R. Piper, G. S. McCaleb, J. A. Montgomery, R. L. Kisliuk, Y. Gaumont, J. Thorndike, and F. M. Sirotnak, J. Med. Chem., 31, 2164-2169 (1988).
76. S. E. Greasley, T. H. Marsilje, H. Cai, S. Baker, S. J. Benkovic, D. L. Boger, and I. A. Wilson, Biochemistry, 40, 13538-13547 (2001).
77. R. J. Almassy, C. A. Janson, C. C. Kan, and Z. Hostomska, Proc. Natl. Acad. Sci. USA, 89, 6114-6118 (1992).
78. C. C. Kan, M. R. Gehring, B. R. Nodes, C. A. Janson, R. J. Almassy, and Z. Hostomska, J. Protein Chem., 11, 467-473 (1992).
79. M. D. Varney, C. L. Palmer, W. H. Romines 3rd, T. Boritzki, S. A. Margosiak, R. Almassy, C. A. Janson, C. Bartlett, E. J. Howland, and R. Ferre, J. Med. Chem., 40, 2502-2524 (1997).
80. C. Shih, L. S. Gossett, J. F. Worzalla, S. M. Rinzel, G. B. Grindey, P. M. Harrington, and E. C. Taylor, J. Med. Chem., 35, 1109-1116 (1992).
81. D. W. Cushman, H. S. Cheung, E. F. Sabo, and M. A. Ondetti, Biochemistry, 16, 5484 (1977).
82. M. A. Ondetti, B. Rubin, and D. W. Cushman, Science, 196, 441 (1977).
83. M. J. Wyvratt and A. A. Patchett, Med. Res. Rev., 5, 483-531 (1985).
84. D. W. Cushman and M. A. Ondetti, Hypertension, 17, 589 (1991).
85. D. W. Cushman and M. A. Ondetti, Nat. Med., 5, 1110 (1999).
86. J. Rahuel, V. Rasetti, J. Maibaum, H. Rueger, R. Goschke, N. C. Cohen, S. Stutz, F. Cumin, W. Fuhrer, J. M. Wood, and M. G. Grutter, Chem. Biol., 7, 493-504 (2000).
87. L. D. Byers and R. Wolfenden, Biochemistry, 12, 2070-2078 (1973).
88. J. R. Huff and J. Kahn, Adv. Protein Chem., 56, 213-251 (2001).
89. A. Wlodawer and J. Vondrasek, Annu. Rev. Biophys. Biomol. Struct., 27, 249 (1998).
90. T. D. Meek, J. Enzyme Inhib., 6, 65 (1992).
91. A. Wlodawer and J. W. Erickson, Annu. Rev. Biochem., 62, 543 (1993).
92. R. Lapatto, T. Blundell, A. Hemmings, J. Overington, A. Wilderspin, S. Wood, J. R. Merson, P. J. Whittle, D. E. Danley, K. F. Geoghegan, et al., Nature, 342, 299-302 (1989).
93. M. A. Navia, P. M. Fitzgerald, B. M. McKeever, C. T. Leu, J. C. Heimbach, W. K. Herber, I. S. Sigal, P. L. Darke, and J. P. Springer, Nature, 337, 615-620 (1989).
94. A. Wlodawer, M. Miller, M. Jaskolski, B. K. Sathyanarayana, E. Baldwin, I. T. Weber, L. M. Selk, L. Clawson, J. Schneider, and S. B. Kent, Science, 245, 616-621 (1989).
95. I. B. Duncan and S. Redshaw, Infect. Dis. Ther., 25, 27-47 (2002).
96. A. G. Tomasselli, M. K. Olsen, J. O. Hui, D. J. Staples, T. K. Sawyer, R. L. Heinrikson, and C. S. Tomich, Biochemistry, 29, 264-269 (1990).
97. M. Jaskolski, A. G. Tomasselli, T. K. Sawyer, [...] Staples, R. L. Heinrikson, J. Schneider, S. B. Kent, and A. Wlodawer, Biochemistry, 30, 1600-1609 (1991).
98. M. W. Holladay, F. G. Salituro, and D. H. Rich, J. Med. Chem., 30, 374-383 (1987).
99. F. G. Salituro, N. Agarwal, T. Hofmann, and D. H. Rich, J. Med. Chem., 30, 286-295 (1987).
100. D. J. Kempf, K. C. Marsh, D. A. Paul, M. F. Knigge, D. W. Norbeck, W. E. Kohlbrenner, L. Codacovi, S. Vasavanonda, P. Bryant, X. C. Wang, N. E. Wideburg, J. J. Clement, J. J. Plattner, and J. Erickson, Antimicrob. Agents Chemother., 35, 2209-2214 (1991).
101. M. V. Hosur, N. T. Bhat, D. J. Kempf, E. T. Baldwin, B. Liu, S. Gulnik, N. E. Wideburg, D. W. Norbeck, K. Appelt, and J. W. Erickson, J. Am. Chem. Soc., 116, 847-855 (1994).
102. D. J. Kempf, H. L. Sham, K. C. Marsh, C. A. Flentge, D. Betebenner, B. E. Green, E. McDonald, S. Vasavanonda, A. Saldivar, N. E. Wideburg, W. M. Kati, L. Ruiz, C. Zhao, L. Fino, J. Patterson, A. Molla, J. J. Plattner, and D. W. Norbeck, J. Med. Chem., 41, 602-617 (1998).
103. C. N. Hodge, P. E. Aldrich, L. T. Bacheler, C. H. Chang, C. J. Eyermann, S. Garber, M. Grubb, D. A. Jackson, P. K. Jadhav, B. Korant, P. Y. Lam, M. B. Maurin, J. L. Meek, M. J. Otto, M. M. Rayner, C. Reid, T. R. Sharpe, L. Shum, D. L. Winslow, and S. Erickson-Viitanen, Chem. Biol., 3, 301-314 (1996).
104. B. D. Dorsey, R. B. Levin, S. L. McDaniel, J. P. Vacca, J. P. Guare, P. L. Darke, J. A. Zugay, E. A. Emini, W. A. Schleif, J. C. Quintero, J. H. Lin, I. W. Chen, M. K. Holloway, P. M. D. Fitzgerald, M. G. Axel, D. Ostovic, P. S. Anderson, and J. R. Huff, J. Med. Chem., 37, 3443-3451 (1994).
105. J. P. Vacca, J. P. Guare, S. J. DeSolms, W. M. Sanders, E. A. Giuliani, S. D. Young, P. L. Darke, I. S. Sigal, W. A. Schleif, J. C. Quintero, E. A. Emini, P. S. Anderson, and J. R. Huff, J. Med. Chem., 34, 1228-1230 (1991).
106. T. A. Lyle, C. M. Wiscount, J. P. Guare, W. J. Thompson, P. S. Anderson, P. L. Darke, J. A. Zugay, E. A. Emini, W. A. Schleif, J. C. Quintero, R. A. F. Dixon, I. S. Sigal, and J. R. Huff, J. Med. Chem., 34, 1230-1233 (1991).
107. M. K. Holloway, J. M. Wai, T. A. Halgren, P. M. Fitzgerald, J. P. Vacca, B. D. Dorsey, R. B. Levin, W. J. Thompson, L. J. Chen, S. J. deSolms, N. Gaffin, A. K. Ghosh, E. A. Giuliani, S. L. Graham, J. P. Guare, R. W. Hungate, T. A. Lyle, W. M. Sanders, T. J. Tucker, M. Wiggins, C. M. Wiscount, O. W. Woltersdorf, S. D. Young, P. L. Darke, and J. A. Zugay, J. Med. Chem., 38, 305-317 (1995).
108. E. E. Kim, C. T. Baker, M. D. Dwyer, M. A. Murcko, B. G. Rao, R. D. Tung, and M. A. Navia, J. Am. Chem. Soc., 117, 1181-1182 (1995).
109. S. W. Kaldor, V. J. Kalish, J. F. Davies 2nd, B. V. Shetty, J. E. Fritz, K. Appelt, J. A. Burgess, K. M. Campanale, N. Y. Chirgadze, D. K. Clawson, B. A. Dressman, S. D. Hatch, D. A. Khalil, M. B. Kosa, P. P. Lubbehusen, M. A. Muesing, A. K. Patick, S. H. Reich, K. S. Su, and J. H. Tatlock, J. Med. Chem., 40, 3979-3985 (1997).
110. M. Moledina, M. Chakir, and P. J. Gandhi, J. Thromb. Thrombolysis, 12, 141-149 (2001).
111. J. Hauptmann, Eur. J. Clin. Pharmacol., 57, 751-758 (2002).
112. D. W. Banner and P. Hadvary, J. Biol. Chem., 266, 20085-20093 (1991).
113. P. E. Sanderson and A. M. Naylor-Olsen, Curr. Med. Chem., 5, 289 (1998).
114. J. P. Vacca, Curr. Opin. Chem. Biol., 4, 394 (2000).
115. J. Hauptmann, B. Kaiser, M. Paintz, and F. Markwardt, Biomed. Biochim. Acta, 46, 445-453 (1987).
116. H. Brandstetter, D. Turk, H. W. Hoeffken, D. Grosse, J. Sturzebecher, P. D. Martin, B. F. Edwards, and W. Bode, J. Mol. Biol., 226, 1085-1099 (1992).
117. N. H. Hauel, H. Nar, H. Priepke, U. Reis, J. M. Stassen, and W. Wienen, J. Med. Chem., 45, 1757-1766 (2002).
118. R. M. Friedlander, V. Gagliardini, H. Hara, K. B. Fink, W. Li, G. MacDonald, M. C. Fishman, A. H. Greenberg, M. A. Moskowitz, and J. Yuan, J. Exp. Med., 185, 933-940 (1997).
119. B. Siegmund, H. A. Lehr, G. Fantuzzi, and C. A. Dinarello, Proc. Natl. Acad. Sci. USA, 98, 13249-13254 (2001).
120. N. P. Walker, R. V. Talanian, K. D. Brady, L. C. Dang, N. J. Bump, C. R. Ferenz, S. Franklin, T. Ghayur, M. C. Hackett, L. D. Hammill, L. Herzog, M. Hugunin, W. Houy, J. A. Mankovich, L. McGuiness, E. Orlewicz, M. Paskind, C. A. Pratt, P. Reis, A. Summani, M. Terranova, J. P. Welch, L. Xiong, A. Moller, D. E. Tracey, R. Kamen, and W. W. Wong, Cell, 78, 343-352 (1994).
121. K. P. Wilson, J. A. Black, J. A. Thomson, E. E. Kim, J. P. Griffith, M. A. Navia, M. A. Murcko, S. P. Chambers, R. A. Aldape, S. A. Raybuck, and D. Livingstone, Nature, 370, 270-275 (1994).
122. R. Leung-Toung, W. Li, T. F. Tam, and K. Karimian, Curr. Med. Chem., 9, 979-1002 (2002).
123. M. R. Michaelides and M. L. Curtin, Curr. Pharm. Des., 5, 787-819 (1999).
124. P. D. Brown, Expert Opin. Invest. Drugs, 9, 2167-2177 (2000).
125. O. Santos, C. D. McDermott, R. G. Daniels, and K. Appelt, Clin. Exp. Metastasis, 15, 499-508 (1997).
126. L. J. MacPherson, E. K. Bayburt, M. P. Capparelli, B. J. Carroll, R. Goldstein, M. R. Justice, L. Zhu, S. Hu, R. A. Melton, L. Fryer, R. L. Goldberg, J. R. Doughty, S. Spirito, V. Blancuzzi, D. Wilson, E. M. O'Byrne, V. Ganu, and D. T. Parker, J. Med. Chem., 40, 2525-2532 (1997).
127. G. Clemens, B. Hibner, R. Humphrey, H. Kluender, and S. Wilhelm in N. J. Clendeninn and K. Appelt, Eds., Matrix Metalloproteinase Inhibitors in Cancer Therapy, Humana Press, Totowa, NJ, 2001, pp. 175-192.
128. T. Kusumi, M. Tsuda, T. Katsunuma, and M. Yamamura, Cell Biochem. Funct., 7, 201-204 (1989).
129. M. D. Sintchak, M. A. Fleming, O. Futer, S. A. Raybuck, S. P. Chambers, P. R. Caron, M. A. Murcko, and K. P. Wilson, Cell, 85, 921-930 (1996).
130. L. Hedstrom, Curr. Med. Chem., 6, 545-560 (1999).
131. M. D. Sintchak and E. Nimmesgern, Immunopharmacology, 47, 163-184 (2000).
132. D. A. Gschwend, A. C. Good, and I. D. Kuntz, J. Mol. Recognit., 9, 175-186 (1996).
133. D. Dvornik, J. Diabetes Complications, 6, 25-34 (1992).
134. D. R. Tomlinson, E. J. Stevens, and L. T. Diemel, Trends Pharmacol. Sci., 15, 293-297 (1994).
135. C. L. Kaul and P. Ramarao, Methods Find. Exp. Clin. Pharmacol., 23, 465-475 (2001).
136. A. Urzhumtsev, F. Tete-Favier, A. Mitschler, J. Barbanton, P. Barth, L. Urzhumtseva, J. F. Biellmann, A. Podjarny, and D. Moras, Structure, 5, 601-612 (1997).
137. M. C. Van Zandt, E. O. Sibley, K. J. Combs, E. E. McCann, B. Flam, D. J. Lavoie, D. Sawicki, A. Sabetta, A. Carrington, J. Sredy, V. Calderone, B. Cuevrier, and A. Podjarny, Poster presented at the 218th National Meeting of the American Chemical Society, New Orleans, LA, August 22-26, 1999.
138. S. Borman, Chem. Eng. News, 80, 35-39 (2002).
139. E. K. Perry, B. E. Tomlinson, G. Blessed, K. Bergmann, P. H. Gibson, and R. H. Perry, Br. Med. J., 2, 1457-1459 (1978).
140. B. P. Imbimbo, CNS Drugs, 15, 375-390 (2001).
141. Y. Ishihara, K. Kato, and G. Goto, Chem. Pharm. Bull. (Tokyo), 39, 3225-3235 (1991).
142. Y. Ishihara, G. Goto, and M. Miyamoto, Curr. Med. Chem., 7, 341-354 (2000).
143. J. L. Sussman, M. Harel, F. Frolow, C. Oefner, A. Goldman, L. Toker, and I. Silman, Science, 253, 872-879 (1991).
144. Y. Yamamoto, Y. Ishihara, and I. D. Kuntz, J. Med. Chem., 37, 3141-3153 (1994).
145. A. H. Reid, J. K. Taubenberger, and T. G. Fanning, Microbes Infect., 3, 81-87 (2001).
146. J. N. Varghese, W. G. Laver, and P. M. Colman, Nature, 303, 35-40 (1983).
147. M. von Itzstein, W.-Y. Wu, G. B. Kok, M. S. Pegg, J. C. Dyason, B. Jin, T. Van Phan, M. L. Smythe, H. F. White, S. W. Oliver, P. M. Colman, J. N. Varghese, D. M. Ryan, J. M. Woods, R. C. Bethell, V. J. Hotham, J. M. Cameron, and C. R. Penn, Nature, 363, 418-423 (1993).
148. P. Bossart-Whitaker, M. Carson, Y. S. Babu, C. D. Smith, W. G. Laver, and G. M. Air, J. Mol. Biol., 232, 1069-1083 (1993).
149. J. N. Varghese, V. C. Epa, and P. M. Colman, Protein Sci., 4, 1081-1087 (1995).
150. C. U. Kim, W. Lew, M. Williams, H. Liu, L. Zhang, S. Swaminathan, N. Bischofberger, M. S. Chen, D. Mendel, W. G. Laver, and R. C. Stevens, J. Am. Chem. Soc., 119, 681 (1997).
151. Y. S. Babu, P. Chand, S. Bantia, P. Kotian, A. Dehghani, Y. El-Kattan, T. H. Lin, T. L. Hutchison, A. J. Elliott, C. D. Parker, S. L. Ananth, L. L. Horn, G. W. Laver, and J. A. Montgomery, J. Med. Chem., 43, 3482-3486 (2000).
152. J. A. Green, G. M. Smith, R. Buchta, R. Lee, K. Y. Ho, I. A. Rajkovic, and K. F. Scott, Inflammation, 15, 355-367 (1991).
153. P. Vadas, J. Browning, J. Edelson, and W. Pruzanski, J. Lipid Mediat., 8, 1-30 (1993).
154. C. Bennion, S. Connolly, N. P. Gensmantel, C. Hallam, C. G. Jackson, W. U. Primrose, G. C. Roberts, D. H. Robinson, and P. K. Slaich, J. Med. Chem., 35, 2939-2951 (1992).
155. S. Connolly, C. Bennion, S. Botterell, P. J. Croshaw, C. Hallam, K. Hardy, P. Hartopp, C. G. Jackson, S. J. King, L. Lawrence, A. Mete, D. Murray, D. H. Robinson, G. M. Smith, L. Stein, I. Walters, E. Wells, and W. J. Withnall, J. Med. Chem., 45, 1348-1362 (2002).
156. H. G. Beaton, C. Bennion, S. Connolly, A. R. Cook, N. P. Gensmantel, C. Hallam, K. Hardy, B. Hitchin, C. G. Jackson, and D. H. Robinson, J. Med. Chem., 37, 557-559 (1994).
157. J. P. Wery, R. W. Schevitz, D. K. Clawson, J. L. Bobbitt, E. R. Dow, G. Gamboa, T. Goodson Jr., R. B. Hermann, R. M. Kramer, D. B. McClure, et al., Nature, 352, 79-82 (1991).
158. R. W. Schevitz, N. J. Bach, D. G. Carlson, N. Y. Chirgadze, D. K. Clawson, R. D. Dillard, S. E. Draheim, L. W. Hartley, N. D. Jones, Mihelich, et al., Nat. Struct. Biol., 2, 458-465 (1995).
159. D. L. Scott, S. P. White, J. L. Browning, J. J. Rosa, M. H. Gelb, and P. B. Sigler, Science, 254, 1007-1010 (1991).
160. M. M. Thunnissen, E. Ab, K. H. Kalk, J. Drenth, B. W. Dijkstra, O. P. Kuipers, R. Dijkman, G. H. de Haas, and H. M. Verheij, Nature, 347, 689-691 (1990).
161. S. E. Draheim, N. J. Bach, R. D. Dillard, D. R. Berry, D. G. Carlson, N. Y. Chirgadze, D. K. Clawson, L. W. Hartley, L. M. Johnson, N. D. Jones, E. R. McKinney, E. D. Mihelich, J. L. Olkowski, R. W. Schevitz, A. C. Smith, D. W. Snyder, C. D. Sommers, and J. P. Wery, J. Med. Chem., 39, 5159-5175 (1996).
162. D. W. Snyder, N. J. Bach, R. D. Dillard, S. E. Draheim, D. G. Carlson, N. Fox, N. W. Roehm, C. T. Armstrong, C. H. Chang, L. W. Hartley, L. M. Johnson, C. R. Roman, A. C. Smith, M. Song, and J. H. Fleisch, J. Pharmacol. Exp. Ther., 288, 1117-1124 (1999).
163. D. M. Springer, Curr. Pharm. Des., 7, 181-198 (2001).
164. C. Savolainen, S. Blomqvist, M. N. Mulders, and T. Hovi, J. Gen. Virol., 83 (Pt 2), 333-340 (2002).
165. M. G. Rossmann, Viral Immunol., 2, 143-161 (1989).
166. M. G. Rossmann, E. Arnold, J. W. Erickson, E. A. Frankenberger, J. P. Griffith, H. J. Hecht, J. E. Johnson, G. Kamer, M. Luo, A. G. Mosser, R. R. Rueckert, B. Sherry, and G. Vriend, Nature, 317, 145-153 (1985).
167. G. D. Diana, M. A. McKinlay, M. J. Otto, V. Akullian, and C. Oglesby, J. Med. Chem., 28, 1906-1910 (1985).
168. G. D. Diana, M. A. McKinlay, C. J. Brisson, E. S. Zalay, J. V. Miralles, and U. J. Salvador, J. Med. Chem., 28, 748-752 (1985).
169. M. J. Otto, M. P. Fox, M. J. Fancher, M. F. Kuhrt, G. D. Diana, and M. A. McKinlay, Antimicrob. Agents Chemother., 27, 883-886 (1985).
170. M. P. Fox, M. J. Otto, and M. A. McKinlay, Antimicrob. Agents Chemother., 30, 110-116 (1986).
171. B. Jubelt, A. K. Wilson, S. L. Ropka, P. L. Guidinger, and M. A. McKinlay, J. Infect. Dis., 159, 866-871 (1989).
172. T. J. Smith, M. J. Kremer, M. Luo, G. Vriend, E. Arnold, G. Kamer, M. G. Rossmann, M. A. McKinlay, G. D. Diana, and M. J. Otto, Science, 233, 1286-1293 (1986).
173. D. C. Pevear, M. J. Fancher, P. J. Felock, M. G. Rossmann, M. S. Miller, G. D. Diana, A. M. Treasurywala, M. A. McKinlay, and F. J. Dutko, J. Virol., 63, 2002-2007 (1989).
174. K. H. Kim, P. Willingmann, Z. X. Gong, M. J. Kremer, M. S. Chapman, I. Minor, M. A. Oliveira, M. G. Rossmann, K. Andries, G. D. Diana, F. J. Dutko, M. A. McKinlay, and D. C. Pevear, J. Mol. Biol., 230, 206-227 (1993).
175. G. D. Diana, D. Cutcliffe, R. C. Oglesby, M. J. Otto, J. P. Mallamo, V. Akullian, and M. A. McKinlay, J. Med. Chem., 32, 450-455 (1989).
176. G. D. Diana and D. C. Pevear, Antiviral Chem. Chemother., 8, 401 (2002).
177. G. D. Diana, P. Rudewicz, D. C. Pevear, T. J. Nitz, S. C. Aldous, D. J. Aldous, D. T. Robinson, T. Draper, F. J. Dutko, C. Aldi, et al., J. Med. Chem., 38, 1355-1371 (1995).
178. J. M. Rogers, G. D. Diana, and M. A. McKinlay, Adv. Exp. Med. Biol., 458, 69-76 (1999).
179. F. G. Hayden, T. Coats, K. Kim, H. A. Hassman, M. M. Blatter, B. Zhang, and S. Liu, Antiviral Ther., 7, 53-65 (2002).
180. B. Dérijard, J. Raingeaud, T. Barrett, I.-H. Wu, J. Han, R. J. Ulevitch, and R. J. Davis, Science, 267, 682-685 (1995).
181. K. P. Wilson, P. G. McCaffrey, K. Hsiao, S. Pazhanisamy, V. Galullo, G. W. Bemis, M. J. Fitzgibbon, P. R. Caron, M. A. Murcko, and M. S. Su, Chem. Biol., 4, 423-431 (1997).
182. B. Frantz, T. Klatt, M. Pang, J. Parsons, A. Rolando, H. Williams, M. J. Tocci, S. J. O'Keefe, and E. A. O'Neill, Biochemistry, 37, 13846-13853 (1998).
183. J. C. Lee, J. T. Laydon, P. C. McDonnell, T. F. Gallagher, S. Kumar, D. Green, D. McNulty, M. J. Blumenthal, J. R. Heys, S. W. Landvatter, J. E. Strickler, M. M. McLaughlin, I. R. Siemens, S. M. Fisher, G. P. Livi, J. R. White, J. L. Adams, and P. R. Young, Nature, 372, 739-746 (1994).
184. J. C. Lee, S. Kumar, D. E. Griswold, D. C. Underwood, B. J. Votta, and J. L. Adams, Immunopharmacology, 47, 185-201 (2000).
References
185. A. Cuenda, J. Rouse, Y. N. Doza, R. Meier, P. 194. J. J. Haddad, Curr. Opin. Invest. Drugs, 2,
Cohen, T. F. Gallagher, P. R. Young, and J. C. 1070 (2001).
Lee, FEBS Lett., 364,229-233(1995). 195. C. Pargellis, L.Tong, L. Churchill, P. F. Cirillo,
186. A. M. Badger, J. N. Bradbeer, B. Votta, J. C. T. Gilmore, A. G. Graham, P. M. Grob, E. R.
Lee, J. L. Adams, and D. E. Griswold, J. Phar- Hickey, N. Moss, S. Pav, and J. Regan, Nut.
mmol. Exp. Ther., 279,1453-1461 (1996). Struct. Biol., 9,268-272(2002).
187. Z. Wang, B. J. Canagarajah, J. C. Boehm, S. 196. J. Regan, S. Breitfelder, P. Cirillo, T. Gilmore,
Kassisa, M. H. Cobb, P. R. Young, S. Abdel- A. G. Graham, E. Hickey, B. Klaus, J. Madwed,
Meguid, J . L. Adams, and E. J. Goldsmith, M. Moriak, N. Moss, C. Pargellis, S. Pav, A.
Structure, 6,1117-1128(1998). Proto, A. Swinamer, L. Tong, and C. Torcel-
188. K. P. Wilson, M. J. Fitzgibbon, P. R. Caron, lini, J. Med. Chem., 45,2994(2002).
J. P. Griffith, W. Chen, P. G. McCaffrey, S. P. 197. G. R. Boss and J . E. Seegmiller, Annu. Rev.
Chambers, and M. S. Su, J. Biol. Chem., 271, Genet., 16,297-328(1982).
27696-27700(1996). 198. S. E. Ealick, S. A. Rule, D. C. Carter, T. J.
189. T. Fox, J. T. Coll, X. Xie, P. J. Ford, U. A. Greenhough, Y. S. Babu, W. J. Cook, J. Ha-
Germann, M. D. Porter, S. Pazhanisamy, M. A. bash, J. R. Helliwell, J. D. Stoeckler, R. E.
Fleming, V. Galullo, M. S. Su, and K. P. Wil- Parks Jr., S. Chen, and C. E. Bugg, J. Biol.
son, Protein Sci., 7,2249(1998). Chem., 265,1812(1990).
190. R. J. Gum, M. M. McLaughlin, S. Kumar, Z. 199. S. E. Ealick, Y. S. Babu, C. E. Bugg, M. D.
Wang, M. J. Bower, J. C. Lee, J. L. Adams, G. P. Erion, W. C. Guida, J. A. Montgomery, and
Livi, E. J. Goldsmith, and P. R. Young, J. Biol. J. A. Secrist 3rd, Proc. Natl. Acad. Sci. USA,
Chem., 273,15605-15610(1998). 88,11540-11544(1991).
191. J. L. Adams, J. C. Boehm, T. F. Gallagher, S. 200. J. A. Montgomery, S. Niwas, J. D. Rose, J. A.
Kassis, E. F. Webb, R. Hall, M. Sorenson, R. Secrist 3rd, Y. S. Babu, C. E. Bugg, M. D.
Garigipati, D. E. Griswold, and J. C. Lee, Erion, W. C. Guida, and S. E. Ealick, J. Med.
Bioorg. Med. Chem. Lett., 11, 2867-2870 Chem., 36,55-69(1993).
(2001). 201. M. Duvic, E. A. Olsen, G. A. Omura, J. C.
192. T. Fullerton, A. Sharma, U. Prabhakar, M. Maize, E. C. Vonderheid, C. A. Elmets, J. L.
Tucci, S. Boike, H. Davis, D. Jorkasky, and W. Shupack, M. F. Demierre, T. M. Kuzel, and
Williams, Clin. Pharmacol. Ther., 67, 114 D. Y. Sanders, J. Am. Acad. Dermatol., 44,
(2000). 940-947 (2001).
193. Pat. Appl. Vertex Pharmaceuticals, Inc., as- 202. P. E. Morris Jr. and G. A. Omura, Curr. *
signee, PCT WO 00/36096(2000). Pharm. Des., 6,943-959(2000).
CHAPTER ELEVEN
X-Ray Crystallography in Drug Discovery
DOUGLAS A. LIVINGSTON
SEAN G. BUCHANAN
KEVIN L. D'AMICO
MICHAEL V. MILBURN
THOMAS S. PEAT
J. MICHAEL SAUDER
Structural GenomiX
San Diego, California
Contents
1 Introduction
2 Methodology
2.1 Theory
2.2 Crystallization
2.3 Data Collection
2.4 Phase Problem
2.5 Computing and Refinement
2.6 Databases
3 Applications of the Use of Crystallographic Studies in Drug Discovery and Development
4 Structural Genomics
4.1 Introduction to Structural Genomics
4.2 Genome Annotation
4.3 Pathways
4.4 Protein Structure Modeling
5 Conclusion

ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
1 INTRODUCTION

The practice of crystallography is undergoing dramatic change because of the advent of new robotics technologies, orders-of-magnitude improvement in X-ray sources and computational power, and the advances in protein production stemming from the recent revolution in molecular biology. This chapter covers these changes in the context of an overview of the techniques of modern crystallography, their application in the identification and characterization of targets and mechanisms for therapeutic intervention, and the nascent field of structural genomics. Structure-based drug design applications are covered elsewhere.

The exponential growth in the rate of determination of new protein structures continues unabated. Technologies developed in the late 1980s (1) have now evolved to the point that they have been implemented in high-throughput (HTS) format, driving the rate even higher. Super-intense, precise, tunable X-rays are now available from undulator beamlines. Three "third-generation" synchrotrons, designed and built for this purpose, are now on line (ESRF in Grenoble, France; Spring-8 in Japan; and APS at Argonne National Laboratory in the United States), and others are under construction. In a relative sense, this capability has had minimal impact on medicinal chemistry to date, but that will certainly change. The companies that have successfully built high-throughput protein crystallography systems (SGX and Syrrx in the United States and Astex in the U.K., among others) have all now turned their prodigious capacity to the co-crystallization of small molecules with target proteins for the purpose of drug discovery. The capacity to compare, in parallel, the binding modes of a set of hits from HTS, or a given lead series, will be valuable, but an even greater impact will result from the decrease in turnaround time required to generate co-crystal structures. This has been the most significant hindrance to realizing the full potential of structure-based drug design. A structure is far more useful before the chemist has embarked on the synthesis of the next series, rather than after.

Another important development toward new target identification is the effort in large-scale structural annotation of various genomes, the field of structural genomics. In classifying proteins by function as a step toward validating them as therapeutic targets, structural homology is perhaps the most important tool available. These efforts have been taken up by a number of publicly funded consortia (2), because the commercial value of genomic databases in general has not been high enough to justify their cost in the private sector. Given that medicinal chemists think and communicate largely in structural terms, this recent growth in the influence of structural biology is very important. It forms the basis of a powerful link between chemistry and biology, and we have only begun to realize its potential.

2 METHODOLOGY

2.1 Theory

X-ray crystallography provides atomic or near atomic resolution of matter. The periodicity of crystals, reflecting the repeating units of molecular structure, diffracts X-rays according to Bragg's law: nλ = 2d sin θ, where n is the order of diffraction, λ the wavelength of the radiation, d the spacing or distance between a family of lattice planes in the crystal, and θ the angle of the diffraction. X-radiation is ideal to analyze atomic structure, because the wavelengths used are on the order of 0.1-2.0 Å, with 0.75 Å being about one-half the distance of an aliphatic carbon-carbon bond.

The images of diffracted crystal lattices can be observed with specialized precession photographic equipment, although the modern day image plate detectors used in most laboratories produce a diffraction image that can be analyzed by computer to provide the indices of the lattice diffraction spots (Fig. 11.1, a-c).

The X-ray diffraction from the electron clouds surrounding each nucleus is either reinforced or impeded and gives rise to the difference in intensities observed in Fig. 11.1. The steps that one goes through to solve a crystal structure follow, with the intent of providing the non-crystallographer with a simplified and pictorial view of the process.

2.2 Crystallization

Crystallization is the critical first and most important step, because good single crystals usually provide quality diffraction. Linus Pauling once entitled one of his lectures "The Importance of Being Crystalline" (3). Unfortunately, crystallization is still more empirical than scientific. It requires closely monitored matrix changes in growing conditions, i.e., pH, salt concentration, temperature, solvents, and crystallization setups. Most laboratories now use well-known sparse matrix screens pioneered by Jancarik and Kim (4) and further refined and commercially distributed by Hampton Research (5, 6). Screens will typically employ vapor diffusion experiments (hanging drops or sitting drops), and occasionally batch and liquid-liquid diffusion methods.

Figure 11.1. (a) A look at a two-dimensional crystal lattice diffraction pattern for a small molecule natural product, MW 222. Each diffraction intensity in the lattice is numbered to give a unique three-dimensional address (identification) for that measurement. These numerical addresses are referred to as Miller indices or hkl values. (b) A diffraction pattern from a precession photograph for hemoglobin, MW 55,000. Note that the diffraction lattice spacings are much smaller for the large molecule, which reflects the nature of Bragg's law, where the lattice is observed in reciprocal space (1/d = 2 sin θ/nλ). (c) An image plate diffraction pattern for a protein. [Adapted with permission from D. J. Abraham, Computer-Aided Drug Design, Methods, and Applications, Marcel Dekker, Inc., New York, 1989.]
More recently, batch crystallizations have been rejuvenated by the development of microbatch robots and by the groups of Chayen (7), DeTitta (8), and D'Arcy (9).

Although discovering the crystallization conditions for a new protein or nucleic acid can be tedious, relatively inexperienced individuals can usually succeed at growing crystals once the initial conditions are established. Some of the most successful crystallization methodologies are based on vapor diffusion methods (Fig. 11.2). The general idea behind vapor diffusion crystallization is to dissolve the protein in a buffer, with a non-precipitating amount of the miscible vapor solvent, in a reservoir that is in equilibrium with a higher concentration of the vaporizing solvent nearby. Another variation is to set up the crystallization cocktail containing salts, buffers, (poly)ethylene glycols (PEGs), small molecule solvents, etc., where volume is slowly reduced by the equilibrating mixture, which is placed nearby. McPherson, Carter, and others have developed more quantitative methods for optimizing crystal growth (10).

Figure 11.2. [Panel labels: "Drop: 50% protein, 50% cocktail."] (a) The drops are typically 1-10 µL total volume, with between 100 and 1000 µL total volume of cocktail in the well. The smaller the drop size, the faster the equilibrium occurs, in general. There are a variety of plates now available in which to set up these vapor diffusion experiments, the most common being 24-well Linbro plates and 96-well microtiter plates. Several robots have been developed to automatically set up the crystallization experiments; although most are no faster than doing the same procedure by hand (particularly with a multi-channel pipettor), there can be other advantages (e.g., consistency and reducing repetitive stress syndrome). Once plates are set up, they are typically kept at a constant temperature and observed periodically under a microscope. (b and c) Progress in automating this aspect of characterization has occurred, and there are now imagers that will take high resolution, digital pictures of each drop in turn and store these for either manual or automated analysis. (d) Batch experiments are set up such that the protein is mixed with cocktail and there is little concentration or dilution to the sample over time. This can now be done in very high throughput and small scale: 50-200 nL drops under oil in 1536-well plates, for example. This kind of approach has been used to screen hundreds of conditions with small amounts of protein, which may allow for faster optimization later. One caveat is that small crystals don't necessarily lead to larger crystals later, and all structures to date have had crystals of greater than 10 microns in at least one dimension.

2.3 Data Collection

Most laboratories have rotating anode sources for production of high intensity X-ray beams. These are coupled with an area detector that has made single crystal diffractometers obsolete. Mirrors and other technology have also been used to provide a more intense and monochromatic radiation source (11). Radiation from rotating anode sources is at a fixed wavelength, usually from high-voltage electrons impinging on either a copper or molybdenum rotating anode, i.e., radiation at 1.54 Å (copper) or 0.71 Å (molybdenum). Radiation from synchrotron sources can often be tuned to a wavelength of interest for multiwavelength anomalous diffraction (MAD) experiments (see below).

X-rays generated by a synchrotron source are typically two orders of magnitude stronger than conventional CuKα radiation generated by a rotating anode. Synchrotron sources have greatly extended the ability to solve new protein structures when only weakly diffracting or small crystals are available. Another advantage in using the stronger synchrotron radiation is that the crystal exposure time is significantly lower. The typical exposure time for home laboratory CuKα sources ranges from 5 to 60 min for a range of data, whereas the equivalent set of data at an undulator beamline, e.g., at the Advanced Photon Source (APS), requires only about 1 s of exposure time. Synchrotron radiation has also allowed the use of MAD, enabling phasing (imaging) of the protein using a derivative with only one heavy element.

A variety of detectors are in common use to record X-ray data and have the advantage of measuring the intensities of large numbers of diffraction spots simultaneously. The most popular detectors are image plates and charge-coupled device (CCD) cameras. Image plates are typically the choice for laboratory rotating anode sources and lower flux synchrotron sources (Fig. 11.3). CCDs have the distinct advantage of speed at the higher flux synchrotron sources, because they simultaneously measure and record diffraction intensities (amplitudes). Current CCD cameras have readout times on the order of a few (typically 2-8) seconds, a speed not dreamed of when the first protein structure data was recorded from photographs (with intensities measured by eye comparison to standard reference spots on a separate film strip). Speed of data collection can be an important advantage at third generation synchrotron sources, with even shorter exposure times. On the other hand, image plates have a greater range of use, being accessible in any X-ray diffraction laboratory, with many of the newer models taking less than 1 min to record the intensity data. Image plate detectors often have more than one image plate, so one can be read while the other is exposed, effectively wasting no time during the collection period. The image plates also offer a larger surface area for data collection than most CCD cameras and are considerably less expensive.

Figure 11.3. (a) Area detector showing the configuration of the unit. (b) Area detector showing the face.

X-ray diffraction data from crystals are collected either at room temperature or under cryogenic conditions at liquid nitrogen temperatures [around 100 K (-170°C)]. For room temperature data collection, crystals are normally mounted in thin-walled glass capillaries, with a small amount of mother liquor about 5 mm from the crystal. The mother liquor in the capillary is critical because protein crystals are 40-80% water; dried protein crystals do not diffract. The nearby mother liquor ensures that the crystal is bathed in the vapor of the mother liquor and prevents drying during data collection. The majority of present day data collections in home laboratories and at synchrotrons are done under cryogenic conditions, which allows high intensity X-radiation to be used without the crystal decay observed in room temperature data collection. For cryogenic data collection, crystals are normally mounted in a thin fiber loop with a layer of suitable cryoprotectant solution (Fig. 11.4). The cryoprotectant forms a layer of non-crystalline glass around the crystal to protect it from freeze shock. Simple freezing of the crystal results in the formation of ice in the interior of the crystal and renders it useless. A quick perusal of the literature shows PEG, glycerol, sucrose, and 2-methyl-2,4-pentanediol (MPD) as the most popular cryoprotectants. Oils, such as paraffin oil, have also been used successfully as cryoprotectants (12).

Figure 11.4. The crystals are manipulated by scooping them up with a small loop of nylon that is glued to the end of a pin. Surface tension from the liquid will hold the crystal in the loop, but the crystal can also be held by using a loop that is smaller in size than the crystal of interest. This technique will work particularly well with fragile crystals, thin plates for example, that would normally fall apart in a capillary mount. Once the crystal is frozen, it is placed on an axis in line with both an X-ray source and a stream of nitrogen set to about 100 K to keep the crystal frozen. The crystal is rotated in increments during the data collection procedure to collect a full data set (typically one or two degrees per frame, depending on the resolution limits, mosaicity of the crystal, unit cell lengths, etc.).

2.4 Phase Problem

X-ray diffraction measurements as described above only provide the amplitudes of the diffracted waves. One must have the phase angles of all measured waves relative to a common origin in order to image the molecule using a Fourier analysis. Figure 11.5 illustrates the differences in the phases and amplitudes for two reflections. The solution of the phase problem that permitted the first image reconstruction of a protein was discovered by Perutz using multiple isomorphous replacement (MIR) (13).

Figure 11.5. Graphs showing the phase relationship of electromagnetic radiation: (a) amplitude; (b) phase difference. [Adapted with permission from D. J. Abraham, Computer-Aided Drug Design, Methods, and Applications, Marcel Dekker, Inc., New York, 1989.]

The majority of the earliest structures were solved using MIR to phase the maps. This requires soaking the crystals or co-crystallizing the protein with two or more heavy atoms and hoping that these heavy atoms bind in a specific way to the protein. It also requires that the subsequent crystals are isomorphous with the native protein (i.e., no changes in the unit cell or symmetry of the crystal). Although it is possible to obtain phase information from a single heavy atom derivative using additional information (e.g., anomalous scattering or density modification), one often works diligently to get a second or third derivative to improve the quality of the electron density (Fourier) map.

Two other common methods are used to estimate phases in protein crystallography: molecular replacement (MR), which uses the structural motif of a homologous protein (14), and MAD from a single heavy element (1).

MR methodology requires a structural model that is structurally homologous to the protein that has been crystallized. Phasing is accomplished through a six-dimensional search: a three-dimensional rotation search followed by a three-dimensional translation search, using the model against the crystal data. Molecular replacement is being employed more frequently as the number of known structures has increased, which has made unique structural motifs available for phasing. For highly homologous protein structures, this method is usually straightforward and successful. For marginal cases, the addition of some independent phase information, single isomorphous replacement (SIR) or MIR, in combination with MR can enhance the quality of the Fourier map.

MAD phasing is an alternate methodology for solving the phase problem. MAD requires a single heavy atom with anomalous peak scattering at a wavelength where X-rays both at and near the spectral energy are accessible. Data sets are collected at different wavelengths to optimize the anomalous and dispersive signals from the heavy atom. Certain beamlines have been designed with wavelengths that are tunable "on the fly," and these are often referred to as MAD beamlines. MAD has become the method of choice for rapid structure solution when synchrotron radiation is available. The advantage of MAD phasing is that one often only needs a single crystal to collect all of the data necessary to solve the structure. Although multiple wavelengths are collected (anywhere from two to five sets), data collection is routinely completed in less than a few hours. The data set at the peak wavelength is very important to collect first, as it contains the greatest anomalous signal and is often used alone to find the heavy atom sites (Shake and Bake program, anomalous Patterson maps, etc.). If the crystal degrades quickly in the beam, one can also employ single-wavelength anomalous diffraction phasing (SAD) if a full data set at the peak wavelength was successfully collected. SAD phasing requires additional information, obtained by density modification, to obtain interpretable electron density maps, but has been proven in many instances to result in very high quality maps (15).

Many different heavy atoms have been used for MAD/SAD phasing, the most popular being selenium. Selenium is incorporated into the amino acid sequence of the protein by adding selenomethionine to the growth media when the protein is produced (16). For proteins that bind DNA, 5-bromouracil has been a popular choice for phasing through anomalous scattering. Most heavy elements have good anomalous signals (Hg, Pt, U, Au, etc.). Lanthanides have a particularly good signal and can sometimes substitute for divalent metals found naturally in the protein (e.g., Ca) (17). One of the major advantages of MAD phasing is that the signal does not decay at higher resolution with perfectly isomorphic crystals, so the experimentally phased map can be quite good out to the full resolution of diffraction. This typically has not been the case when using multiple isomorphous replacement, where the experimentally phased map often only extends to around 2.5 Å resolution, because of a lack of isomorphism between the native and heavy metal substituted crystals. Anomalous scattering has been useful in the structure determination of very large structures; the 30S ribosome was recently solved using Os and Lu derivatives (18).

2.5 Computing and Refinement

Raw intensity X-ray crystallographic data is next reduced and scaled to provide structure factors (F) that are used to solve and image the structure. Two of the most popular software packages employed to reduce raw data are Mosflm/CCP4 (19) and the HKL suite (20). Both work very well and are very fast with modern computers. A variety of programs, such as SOLVE (21), Shake and Bake (22), or SHELX (23), can be employed to find the heavy atom positions, including hand searching methods through Patterson maps. Once heavy atom sites are found, they are usually refined with the programs SHARP (24) or MLPHARE (19). The heavy atom positions are next used as phase information input to provide initial phases for electron density maps, which are used to fit the remainder of the protein or nucleic acid. Once a model of the structure is obtained it is refined. In cases where high resolution data is available, programs such as WARP (24) can automatically provide models of protein structures. When high resolution data is not available, a model is most often built in by hand using graphics programs such as O (25) or XFIT. The models are refined against the data by programs such as REFMAC (19) and CNS (26). All of these programs have become much faster and easier to use because of the incredible increases in speed that new hardware has allowed.

It is worth mentioning that statistical and probabilistic techniques have had a significant impact on how heavy atoms are found and models are refined (e.g., SHARP, SOLVE, REFMAC). Bayesian statistics and maximum likelihood methods are now used instead of least-squares methods. One may want to consider how various data collection strategies may affect the later steps in the process by keeping this in mind, i.e., high redundancy in the data makes for better statistics.

The quality of a structure is measured in many ways: how low the R factor or Rfree is (the fit of observed data to the model), the resolution limit of the data, the ideality of the bonds and angles, etc. How well a structure measures up to other structures of about the same resolution also gives a good idea of how "good" a given structure is (PROCHECK program). SFCHECK is a useful program for assessing the agreement between the atomic model and the experimental X-ray data. The level of confidence one expects from a given model will depend on the resolution of the data. This can be seen clearly in Fig. 11.6, where a residue from a protein structure is shown with three different data cutoffs at different resolution ranges. The model from a 3.0-Å data set may look the same as one from a 1.3-Å data set, but the level of confidence is much higher in the latter. A reasonably well-refined structure will have a crystallographic R factor between 15% and 25% and will have an Rfree of less than 30% under most circumstances.

Figure 11.6. Three density maps at differing resolutions: a, 1.3 Å; b, 2.1 Å; c, 3.0 Å. See color insert.

2.6 Databases

The Protein Data Bank (PDB) (27, 28) is now coordinated by a consortium of several institutions (Rutgers University, the San Diego Supercomputer Center, and the National Institute of Standards and Technology). As of this writing, the PDB has over 18,000 structures, with over 15,000 of these done by X-ray crystallography. Most of the rest were done by NMR. For small molecules, the Cambridge Structural Database (CSD) (29) contains structural information for over 230,000 organic and organometallic compounds. All of these structures have been determined by X-ray or neutron diffraction techniques.

3 APPLICATIONS OF THE USE OF CRYSTALLOGRAPHIC STUDIES IN DRUG DISCOVERY AND DEVELOPMENT

Crystallization of small molecule compounds with a protein or nucleic acid target followed by X-ray crystallographic determination of the combined structure is the basis and hallmark of structure-based drug design. As structural biology moves into the post-genomic age, many companies and academic laboratories are faced with the challenge of co-crystallization of targets and inhibitors or activators on a scale never before attempted. Previously, crystal structure determination of a protein-substrate or inhibitor complex in an academic or industrial environment often yielded the structural information desired to understand the mechanism of action or to design a more suitable substrate or inhibitor. However, modern day laboratories are now faced with the daunting challenge of crystallizing hundreds of compounds for clues in further ligand design using standard organic synthesis or combinatorial approaches.
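Each co-crystal structure produced at this scale is still judged by the quality measures of Section 2.5. As a rough sketch of what the R factor measures, the code below uses the conventional textbook definition, R = Σ| |Fobs| - |Fcalc| | / Σ|Fobs|. The chapter does not spell this formula out, so it should be read as an assumption. The amplitudes are made-up numbers, the function names are hypothetical, and the amplitudes are assumed to be already on a common scale. Rfree is the same quantity evaluated on reflections held out of refinement.

```python
def r_factor(f_obs, f_calc):
    """Conventional R factor: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def within_typical_range(r, r_free):
    """Apply the chapter's rule of thumb: R up to 25%, Rfree below 30%."""
    return r <= 0.25 and r_free < 0.30


if __name__ == "__main__":
    # Invented amplitudes, for illustration only.
    observed = [100.0, 80.0, 60.0, 40.0, 20.0]
    calculated = [90.0, 85.0, 50.0, 45.0, 20.0]
    print(r_factor(observed, calculated))
```

In practice these values are reported by the refinement programs named in Section 2.5. The sketch only shows why a lower R means closer agreement between model and data.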
A variety of methods have been employed to co-crystallize biological molecules with small molecules. Discovery of crystallization conditions is still an often tedious task, so newer methods for screening crystallization conditions for proteins include the use of semi-automated robots.

Two fundamental methods are available for co-crystallizations. One method is termed "soaking." This employs the addition of the small molecule directly to saturated solutions containing crystals of the biological macromolecule in hopes that the ligand of interest will soak directly into the crystal and bind to its active or binding site, so that the co-crystal structure can be determined. The other method, called co-crystallization, depends on having an ability to add the ligand to the aqueous protein solution in at least stoichiometric amounts, followed by crystallization either using the known crystallization conditions or by setting up a new screen for determining suitable crystallization conditions. Both methods have disadvantages and advantages, and it is primarily up to the investigator to decide which method, or both methods, should be employed for their experiments.

One limitation to the soaking method is that the amount actually dissolved and available to form the complex can often not be easily determined or controlled. In general, an excess of ligand, as a solid, is added to the solution with the crystals of protein with the hope that the ligand dissolves completely and will diffuse into the crystal binding site. One method that has demonstrated success involves equilibration of the crystal with a slightly higher concentration of the crystal mother liquor that contains the ligand solubilized by organic cosolvents (e.g., isopropanol, DMSO, ethanol) as part of the medium for diffusion. However, higher levels of organic solvent often decrease the resolution of diffraction. Lowering the level of solvent after the addition of compound has been found to result in better diffracting crystals. Another major limitation of this method is that it is necessary to collect an entire X-ray diffraction data set to determine if the small molecule is bound to the protein. This trial and error method can be time-consuming or expensive if a high-throughput crystallography approach is the objective.

Co-crystallization permits highly parallel screening for bound ligands through robotic systems. The co-crystallization method is better suited for high-throughput crystallography, because ligand binding can sometimes be determined without the need to solve each structure. Faster spectral analyses, or alternatives such as native gel shifts, gel filtration, and mass spectrometry, can provide information on which of the crystals should be taken into X-ray studies. One difficulty in using the co-crystallization method is the problem of determining the concentration of the protein that is most suitable for complexation. For example, if the protein solution is roughly 1 mM in high salt or aqueous buffers, many organic molecules are not as soluble at that level. In these cases a lower concentration of protein is usually employed to attain stoichiometric ratios. As described above, small percentages of organic solvent can be useful for increasing the concentration of the organic compound in solution, but not without affecting the protein stability or crystal quality. In general, lowering the protein concentration sufficiently, followed by addition of the appropriate amount of ligand, and then concentration of the mixture to the desired protein concentration for crystallization is the most successful method.

Once conditions for obtaining the complexed protein have been obtained, the next step is to decide on which crystallization conditions to use. In some cases, those proteins that do not undergo large tertiary structure changes when complexed to ligands can be crystallized under conditions similar to those for the uncomplexed protein. However, in some instances, proteins will change conformation, depending on the type of ligand that they are complexed with, and a large screening of possible new crystallization conditions is required.

In many cases, soaking a compound into a crystal is not possible because of low solubility of the compound in the aqueous mother liquor. Soaking experiments can also be limited when the conformational space of the binding site is hindered, occupied by adjacent molecules in the crystal lattice, or if there are conformational changes in the binding site because of crystal packing effects. On the other hand, co-crystallization of the protein and ligand, by the nature of the process, usually requires more resources in terms of protein and experimental time, leading to greater expense.

Soaking crystals and/or growing crystals in the presence of inhibitors or ligands provides an opportunity to directly observe their binding interactions, along with the often subtle conformational effects that can have a profound effect on the mode of binding. When it is

this day, crystallography is almost exclusively used in the pharmaceutical industry to study small molecule interactions with drug targets (see Section 3).

The development of several new methods (described in Section 2) and the availability of the complete genome sequences of both pathogens and hosts provides an unparalleled opportunity to exploit protein structures for
possible to use this in an iterative fashion to drug discovery research in new ways. We can
guide the design of the next set of compounds now contemplate using protein structure de-
to be synthesized in a lead series, this becomes termination to help annotate genomes, that is,
a potent tool. Understanding how and why a to assess new drug targets as well as provide
compound or series binds to an active site, par- multiple high-resolution structures that ad-
ticularly when the affinity is also known, pro- dress selectivity issues. This emerging science
vides the best understanding, and the highest of high-throughput structural biology has
level at which it is possible to enable drug de- been termed structural genomics.
sign. As structural biology becomes a more in-
tegrated component of drug discovery, better
4.2 Genome Annotation
methods of obtaining crystal structures with
and without bound ligands will be developed, It is in infectious disease that whole genome
with lower costs and faster turnaround times. information first became available, and it is in
Many companies and academic laboratories this field that structural genomics is having an
are now focused on solving these challenges. initial impact (373). A typical approach has
But it is also impressive to see how well we been to assess the viability and/or virulence of
have progressed-Table 11.1 enumerates the pathogens by systematic disruption of every
structures of known therapeutic targets (with predicted gene product. As a consequence, a
or without ligands) that are available in the large number of potential new targets have
public domain. This table is based on the Na- emerged: genes that are essential for pathoge-
ture Biotechnology "The Usual Suspects" nicity of bacteria in a model system. Often
poster, published in 2001, but is almost iden- these new genes have been filtered for those
tical in content to the 1997 version (30). Ref- that are conserved in a variety of pathogens
erence sequences for 300 of the targets were and that do not have a close human homolog
extracted from NCBI Genbank and the non- (374). About 30-50% of the genes of a typical
redundant database was searched with several pathogen have no reliable functional assign-
iterations of PSI-BLAST. The resulting profile ment. A similar fraction of the novel targets
was used to search the PDB + SGX database shown to be essential also fall into this cate-
of known structures. The top hits for each gory, which becomes problematic for assay
drug target are tabulated below. A great many configuration. Indeed, in target-based ap-
more reside within pharmaceutical and bio- proaches, the number of leads emerging has
tech companies as proprietary structures. been disappointing. Protein structure can pro-
vide the information required to prioritize
among these essential genes and to establish
4 STRUCTURAL CENOMICS
assays. Co-complex structures with even low-
affinity hits can be used to provide key infor-
4.1 Introduction to Structural Cenomics
mation for medicinal chemistrv. "
Until recently, structure determination by There are several ways- in which structural
protein crystallography was a time-consuming genomics has promise as a tool for genome an-
method accessible to a few privileged skilled notation and target prioritization. For genes
practitioners. X-ray crystallography was re- of unknown function. structure can often -pro-
served to tackle questions requiring atomic vide clues to biochemical function. Sequence
resolution details of a demonstrably impor- homology has become a routine method for
tant protein, often a drug target. Indeed, to functional assignment, but even the most
Table 11.1 Known Drug Targets with Published Structures
Target and PDB Reference Resolution Source Homology Year Reference
Acetylcholinesterase
1MAHW Green mamba
1B41(A), 1F8U(A) Green mamba
1C2B(A), 1C20(A) Electric eel
1MAA(A) Mouse
Adenosine deaminase
1FKX, 1FKW Mouse
1A4L(A), 1A4M(A) Mouse
1UI0, 1UIP Mouse
1ADD
2ADA
Alpha-amylase
1JXT(A), 1JXK(A) Human
1SMD Human
1C8Q(A) Human
1CPU(A), 2CPU(A) Human
1BSI Human
1HNY Human
3CPU(A) Human
1B2Y(A) Human
1DHK(A) Kidney bean
1JFH Pig
1PIF, 1PIG Pig
1OSE Pig
1HXO(A) Pig
1BVN(P) S. tendae
1PPI
Androgen receptor
1E3G(A) Human
1I37(A), 1I38(A) Rat
Anticoagulant protein C
1AUT(C) Human
Aquaporin 1
1IH5(A) Human
1FQY(A) Human
β-Amyloid
1MWP(A) Human
β-Lactamase [Sa]
1BTL Bacteria
1FQG(A) Bacteria
1JTD(A) Bacteria
1HTZ(A) Bacteria
1ERM(A), 1ERO(A) Bacteria
1ERQ(A) Bacteria
1XPB Bacteria
1ESU(A) Bacteria
1BT5(A) Escherichia coli
1TEM Escherichia coli
1CK3(A) Escherichia coli
1AXB Escherichia coli
β-Tubulin
1JFF(B) Bovine
1TUB(B) Pig
1FFX(B) Rat
Calcineurin A
1TCO(A) Bovine
1AUI(A) Human
Carbonic anhydrase 2
1HEA, 4CAC, 5CAC Human, HSV-1
1G6V(A) Arabian camel
1CNW, 1CNX, 1CNY Human
1IF4(A), 1IF5(A), 1IF6(A) Human
1IF9(A) Human
1CA3, 1HEB, 1HED Human
1DCA, 1DCB Human
1CRA Human
1CIL, 1CIM, 1CIN Human, HSV-1
1CAY Human
1RZA, 1RZB, 1RZC, 1RZD, 1RZE Human
2CA2 Human
1BN1, 1BN3, 1BN4, 1BNM Human
1CAH Human
1I8Z(A) 1.93 Å Human 100%
1BV3(A) 1.85 Å Human 100%
12CA 2.40 Å Human 99%
1G53(A) 1.94 Å Human 100%
1AM6 2.10 Å Human 100%
1CAN, 1CA0 1.90 Å Human rhinovirus 100%
1GOE(A), 1GOF(A) 1.60 Å Human 99%
1AVN 2.00 Å Human 100%
1UGF 2.00 Å Human 99%
1HVA 2.30 Å Human 99%
5CA2 2.10 Å Human 99%
1HCA 2.30 Å - 100%
4CA2, 6CA2, 7CA2, 9CA2 2.10 Å-2.80 Å Human 100%
1ZNC(A) 2.80 Å Human 100%
Catechol methyltransferase
1VID Rat
Cholecystokinin A receptor
1D6G(A) NMR
Coagulation factor 10
1EZQ(A), 1FOR(A), 1FOS(A) Human
1C5M(D) Human
1XKA(C), 1XKB(C) Human
1FAX(A) Human
1FJS(A) Human
1KIG(H) Soft tick
1HCG(A) -
1AI8(H) Hirudo medicinalis
1MKW(K), 1MKX(K) Bos taurus
1BTH(H) Bovine
1HXF(H) Hirudo medicinalis
1G30(B) Hirudo medicinalis
1A3E(H) Hirudo medicinalis
1D3P(B), 1D3Q(B) Hirudo medicinalis
1HDT(H) Hirudo medicinalis
1AD8(H) Hirudo medicinalis
1LHC(H), 1LHF(H), 1LHG(H) Hirudo medicinalis
1JOU(B) Human
1DIT(H) Human
1WS(H) Human
4THN(H) Human
1THP(B) Human
1AY6(H) Human
1C1U(H), 1C1V(H) Human
1A4W(H) Human
1G37(A) Human
1EOJ(A), 1EOL(A) Human
1BBO(B) Human
1C4U(2), 1D6W(A), 1D9I(A), 1DOJ(A) Human
7KME(H) Human
1QBV(H) Human
1DM4(B) Human
1UMA(H) Medicinal leech
1BMM(H), 1BMN(H) Medicinal leech
1A2C(H) M. aeruginosa
1FPC(H) -
1NRO(H), 1NRR(H) -
1HAG(E) -
1HLT(H) -
1TMU(H) -
4HTC(H) -
1AK(H), 1DWB(H), 1DWC(H) Hirudo medicinalis
2HPP(H) -
1ABI(H) -
1JBU(H) Bacteria
Coagulation factor 7a
1QFWH) Human
1DVA(H) Human
1DAN(H) 2.00 Å Human 100% 1997 (152)
1CVW(H) 2.28 Å Human 100% 1999 (153)
1FAK(H) 2.10 Å Human 100% 1998 (154)
1RFN(A) 2.80 Å Human 100% 1999 (155)
1PFX(C) 3.00 Å Pig 88% 1995 (156)
Cox-1
1DWA) 3.00 Å Sheep 93% 1999 (157)
1CQE(A), 1PRH(A) 3.10 Å, 3.50 Å Sheep 92% 1994 (158)
1PTH 3.40 Å Sheep 92% 1995 (159)
1EBV(A) 3.20 Å Sheep 93% 2000 (160)
1FE2(A) 3.00 Å Sheep 92% 2000 (161)
1EQG(A), 1EQH(A), 1HT5(A), 1HT8(A) 2.61 Å-2.75 Å Sheep 92% 2000 (162)
1PGE(A), 1PGF(A), 1PGG(A) 3.50 Å, 4.50 Å Sheep 92% 1995 (163)
Cox-2
1CVU(A), 1DDX(A) 2.40 Å, 3.00 Å Mouse 87% 1999 (164)
1CX2, 3PGH, 4COX, 5COX, 6COX 3.00 Å Mouse 87% 1996 (165)
Cytochrome P450 reductase
1B1C(A) 1.93 Å Human 100% 1998 (166)
1AMO(A) 2.60 Å Rat 93% 1997 (167)
1J9Z(A), 1JA0(A), 1JA1(A) 2.70 Å, 2.60 Å, 1.80 Å Rat 92% 2001 (168)
Dihydrofolate reductase
1BOZ(A) 2.10 Å Human 99% 1998 (169)
1HFP, 1HFQ, 1HFR 2.10 Å Human 100% 1997 (170)
1OHJ, 1OHK 2.50 Å Human 100% 1997 (171)
1DR1, 1DR5, 1DR6, 1DR7 2.20 Å, 2.40 Å - 75% 1992 (172)
1DR2, 1DR3 2.30 Å - 75% 1992 (173)
1DR4 2.40 Å - 75% 1992 (174)
1DHF(A), 2DHF(A) 2.30 Å - 100% 1989 (175)
1DLR, 1DLS 2.30 Å - 99% 1995 (176)
8DFR 1.70 Å - 75% 1989 (177)
1DRF 2.00 Å - 100% 1990 (178)
Dihydroorotate dehydrogenase
1D3G(A), 1D3H(A) 1.60 Å, 1.80 Å Human 100% 1999 (179)
Dihydropteroate synthetase [Sa]
1AD1(A), 1AD4(A) S. aureus
DNA helicase PcrA [Sa]
1QHHW B. thermophilus
DNA topoisomerase 1
1EJ9(A) Human
1A36(A) Human
1A31(A), 1A35(A) Human
Estrogen receptor 1 α
1QKT(A), 1QKU(A) 2.20 Å, 3.20 Å Human
1HCP NMR Human
1A52(A) 2.80 Å Human
1ERR(A), 1ERE(A) 2.60 Å, 3.10 Å Human
1HCQ(A) 2.40 Å Human
3ERT(A), 3ERD(A) 1.90 Å, 2.03 Å Human
FK506-binding protein
1TCO(C) 2.50 Å Bovine
1FKD, 2FKE 1.72 Å Human
1FKJ, 1FKK, 1FKL 1.70 Å, 2.20 Å Cow
1FAP(A) 2.70 Å Human
3FAP(A), 4FAP(A) 1.85 Å, 2.80 Å Human
1NSG(A) 2.20 Å Human
1FKR, 1FKS, 1FKT NMR Human
1EYM(A) 2.00 Å Human
1BL4(A) 1.90 Å Human
1D60(A), 1D7H(A), 1D7I(A), 1D7J(A) 1.85 Å-1.90 Å Human
1QPF(A), 1QPL(A) 2.50 Å, 2.90 Å Human
1F40(A) NMR Human
1B6C(A) 2.60 Å Human
1A7X(A) 2.00 Å Human
1BKF 1.60 Å Human
2FAP(A) 2.20 Å Human
1C9H(A) 2.00 Å Human
1FKG, 1FKH, 1FKI(A) 2.00 Å, 1.95 Å, 2.20 Å
1FKF 1.70 Å
1FKB 1.70 Å
Follicle stimulating hormone
1FL7(B) 3.00 Å Human 99% 2000 (211)
GABA transferase
1GTX(A) 3.00 Å Pig 94% 1999 (212)
Glucocorticoid receptor
1LAT(A) 1.90 Å Rat 85% 1995 (213)
1GLU(A) 2.90 Å - 94% 1992 (214)
Glutamate receptor 1
1EWK(A), 1EWT(A), 1EWV(A) 2.20 Å, 3.70 Å, 4.00 Å Rat 98% 2000 (215)
Glutathione peroxidase
1GP1(A) 90% Jun 1985 (216)
G-CSF 3
1CD9(A), 1PGR(A) 2.80 Å, 3.50 Å
1BGC, 1BGD, 1BGE(A) 1.70 Å, 2.30 Å, 2.20 Å
1GNC NMR
1RHG(A) 2.20 Å
Granulocyte-macrophage CSF
1CSG(A) -
2GMF(A) Human
Growth hormone receptor
1A22(B) Human
1AmB) Human
1HWG(B), 1HWH(B) Human
3HHR(B) -
HIV reverse transcriptase
1DLO HIV-1
1RT3(B) HIV-1
1HPZ, 1HQE, 1HQU HIV-1
1BQM, 1BQN HIV-1
1TVR(B), 1UWB(B) HIV-1
1EET HIV-1
1IKV, 1IKW, 1IKX, 1IKY HIV-1
1HW(B) HIV-1
1C1B(B) HIV-1 pol
2HMI(B) Virus
1HYS Virus
1HMV Virus
1HNI Virus
1HNV Virus
1FKP(B) Virus
1JLA, 1JLB, 1JLC, 1JLE, 1JLF, 1JLG Virus
1550, 1QE1(B) HIV-1
3HVT(B) -
Inosine monophosphate dehydrogenase 2
1JR1(A) Chinese hamster
1B30(A) Human
Insulin-like growth factor 1
3LRI(A) NMR Human
1BQT NMR Human
1IMX(A) 1.82 Å Human
1B9G(A) NMR -
2GF1, 3GF1 NMR -
Insulin-like growth factor 1 receptor
1IGR(A) Human
1GAG(A) Human
1I44(A) Human
1IR3(A) Human
1IRK -
Insulin-like growth factor 2
1IGL NMR
Integrin alpha M
1BHQ(1), 1IDN(1) Human
1JLM Human
1IDO Human
Intercellular adhesion molecule 1
1IAM Human
1IC1(A) Human
1D3E(I), 1D3I(I), 1D3L(A) Human rhinovirus
Interferon α 1
1ITF NMR Human
1RH2(A) 2.90 Å Human
Interferon γ
1FG9(A) 2.90 Å Human
1FYH(A) 2.04 Å Human
1EKU(A) 2.90 Å Human
1HIG(A) 3.50 Å
Interleukin 1
2ILA 2.30 Å
Interleukin 1 receptor
1GOY(R) 3.00 Å Human
1IPAO 2.70 Å Human
1ITB(B) 2.50 Å Human
Interleukin 10
1VLK 1.90 Å Epstein-Barr virus
2ILK 1.60 Å Human
1ILK 1.80 Å Human
1J7V(L) 2.90 Å Human
1INR 2.00 Å Human
Interleukin 12
1F42(A), 1F45(A) 2.50 Å, 2.80 Å Human
Interleukin 13
1GA3(A) NMR Human
Interleukin 2
1IRL NMR Human
3INK(C) 2.50 Å -
Interleukin 3
1JLI NMR Escherichia coli
Interleukin 4
1HIJ, 1HIK 3.00 Å, 2.60 Å Human
1IAR(A) 2.30 Å Human
1HZI(A) 2.05 Å Human
1ITM NMR
1BBN, 1BCN NMR
1CYL NMR
2CYK NMR
1ITL NMR
2INT 2.40 Å
1RCB 2.25 Å
1ITI NMR
Interleukin 5
1HUL(A) 2.40 Å Human
Interleukin 6
1IL6, 2IL6 NMR Human
1ALU 1.90 Å Human
Interleukin 8
1IKL, 1IKM NMR Human
1ICW(A) 2.01 Å Human
1ILP(A), 1ILQ(A) NMR Human
1QE6(A) 2.35 Å Human
1ROD(A) NMR Human
3IL8 2.00 Å -
1IL8(A), 2IL8(A) NMR -
Leukotriene A4 hydrolase
1HS6(A) 1.95 Å Human
Lipocortin I
1AIN 2.50 Å Human
1HM6(A) 1.80 Å Pig
Luteinizing hormone β
1QFW(B) 3.50 Å Escherichia coli
1HCN(B) 2.60 Å -
1HRP(B) 3.00 Å -
Macrophage CSF 1
1HMC(A) 2.50 Å
Neuraminidase [influenza B virus]
1INF 2.40 Å Influenza b virus
2.20 Å, 1.90 Å Influenza b virus 94%
2.40 Å - 99%
1.70 Å, 1.80 Å - 94%
2.50 Å, 2.40 Å, 2.35 Å Influenza b virus 99%
2.40 Å - 99%
2.20 Å - 94%
Neuropeptide Y
1RON NMR Human
1F8P(A) NMR -
1FVN(A) NMR -
Parathyroid hormone
1HTH NMR Human
1FVY(A) NMR Human
1BWX, 1HPY, 1ZWA, 1ZWC NMR Human
1ET1(A) 0.90 Å Human
1HPH NMR -
1ZWB, 1ZWD, 1ZWE, 1ZWF, 1ZWG NMR -
PDGF β
1PDG(A) -
Phospholipase A2
1BCI NMR Human
1RLW 2.40 Å Human
1CJY(A) 2.50 Å Human
Potassium channel shaker
1A68 Sea hare
1EOD(A), 1EOE(A), 1EOF(A) Sea hare
1T1D(A) Sea hare
1EXB(E) Rat
1DSX(A), 1QDV(A), 1QDW(A) Rat
PPAR γ
4PRG(A) Escherichia coli 97%
1PRG(A), 2PRG(A) Human 97%
1FM6(D), 1FM9(D) Human 99%
3PRG(A) Human 99%
Progesterone receptor
1E3K(A) Human
1A28(A) Human
Prolactin receptor
1BP3(B) Human
1F6F(B) Rat
Retinoic acid receptor
1DSZ(A) Human
1EXA(A), 1EXX(A) Human
2LBD 2.00 Å Human
3LBD, 4LBD 2.40 Å Human
1DKF(B) 2.50 Å Human
1FCX(A), 1FCY(A), 1FCZ(A) 1.47 Å, 1.30 Å, 1.38 Å Human
1HRA NMR
Retinoid X receptor
1FM6(A), 1FM9(A) 2.10 Å Human
1DSZ(B) 1.70 Å Human
1DKF(A), 1LBD 2.50 Å, 2.70 Å Human
2NLL(A) 1.90 Å Human
1RXR NMR Human
1G1U(A), 1G5Y(A) 2.50 Å, 2.00 Å Human
1FBY(A) 2.25 Å Human
1BY4(A) 2.10 Å Human
Serotransferrin p
1JNF(A) Rabbit
Stem cell factor
1EXZ(A) Human
1SCF(A) Human
Thymidine kinase [HSV]
1KWA)
10HI(A), 2KI5(A)
1KI2, 1KI3, 1KI4, 1KI6, 1KI7, 1KI8
1VTK, 2VTK, 3VTK
1E2H(A), 1E2I(A), 1E2J(A)
1E2M(A), 1E2N(A), 1E2P(A)
1E2K(A), 1E2L(A)
Tumor necrosis factor receptor 1
1NCF(A) Human
1EXT(A) Human
1TNR(R) -
Vitamin D receptor
1IE8(A), 1IE9(A) Human
1DB1(A) Human
Xanthine-guanine phosphoribosyltransferase
1A95(A), 1A97(A), 1A98(A) Escherichia coli
1NUL(A) Escherichia coli
1A96(A) Escherichia coli
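
For readers who wish to work with the fully populated rows of Table 11.1 programmatically, the following Python sketch may be a useful starting point. It is not part of the original chapter. It assumes single-entry rows laid out as PDB code with optional chain, resolution (or NMR), source, homology, year, and reference number; the names `Row`, `ROW_PATTERN`, and `parse_row` are hypothetical and introduced here only for illustration. Rows that list several PDB entries or lack some columns are deliberately left unparsed.

```python
import re
from typing import NamedTuple, Optional


class Row(NamedTuple):
    pdb_id: str
    chain: Optional[str]
    resolution: Optional[float]  # in angstroms; None for NMR entries
    source: str
    homology: int                # percent, as printed in the Homology column
    year: int
    reference: int


# Assumed layout of a fully populated, single-entry row, for example:
#   "1DAN(H) 2.00 A Human 100% 1997 (152)"
# Both "A" and "Å" are accepted as the resolution unit.
ROW_PATTERN = re.compile(
    r"^(?P<pdb>\d[A-Za-z0-9]{3})(?:\((?P<chain>\w)\))?\s+"
    r"(?:(?P<res>\d+\.\d+)\s*[AÅ]|NMR)\s+"
    r"(?P<source>.+?)\s+(?P<hom>\d+)%\s+(?P<year>\d{4})\s+\((?P<ref>\d+)\)$"
)


def parse_row(line: str) -> Optional[Row]:
    """Return a Row for a single-entry table line, or None if it does not fit."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        return None
    res = match.group("res")
    return Row(
        pdb_id=match.group("pdb").upper(),
        chain=match.group("chain"),
        resolution=float(res) if res else None,
        source=match.group("source"),
        homology=int(match.group("hom")),
        year=int(match.group("year")),
        reference=int(match.group("ref")),
    )
```

A caller could then, for example, keep only rows whose `source` is `"Human"` and whose `resolution` is below a cutoff of its choosing. Any such cutoff would be the caller's own assumption, not a criterion stated in the text.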
sensitive sequence methods [such as ISS (375)] fail to identify many homologous relationships. Structure is more conserved than sequence, so structural classification schemes (SCOP, CATH) have been a valuable method to assign proteins to functional groups. A now classic example of functional understanding from structural homology was the discovery that the Bcl-2 family of apoptosis proteins are homologous to pore-forming toxins (376). This finding led to the suggestion that Bcl-2 proteins may function by perforating mitochondrial membranes, and has since opened new avenues of fruitful research.

In addition to structural homology as assessed by global similarities, local structural features can give clues to function even when proteins are not homologous. By identifying surface clusters of polar residues that are well conserved in the sequence family, it is possible to identify likely functional sites even when there is no obvious structural homology. These three-dimensional motifs can be compared with a structure database to identify similar motifs with known function. A classic example is found among the serine proteases. Chymotrypsin and subtilisin share a similar catalytic triad (His-Asp-Ser) but are otherwise unrelated structurally. The PLP-dependent enzymes are famed for the diversity of both structure and function, but even among this group, common structural motifs seem to have evolved convergently (377).

Simply searching for large clefts in the protein surface turns out to be an extremely successful method to identify active sites. Nucleic acid binding functions can be particularly obvious from an analysis of surface electrostatics (378, 379). Mice homozygous for tubby loss-of-function mutations show an obese phenotype, and therefore the tubby protein has attracted considerable interest. However, 3 years after the initial cloning of the tubby gene (380, 381), the molecular function of the protein was still a mystery. The structure of the conserved C-terminal domain of tubby was determined by X-ray crystallography, and a large groove of highly positive charge immediately led to the hypothesis that the protein acted as a transcription factor (6). The search is now on for downstream targets of tubby, and further structural work has demonstrated a new role for the tubby protein in G-protein-coupled receptor (GPCR)-mediated signal transduction (382). This ongoing story demonstrates the power of structural approaches to determine function.

In tackling structures of proteins of unknown function, bound metal ions, natural substrates, or even serendipitously bound small molecules arising from the crystal preparation (e.g., buffers) often suggest the location of an active site. If the side chains contributing to binding are well conserved, then this is good evidence for the location of an active site and helps assess the "drugability" of the protein. The recent structure of LuxS illustrates the power of this approach (373). A number of genes had been identified that are required for quorum sensing in bacteria by system 2. Quorum sensing by the widely conserved system 2 has emerged as an intriguing mechanism by which bacteria monitor their density and seems to be an important component of the progression to virulence, at least in certain pathogens. LuxS is the product of one of the genes required for system 2, but nothing was known of the molecular function of LuxS in this pathway. Disruption of this pathway has promise in antibacterial drug design, but whether LuxS would be an attractive target for small molecule design was unclear. No information was available to develop a biochemical assay, and it was not clear what kind of library to use in high-throughput screening. The structure of LuxS was solved at Structural GenomiX in less than 2 months, and there are representative X-ray structures from three different bacteria. The structure showed that LuxS forms a dimer in which each monomer has a zinc ion coordinated by a His-His-Cys triad and a water molecule. Non-covalently bound methionine molecules were found in a pocket formed at the dimer interface and close to the zinc ions (see Fig. 11.7, a and b). Methionine was shown to have bound as an artefact of the purification procedure. With this information, it became immediately clear that LuxS is likely a zinc metalloenzyme, and a hypothesis for the likely physiological substrate emerged from molecular modeling studies of the methionine binding site. This example illustrates how structure could rapidly accelerate an early stage project, providing the starting point for assay development and selection of an appropriate
screening library (in this case metalloenzyme inhibitor libraries would be desirable). The model of the likely substrate bound to the active site suggests further experiments to test this hypothesis and even provides a starting point for medicinal chemistry exploration.

Figure 11.7. (a) The likely active site of LuxS identified by searching for clusters of polar, conserved residues in the structure. (b) Structure of the LuxS monomer highlighting the bound zinc ion (magenta) and methionine (green). See color insert.

4.3 Pathways

Increasingly in drug discovery, particular molecular pathways are attracting interest in drug design, and often manipulation of any of a number of pathway components would achieve the same end. Pathways controlling apoptosis, the cell cycle, and inflammation all contain multiple biologically validated targets. In microbial disease, several biosynthetic pathways, such as peptidoglycan biosynthesis and translation, are the targets of current drugs, and several new pathways are promising targets for the development of novel agents.

Comprehensive, high-resolution structural information on multiple pathway components provides a basis for the rational design of inhibitors targeting the pathway. Interfering with the function of any of a number of enzymes of a pathway may have equally beneficial therapeutic value. Despite this, some enzymes may be more tractable targets for the design of inhibitors than others. Access to high-resolution structural information on all the components of a therapeutically relevant pathway enables the rational choice of the best-suited target(s) to pursue for the design of agonists and antagonists. This choice may depend on such pragmatic considerations as access to libraries targeted to particular enzyme types and available synthetic chemistry expertise. Furthermore, comparison of the binding pockets of consecutive enzymes in the pathway that bind similar (or identical) substrates and products may even enable the design of inhibitors of multiple pathway components. Such a compound may be particularly desirable in the development of novel anti-microbial and cancer agents, where compound resistance can rapidly emerge. The evolution of resistance to a drug that inhibits two consecutive enzymes in an essential pathway is theoretically much less probable than the evolution of resistance to a single enzyme inhibitor.

The non-mevalonate isopentenyl pyrophosphate biosynthesis pathway has attracted attention in recent years as a novel target for the design of anti-microbial inhibitors (383). At Structural GenomiX, the structures of three consecutive enzymes in this pathway have been solved. There is now a clear understanding of which pathway components may be most tractable to inhibitor discovery, which are likely to have the least structural homology to human proteins, and even how to go about the design of pan-pathway inhibitors.

4.4 Protein Structure Modeling

An aim of structural genomics efforts, to provide high-quality three-dimensional structures for every protein sequence, will not be achieved by experimental approaches alone.
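
The modeling route taken up in this section builds a structure for a sequence from a related, experimentally solved template. As a rough illustration of the template-selection step only, the Python sketch below scores pre-aligned target and template sequences by percent identity and picks the best template above a cutoff. It is not taken from the chapter: the function names, the toy sequences in the usage note, and the 30% default cutoff are all assumptions made here for illustration, and real pipelines use far more elaborate alignment and scoring.

```python
from typing import Dict, Optional, Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def pick_template(
    alignments: Dict[str, Tuple[str, str]],
    min_identity: float = 30.0,  # assumed cutoff, not a value from the text
) -> Optional[str]:
    """Return the template ID with the highest identity at or above the cutoff.

    `alignments` maps a template ID to (aligned target, aligned template).
    Returns None when no template reaches the cutoff.
    """
    best_id: Optional[str] = None
    best_score = min_identity
    for template_id, (target, template) in alignments.items():
        score = percent_identity(target, template)
        if score >= best_score:
            best_id, best_score = template_id, score
    return best_id
```

For instance, with two hypothetical templates, `pick_template({"tmplA": ("ACDEF", "ACDEF"), "tmplB": ("ACDEF", "AXXXX")})` selects `"tmplA"`, and a target with no template above the cutoff yields `None`, which is the case where an experimental structure would still be needed.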
Many consortia are selecting targets for X-ray crystallography that would provide the templates for comparative modeling of all other sequences (384, 385). As more structures are determined by NMR and X-ray crystallography, the quality of the models will improve, both because more similar templates will become available and because new methods for loop modeling and ab initio structure prediction will undoubtedly emerge (386, 387). Efforts are also underway both in industry and academia to assemble databases of homology models for all sequences that can be reasonably well modeled (388).

5 CONCLUSION

Anyone who is involved or interested in drug discovery will recognize the potential of protein crystallography to greatly enhance the process. Whether this promise has been met to date is the subject of considerable debate. What is certain, however, is that in the very near future the advances in crystallography technology will render this question moot. The histograms on the PDB website (27, 28) that show the increasing rate of structures deposited over the last decade are a startling visual indicator of the revolution that is occurring in the field. Clearly, the impact will be felt in drug discovery very soon and perhaps very dramatically, and it serves the audience of this series to be well informed of these advances in technology and their subtle limitations.

It is tempting to draw an analogy with the development of other analytical technologies (NMR, FAB-MS) and conclude that protein crystallography will soon leave the incubator of "big machine physics" to become an everyday, routine tool used in the medicinal chemistry laboratory. Hopefully, this chapter has shown some of the subtle complexities of sample preparation and handling, data collection, and refinement that temper this vision and will likely keep this a specialized field for some time.

REFERENCES

1. W. A. Hendrickson, Science, 254, 51-58 (1991).
2. T. C. Terwilliger, Nat. Struct. Biol., 7, 935-939 (2000).
3. L. Pauling, Lecture presented at the International Congress of X-ray Crystallography at Stony Brook, NY, August 1973.
4. J. Jancarik and S. H. Kim, J. Appl. Cryst., 24, 409-411 (1991).
5. Hampton Research, available online at http://www.hamptonresearch.com, accessed on October 9, 2001.
6. G. L. Gilliland, M. Tung, D. M. Blakeslee, and J. Ladner, Biological Macromolecule Crystallization Database (BMCD), available online at http://www.bmal.nist.gov:8080/bmcdmmal.html, accessed on October 9, 2001.
7. N. E. Chayen, Structure, 5, 1269-1274 (1997).
8. I. Jurisica, et al., IBM Systems J., 40, 394-409 (2001).
9. A. D'Arcy, et al., J. Cryst. Growth, 168, 175-180 (1996).
10. C. W. Carter Jr., Methods Enzymol., 276, 74-99 (1997).
11. A. C. Bloomer and U. W. Arndt, Acta Crystallogr. D Biol. Crystallogr., 55(Pt 10), 1672-1680 (1999).
12. (a) G. A. Petsko, J. Mol. Biol., 96, 381-392 (1975); (b) R. L. Sutton, J. Chem. Soc. Faraday Trans., 1, 101-105 (1991); (c) D. W. Rodgers, Structure, 2, 1135-1140 (1994).
13. D. W. Green, V. M. Ingram, and M. F. Perutz, Proc. R. Soc. A, 225, 287-307 (1954).
14. M. G. Rossmann and D. M. Blow, Acta Cryst., 15, 24-34 (1962).
15. L. M. Rice, T. N. Earnest, and A. T. Brunger, Acta Cryst. D, 56, 1413-1420 (2000).
16. W. A. Hendrickson, et al., EMBO J., 9, 1665-1672 (1990).
17. W. I. Weis, et al., Science, 254, 1608-1615 (1991).
18. W. M. Clemons, Jr., et al., J. Mol. Biol., 310, 827-843 (2001).
19. Collaborative Computational Project, Number 4, Acta Cryst. D, 50, 760-763 (1994).
20. Z. Otwinowski and W. Minor, available online at http://www.hkl-xray.com, accessed on October 9, 2001.
21. T. Terwilliger, Automated Structure Solution for MIR and MAD, available online at http://www.solve.lanl.gov, accessed on October 9, 2001.
22. C. M. Weeks, S. A. Potter, J. Rappleye, and R. Miller, available online at http://www.hwi.buffalo.edu/SnB, accessed on October 9, 2001.
23. G. M. Sheldrick, available online at http://shelx.uni-ac.gwdg.de/SHELX, accessed on October 9, 2001.
24. V. S. Lamzin and A. Perrakis, available online 43. G. D. Brayer, G. Sidhu, R. Maurus, E. H. Ryd-
at http://www.embl-hamburg.de/ARP, accessed berg, C. Braun, Y. Wang, N. T. Nguyen, C. M.
on October 9,2001. Overall, and S. G. Withers, Biochemistry, 39,
25. A. Jones and M. Kjeldgaard, available online at 4778 (2000).
http://www.imsb.au.dk/-moWo, accessed on 44. E. H. Rydberg, G. Sidhu, H. C. Vo, J. Hewitt,
October 9,2001. H. C. Cote, Y. Wang, S. Numao, R. T. Macgil-
26. A. T. Brunger, P. D. Adams, G. M. Clore, W. L. livray, C. M. Overall, G. D. Brayer, and S. G.
Delano, P. Gros, R. W. Grosse-Kunstleve, J.4. Withers, Protein Sci., 8,635 (1999).
Jiang, J. Kuszewski, M. Nilges, N. S. Pannu, 45. G. D. Brayer, Y. Luo, and S. G. Withers, Pro-
R. J. Read, L. M. Rice, T. Simonson, and G. L. tein Sci., 4, 1730 (1995).
Warren, Crystallography and NMR System, 46. G. D. Brayer, G. Sidhu, R. Maurus, E. H. Ryd-
available online at http://cns.csb.yale.edu/vl.0, berg, C. Braun, Y. Wang, N. T. Nguyen, C. M.
accessed on October 9,2001. Overall, and S. G. Withers, Biochemistry, 39,
27. H. M. Berman, D. S. Goodsell, and P. E. 4778 (2000).
Bourne, Am. Scientist, 90,350-359 (2002). 47. M. Qian, R. Haser, G. Buisson, E. Duee, and F.
28. H. M. Berman, J. Westbrook, Z. Feng, G. Gilli- Payan, Biochemistry, 33,6284 (1994).
land, T. N. Bhat, H. Weissig, I. N. Shindyalov, 48. C. Bompard-Gilles, P. Rousseau, P. Rouge, and
P. E. Bourne, The Protein Data Bank, Nucleic F. Payan, Structure (Lond), 4, 1441 (1996).
Acids Res., 28,235-242 (2000). 49. M. Qian, S. Spinelli, H. Driguez, and F. Payan,
29. Information on how to obtain this database Protein Sci., 6, 2285 (1997).
is available at: http://www.ccdc.cam.ac.uW 50. M. Machius, L. Vertesy, R. Huber, and G. Wie-
prods/. gand, J. Mol. Biol., 260,409 (1996).
30. J. Drews and S. Ryser, Nature Biotechnol.15, 51. C. Gilles, J. P. h t i e r , G. Marchis-Mouren, C.
(1997). Cambillau, and F. Payan, Eur. J. Biochem.,
31. Y. Bourne, P. Taylor, and P. Marchot, Cell, 83, 238,561 (1996).
503 (1995). 52. M. Qian, V. Nahoum, J. Bonicel, H. Bischoff, B.
32. G. Kryger, M. Harel, M. Harel, A. Shafferman, Henrissat, and F. Payan, Biochemistry, 40,
I. Silman, and J. L. Sussman,Acta Crystallogr., 7700 (2001).
Sect. D, 56, 1385 (2000). 53. G. Wiegand, 0. Epp, and R. Huber, J. Mol.
33. Y. Bourne, J. Grassi, P. E. Bougis, and P. Mar- Biol., 247,99 (1995).
chot, J. Biol. Chem., 274,3370 (1999). 54. M. Qian, R. Haser, G. Buisson, E. Duee, and F'.
34. Y. Bourne, P. Taylor, P. E. Bougis, and P. Mar- Payan, Biochemistry, 33,6284 (1994).
chot, J. Biol. Chem., 274,2963 (1999). 55. P. M. Matias, P. Donner, R. Coelho, M.
35. V. Sideraki, K. A. Mohamedali, D. K. Wilson, Z. Thomaz, C. Peixoto, S. Macedo, N. Otto, S.
Chang, R. E. Kellems, F.A. Quiocho, and F. B. Joschko, P. Scholz, A. Wegg, S. Basler, M.
Rudolph, Biochemistry, 35,7862 (1996). Schafer, U. Egner, and M. A. Carrondo, J.Biol.
Chem., 275,26164 (2000).
36. Z. Wang and F. A. Quiocho, Biochemistry, 37,
8314 (1998). 56. J. S. Sack, K. F. Kish, C. Wang, R. M. Attar,
S. E. Kiefer,Y. Ang.Y. Wu, J. E. Schemer, M. E.
37. V. Sideraki, D. K. Wilson, L. C. Kurz, F. A. Salvati, S. R. Krystekjr., R. Weinmann, and
Quiocho, and F. B. Rudolph, Biochemistry, 35,
H. M. Einspahr, Proc. Nut. h a d . Sci. USA, 98,
15019 (1996).
4904 (2001).
38. D. K. Wilson, F. B. Rudolph, and F. A. Quiocho, 57. T. Mather, V. Oganessyan, P. Hof, R. Huber, S.
Science, 252, 1278 (1991). Foundling, C. Esmon, and W. Bode, Embo J.,
39. D. K. Wilson and F. A. Quiocho, Biochemistry, 15,6822 (1996).
32, 1689 (1993). 58. G. Ren, V. S. Reddy, A. Cheng, P. Melnyk, and
40. N. Ramasubbu, C. Ragunath, and Z. Wang, In A. K. Mitra, Proc. Nut. Acad. Sci. USA, 98,
press. 1398 (2001).
41. N. Rarnasubbu, P. Venugopalan, Y. Luo, G. D. 59. K. Murata, K. Mitsuoka, T. Hirai, T. Walz, P.
Brayer, and M. J. Levine, In press. A g e , J. B. Heyrnann, A. Engel, and Y. Fujiyo-
42. N. Ramasubbu, K. Sekar, and D. Velmurugan, shi, Nature, 407,599 (2000).
Acta Crystallogr. D Biol. Crystallogr., 52, 435 60. J. Rossjohn, R. Cappai, S. C. Feil, A. Henry,
(1996). W. J. Mckinstryd. Galatis, L. Hesse, G. Mul-
thaup, K. Beyreuther, C. L. Masters, andM. W. 79. C.-Y. Kim and D. W. Christianson, In press.
Parker, Nut. Struct. Biol., 6, 327 (1999). 80. B. A. Grzybowski, A. V. Ishchenko, C.-Y. Kim,
61. C. Jelsch, L. Mourey, J. M. Masson, and J. P. G. Topalov, R. Chapman, D. W. Christianson,
Samama, Proteins, 16, 364 (1993). G. M. Whitesides, and E. I. Shakhnovich, Proc.
62. N. C. Strynadka, H. Adachi, S. E. Jensen, K. Nut. Acad. Sci. USA, 99,1270 (2002).
Johns, A. Sielecki, C. Betzel, K. Sutoh, and 81. S. K. Nair and D. W. Christianson, Biochemis-
M. N. James, Nature, 359, 700 (1992). try, 32,4506 (1993).
63. D. C. Lim, H. U. Park, L. Decastro, S. G. Kang, 82. J. A. Ippolito and D. W. Christianson, Bio-
H. S. Lee, S. Jensen, K. J. Lee, and N. C. J. chemistry, 32, 9901 (1993).
Strynadka, Nat. Struct. Biol., 8, 848 (2001). 83. S. Mangani and A. Liljas, J. Mol. Biol., 232, 9
64. M. C. Orencia, J. S. Yoon, J. E. Ness, W. P. (1993).
Stemmer, and R. C. Stevens, Nat. Struct. Biol., 84. G. M. Smith, R. S. Alexander, D. W. Christian-
8, 238 (2001). son, B. M. McKeever, G. S. Ponticello, J . P.
65. S. Ness, R. Martin, A. M. Kindler, M. Paetzel, Springer, W. C. Randall, J. J. Baldwin, and
M. Gold, J. B. Jones, and N. C. J. Strynadka, C. N. Habecker, Protein Sci., 3,118 (1994).
Biochemistry, 39, 5312 (2000). 85. K. Hakansson, C. Briand, V. Zaitsev, Y. Xue,
66. E. Fonze, P. Charlier, Y. To'Th, M. Vermeire, and A. Liljas, Acta Crystallogr. D Biol. Crystal-
X. Raquet, and A. Dubus, Acta Crystallogr., logr., 50, 101 (1994).
Sect. D, 51, 682 (1995). 86. K. Hakansson, A. Wehnert, and A. Liljas, Acta
67. E. Fonze, P. Charlier, Y. To'Th, M. Vermeire, Crystallogr. D Biol. Crystallogr., 50,93 (1994).
X. Raquet, and A. Dubus, Acta Crystallogr.,
87. A. E. Eriksson, P.M. Kylsten, T. A. Jones, and
Sect. D, 51,682 (1995).
A. Liljas, Proteins, 4,283 (1988).
68. L. Maveyraud, L. Mourey, L. P. Kotra, J. D.
Pedelacq, V. Guillet, S. Mobashery, and J. P. 88. P. A. Boriack-Sjodin, S. Zeitlin, H. H. Chen, L.
Samama, J. Am. Chem. Soc., 120,9748 (1998). Crenshaw, S. Gross, A. Dantanarayana, P. Del-
gado, J. A. May, T. Dean, and D. W. Christian-
69. L. Maveyraud, I. Massova, C. Birck, K. Mi- son, Protein Sci., 7, 2483 (1998).
yashita, J. P. Samama, and S. Mobashery,
J. Am. Chem. Soc., 118, 7435 (1996). 89. K. Hakansson and A. Wehnert, J. Mol. Biol.,
228, 1212 (1992).
70. P. Swaren, D. Golemi, S. Cabantous, A. Buly-
chev, L. Maveyraud, S. Mobashery, and J. P. 90. C.-Y. Kim, D. A. Whittington, J. S. Chang, J.
Samama, Biochemistry, 38,9570 (1999). Liao, J. A. May, and D. W. Christianson,
71. L. Maveyraud, R. F. Pratt, and J. P. Samama, J. Med. Chem., 45, 888 (2002).
Biochemistry, 37,2622 (1998). 91. F. Briganti, S. Mangani, A. Scozzafava, G. Ver-
72. J. Lowe, H. Li, K. H. Downing, and E. Nogales, naglione, and C. T. Supuran, J. Biol. Inorg.
J. Mol. Biol., 313, 1045 (2001). Chem., 4, 528 (1999).
73. E. Nogales, S. G. Wolf, and K. H. Downing, 92. S. K. Nair, T. L. Calderone, D. W. Christian-
Nature, 391, 199 (1998). son, and C. A. Fierke, J. Biol. Chem., 266,
74. B. Gigant, P. A. Curmi, C. Martin-Barbey, E. 17320 (1991).
Charbaut, S. Lachkar, L. Lebeau, S. Sia- 93. C.-Y. Kim, J. S. Chang, J. B. Doyon, T. T.
voshian, A. Sobel, and M. Knossow, Cell, 102, Baird Jr., C. A. Fierke, A. Jain, and D. W. Chris-
809 (2000). tianson, J. Am. Chem. Soc., 122, 12125 (2000).
75. J. P. Griffith, J. L. Kim, E. E. Kim, M. D. Sint- 94. L. R. Scolnick, A. M. Clements, J. Liao, L.
chak, J. A. Thomson, M. J . Fitzgibbon, M. A. Crenshaw, M. Hellberg, J. May, T. R. Dean,
Fleming, P. R. Caron, K. Hsiao, and M. A. Na- and D. W. Christianson, J. Am. Chem. Soc.,
via, Cell, 82, 507 (1995). 119,850 (1997).
76. C. R. Kissinger, H. E. Parge, D. R. Knighton, 95. S. Mangani and K. Hakansson, Eur. J. Bio-
C. T. Lewis, L. A. Pelletier, A. Tempczyk, V. J. chem., 210,867 (1992).
Kalish, K. D. Tucker, R. E. Showalter, E. W.
Moomaw, L. N. Gastinel, N. Habuka, X. Chen, 96. D. Duda, C. Tu, M. Qian, P. Laipis, M. Ag-
F. Maldonado, J. E. Barker, R. Bacquet, and bandje-Mckenna, D. N. Silverman, and R.
J. E. Villafranca, Nature, 378,641 (1995). Mckenna, Biochemistry, 40,1741 (2001).
77. A. Desmyter, K. Decanniere, S. Muyldermans, 97. F. Briganti, S. Mangani, P. Orioli, A. Scoz-
and L. Wyns, In press. zafava, G. Vernaglione, and C. T. Supuran,
78. P. A. Boriack, D. W. Christianson, J. Kingery- Biochemistry, 36,10384 (1997).
Wood, and G. M. Whitesides, J. Med. Chem., 98. L. R. Scolnick and D. W. Christianson, Bio-
38,2286 (1995). chemistry, 35, 16429 (1996).
118. N. Y. Chirgadze, D. J. Sall, S. L. Briggs, D. K.
D. W. Christianson, Biochemistry, 32, 1510 Clawson, M. Zhang, G. F. Smith, and R. W.
(1993). Schevitz, Protein Sci., 9, 29 (2000).
100. J. F. Krebs, C. A. Fierke, R. S. Alexander, and 119. L. Tabernero, C. Y. Chang, S. Ohringer, W. F.
D. W. Christianson, Biochemistry, 30, 9153 Lau, E. J. Iwanowicz, W.-C. Han, T. C. Wang,
(1991). S. M. Seiler, D. G. M. Roberts, and J. S. Sack, J.
101. S. K. Nair and D. W. Christianson, J. Am. Mol. Biol., 246, 14 (1995).
Chem. Soc., 113, 9455 (1991). 120. J. A. Malikayil, J. P. Burkhart, H. A. Schreu-
102. R. S. Alexander, S. K. Nair, and D. W. Chris- der, R. J. Broersma Jr., C. Tardif, L. W.
tianson, Biochemistry, 30, 11064 (1991). Kutcher III, S. Mehdi, G. L. Schatzman, B.
103. T. Stams, S. K. Nair, T. Okuyama, A. Waheed, Neises, and N. P. Peet, Biochemistry, 36,1034
W. S. Sly, and D. W. Christianson, Proc. Nat. (1997).
Acad. Sci. USA, 93,13589 (1996). 121. P. C. Weber, S. L. Lee, F. A. Lewandowski,
104. J. Vidgren, L. A. Svensson, and A. Liljas, Na- M. C. Schadt, C. W. Chang, and C. A. Kettner,
ture, 368, 354 (1994). Biochemistry, 34, 3750 (1995).
105. M. Pellegrini and D. F. Mierke, Biochemistry, 122. J. A. Huntington and C. T. Esmon, In press.
38,14775 (1999).
123. R. Krishnan, A. Tulinsky, G. P. Vlasuk, D.
106. S. Maignan, J. P. Guilloteau, S. Pouzieux, Pearson, P. Vallar, P. Bergum, T. K. Brunck,
Y. M. Choi-Sledeski, M. R. Becker, S. I. Klein, and W. C. Ripka, Protein Sci., 5, 422 (1996).
W. R. Ewing, H. W. Pauls, A. P. Spada, and V.
Mikol, J. Med. Chem., 43,3226 (2000). 124. R. A. Engh, H. Brandstetter, G. Sucher, A.
Eichinger, U. Baumann, W. Bode, R. Huber, T.
107. B. A. Katz, R. Mackman, C. Luong, K. Radika, Poll, R. Rudolph, and W. Vondersaal, Structure
A. Martelli, P. A. Sprengeler, J. Wang, H. (Lond), 4, 1353 (1996).
Chan, and L. Wong, Chem. Biol., 7,299 (2000).
125. A. Lombardi, G. Desimone, F. Nastri, S.
108. K. Kamata, H. Kawamoto, T. Honma, T.
Galdiero, R. Dellamorte, N. Staiano, C. Pe-
Iwama, and S. H. Kim, Proc. Nat. Acad. Sci.
done, M. Bolognesi, and V. Pavone, Protein
USA, 95,6630 (1998).
Sci., 8, 91 (1999).
109. H. Brandstetter, A. Kuhne, W. Bode, R. Huber,
W. Vondersaal, K. Wirthensohn, and R. A. 126. E. Guinto, S. Caccia, T. Rose, K. Futterer, G.
Engh, J. Biol. Chem., 271, 29988 (1996). Waksman, and E. Dicera, Proc. Nat. Acad. Sci.
USA, 96,1852 (1999).
110. M. Adler, D. D. Davey, G. B. Phillips, S. H.
Kim, J. Jancarik, G. Rumennik, D. L. Light, 127. B. E. Maryanoff, X. Qiu, K. P. Padmanabhan,
and M. Whitlow, Biochemistry, 39, 12534 A. Tulinsky, H. R. Almond Jr., P. Andrade-
(2000). Gordon, M. N. Greco, J. A. Kauffman, K. C.
Nicolaou, A. Liu, P. H. Brungs, and N. Fuset-
111. A. Wei, R. Alexander, J. Duke, H. Ross, S.
ani, Proc. Nat. Acad. Sci. USA, 90, 8048
Rosenfeld, and C.-H. Chang, J. Mol. Biol., 283,
(1993).
147 (1998).
112. K. Padmanabhan, K. P. Padmanabhan, A. Tu- 128. B. A. Katz, J. M. Clark, J.S. Finer-Moore, T. E.
linsky, C. H. Park, W. Bode, R. Huber, D. T. Jenkins, C. R. Johnson, M. J. Ross, C. Luong,
Blankenship, A. D. Cardin, and W. Kisiel, J. W. R. Moore, and R. M. Stroud, Nature, 391,
Mol. Biol., 232,947 (1993). 608 (1998).
113. M. G. Malkowski, P. D. Martin, J. C. Guzik, 129. J. H. Matthews, R. Krishnan, M. J. Costanzo,
and B. F. P. Edwards, Protein Sci., 6, 1438 B. E. Maryanoff, and A. Tulinsky, Biophys. J.,
(1997). 71,2830 (1996).
114. A. Vandelocht, W. Bode, R. Huber, B. F. Leb- 130. B. Bachand, M. Tarazi, Y. St-Denis, J. J. Ed-
onniec, S. R. Stone, C. T. Esmon, and M. T. munds, P. D. Winocour, L. Leblond, and M. A.
Stubbs, Embo J., 16, 2977 (1997). Siddiqui, Bioorg. Med. Chem. Lett., 11, 287
115. E. Zhang and A. Tulinsky, Biophys. Chem., 63, (2001).
185 (1997). 131. J. J. Slon-Usakiewicz, J. Sivaraman, Y. Li, M.
116. H. Nar, M. Bauer, A. Schmid, J. Stassen, W. Cygler, and Y. Konishi, Biochemistry, 39,2384
Wienen, H. W. Priepke, I. K. Kauffmann, U. J. (2000).
Ries, and N. H. Hauel, Structure, 9,29 (2001). 132. R. Krishnan, E. Zhang, K. Hakansson, R. K.
117. A. Zdanov, S. Wu, J. DiMaio, Y. Konishi, Y. Li, Arni, A. Tulinsky, M. S. Lim-Wilby, O. E. Levy,
X. Wu, B. F. Edwards, P. D. Martin, and M. J. E. Semple, and T. K. Brunck, Biochemistry,
Cygler, Proteins, 17,252 (1993) 37,12094 (1998).
133. R. Krishnan, I. Mochalkin, R. Arni, and A. Tu- 152. D. W. Banner, A. D'Arcy, C. Chene, F. K. Wink-
linsky, Acta Crystallogr., Sect. D, 56, 294 ler, A. Guha, W. H. Konigsberg, Y. Nemerson,
(2000). and D. Kirchhofer, Nature, 380,41 (1996).
134. I. Mochalkin and A. Tulinsky, Acta Crystal- 153. G. Kemball-Cook, D. J. D. Johnson, E. G. D.
logr., Sect. D, 55, 785 (1999). Tuddenham, and K. Harlos, J. Struct. Biol.,
127,213 (1999).
135. R. Bone, T. Lu, C. R. Illig, R. M. Soll, and J. C.
Spurlino, J. Med. Chem., 41, 2068 (1998). 154. E. Zhang, R. St. Charles, and A. Tulinsky, J.
Mol. Biol., 285,2089 (1999).
136. R. Krishnan, E. J. Sadler, and A. Tulinsky,
Acta Crystallogr., Sect. D, 56,406 (2000). 155. K.-P. Hopfner, A. Lang, A. Karcher, K. Sichler,
E. Kopetzki, H. Brandstetter, R. Huber, W.
137. M. Nardini, A. Pesce, M. Rizzi, E. Casale, R. Bode, and R. A. Engh, Structure (Lond), 7,989
Ferraccioli, G. Balliano, P. Milla, P. Ascenzi, (1999).
and M. Bolognesi, J. Mol. Biol., 258, 851
156. H. Brandstetter, M. Bauer, R. Huber, P. Lol-
(1996).
lar, and W. Bode, Proc. Nat. Acad. Sci. USA,
138. M. F. Malley, L. Tabernero, C. Y. Chang, S. L. 92,9796 (1995).
Ohringer, D. G. Roberts, J. Das, and J. S. Sack, 157. M. G. Malkowski, S. L. Ginell, W. L. Smith, and
Protein Sci., 5,221 (1996). R. M. Garavito, Science, 289, 1933 (2000).
139. J. L. R. Steiner, M. Murakami, and A. Tulin- 158. D. Picot, P. J. Loll, and R. M. Garavito, Nature,
sky, J. Am. Chem. Soc., 120,597 (1998). 367, 243 (1994).
140. I. I. Mathews and A. Tulinsky, Acta Crystal- 159. P. J. Loll, D. Picot, and R. M. Garavito, Nat.
logr. D Biol. Crystallogr., 51,550 (1995). Struct. Biol., 2,637 (1995).
141. I. I. Mathews, K. P. Padmanabhan, V. Ganesh, 160. P. J. Loll, C. T. Sharkey, S. J. O'Connor, C. M.
A. Tulinsky, M. Ishii, J. Chen, C. W. Turck, Dooley, E. O'Brien, M. Devocelle, K. B. Nolan,
and S. R. Coughlin, Biochemistry, 33, 3266 B. S. Selinsky, and D. J. Fitzgerald, Mol. Phar-
(1994). macol., 60, 1407 (2001).
142. J. Vijayalakshmi, K. P. Padmanabhan, K. G. 161. E. D. Thuresson, M. G. Malkowski, K. M. Lak-
Mann, and A. Tulinsky, Protein Sci., 3, 2254 kides, C. J. Rieke, A. M. Mulichak, S. L. Ginell,
(1994). R. M. Garavito, and W. L. Smith, J. Biol.
143. I. I. Mathews, K. P. Padmanabhan, A. Tulin- Chem., 276, 10358 (2001).
sky, and J. E. Sadler, Biochemistry, 33,13547 162. B. S. Selinsky, K. Gupta, C. T. Sharkey, and
(1994). P. J. Loll, Biochemistry, 40, 5172 (2001).
144. J. P. Priestle, J. Rahuel, H. Rink, M. Tones, 163. P. J. Loll, D. Picot, O. Ekabo, and R. M. Gara-
and M. G. Gruetter, Protein Sci., 2, 1630 vito, Biochemistry, 35, 7330 (1996).
(1993). 164. J. R. Kiefer, J. L. Pawlitz, K. T. Moreland, R. A.
145. T. J. Rydel, A. Tulinsky, W. Bode, and R. Hu- Stegeman, J. K. Gierse, W. F. Hood, J. K.
ber, J. Mol. Biol., 221, 583 (1991). Gierse, A. M. Stevens, D. C. Goodwin, S. W.
Rowlinson, L. J. Marnett, W. C. Stallings, and
146. D. W. Banner and P. Hadvary, J. Biol. Chem., R. G. Kurumbail, Nature, 405,97 (2000).
266,20085 (1991).
165. R. G. Kurumbail, A. M. Stevens, J. K. Gierse,
147. R. K. Arni, K. Padmanabhan, K. P. Padmanab- J. J. Mcdonald, R. A. Stegeman, J. Y. Pak, D.
han, T-P. Wu, and A. Tulinsky, Biochemistry, Gildehaus, J. M. Miyashiro, T. D. Penning, K.
32,4727 (1993) Seibert, P. C. Isakson, and W. C. Stallings, Na-
148. X. Qiu, K. P. Padmanabhan, V. E. Carperos, A. ture, 384,644 (1996).
Tulinsky, T. Kline, J. M. Maraganore, and 166. Q. Zhao, S. Modi, G. Smith, M. Paine, P. D.
J. W. Fenton II, Biochemistry, 31, 11689 (1992). Mcdonagh, C. R. Wolf, D. Tew, L. Y. Lian, G. C.
149. C. Eigenbrot, D. Kirchhofer, M. S. Dennis, L. Roberts, and H. P. Driessen, Protein Sci., 8,
Santell, R. A. Lazarus, J. Stamos, and M. H. 298 (1999).
Ultsch, Structure, 9, 627 (2001). 167. M. Wang, D. L. Roberts, R. Paschke, T. M.
150. A. C. W. Pike, A. M. Brzozowski, S. M. Roberts, Shea, B. S. Masters, and J. J. Kim, Proc. Nat.
O. H. Olsen, and E. Persson, Proc. Nat. Acad. Acad. Sci. USA, 94, 8411 (1997).
Sci. USA, 96,8925 (1999). 168. P. A. Hubbard, A. L. Shen, R. Paschke, C. B.
151. M. S. Dennis, C. Eigenbrot, N. J. Skelton, Kasper, and J. J. Kim, J. Biol. Chem., 276,
M. H. Ultsch, L. Santell, M. L. Dwyer, M. P. 29163 (2001).
O'Connell, and R. A. Lazarus, Nature, 404, 169. A. Gangjee, A. P. Vidwans, A. Vasudevan, S. F.
465 (2000). Queener, R. L. Kisliuk, V. Cody, R. Li, N. Gal-
itsky, J. R. Luft, and W. Pangborn, J. Med. 189. J. W. R. Schwabe, L. Chapman, J. T. Finch,

Chem., 41,3426 (1998). and D. Rhodes, Cell, 75,567 (1993).
170. V. Cody, N. Galitsky, J. R. Luft, W. Pangborn, 190. A. K. Shiau, D. Barstad, P. M. Loria, L. Cheng,
R. L. Blakley, and A. Gangjee, J. Mol. Biol., P. J. Kushner, D. A. Agard, and G. L. Greene,
221,583 (1991). Cell, 95,927 (1998).
171. V. Cody, N. Galitsky, J. R. Luft, W. Pangborn, 191. J. W. R. Schwabe, L. Chapman, J. T. Finch,
A. Rosowsky, and R. L. Blakley, Biochemistry, and D. Rhodes, Cell, 75, 567 (1993).
36,13897 (1997). 192. A. K. Shiau, D. Barstad, P.M. Loria, L. Cheng,
172. M. A. McTigue, J. F. Davies II, B. T. Kaufman, P. J. Kushner, D. A. Agard, and G. L. Greene,
and J. Kraut, Biochemistry, 31, 7264 (1992). Cell, 95, 927 (1998).
173. M. A. McTigue, J. F. Davies II, B. T. Kaufman, 193. K. P. Wilson, M. M. Yamashita, M. D. Sint-
and J. Kraut, Biochemistry, 32, 6855 (1993). chak, S. H. Rotstein, M. A. Murcko, J. Boger,
J. A. Thomson, M. J. Fitzgibbon, and M. A.
174. M. A. McTigue, J. F. Davies II, B. T. Kaufman, Navia, Acta Crystallogr., Sect. D, 51, 511
N.-H. Xuong, and J. Kraut, In Press. (1995).
175. J. F. Davies, T. J. Delcamp, N. J. Prendergast, 194. J. Choi, J. Chen, S. L. Schreiber, and J. Clardy,
V. A. Ashford, J. H. Freisheim, and J. Kraut, Science, 273,239 (1996).
Biochemistry, 29,9467 (1990).
195. J. Liang, J. Clardy, and J. Choi, Acta Crystal-
176. W. S. Lewis, V. Cody, N. Galitsky, J. R. Luft, W. logr., Sect. D, 55, 736 (1999).
Pangborn, S. K. Chunduru, H. T. Spencer, 196. J. Liang, J. Choi, and J. Clardy, Acta Crystal-
J. R. Appleman, and R. L. Blakley, J. Biol. logr. D Biol. Crystallogr., 55, 736 (1999).
Chem., 270,5057 (1995). 197. S. W. Michnick, M. K. Rosen, T. J. Wandless,
177. J. F. Davies, D. A. Matthews, S. J. Oatley, B. T. M. Karplus, and S. L. Schreiber, Science, 252,
Kaufman, N.-H. Xuong, and J. Kraut, In press. 836 (1991)
178. C. Oefner, A. D'Arcy, and F. K. Winkler, Eur. 198. C. T. Rollins, V. M. Rivera, D. N. Woolfson, T.
J. Biochem., 174,377 (1988). Keenan, M. Hatada, S. E. Adams, L. J. An-
179. S. Liu, E. A. Neidhardt, T. H. Grossman, T. drade, D. Yaeger, M. R. Vanschravendijk, D. A.
Ocain, and J. Clardy, Structure (Lond), 8, 25 Holt, M. Gilman, and T. Clackson, Proc. Nat.
(2000). Acad. Sci. USA, 97, 7096 (2000).
199. T. Clackson, W. Yang, L. Rozamus, J. Amara,
180. I. C. Hampele, A. D'Arcy, G. E. Dale, D. Ko-
M. H. Hatada, C. T. Rollins, L. F. Stevenson,
strewa, J. Nielsen, C. Oefner, M. G. Page, H. J.
S. R. Magari, S. A. Wood, N. L. Courage, X. Lu,
Schonfeld, D. Stuber, and R. L. Then, J. Mol.
F. Cerasoli Jr., M. Gilman, and D. Holt,
Biol., 268, 21 (1997).
Proc. Nat. Acad. Sci. USA, 95, 10437 (1998).
181. P. Soultanas, M. S. Dillingham, S. S. Velankar, 200. P. Burkhard, P. Taylor, and M. D. Walkin-
and D. B. Wigley, J. Mol. Biol., 290, 137 (1999). shaw, J. Mol. Biol., 295, 953 (2000).
182. M. R. Redinbo, J. J. Champoux, and W. G. Hol, 201. J. W. Becker, J . Rotonda, J. G. Cryan, M. Mar-
Biochemistry, 39, 6832 (2000). tin, W. H. Parsons, P. J. Sinclair, G. Wieder-
183. L. Stewart, M. R. Redinbo, X. Qiu, W. G. Hol, recht, and F. Wong, J. Med. Chem., 42, 2798
and J. J. Champoux, Science, 279,1534 (1998). (1999).
184. M. R. Redinbo, L. Stewart, P. Kuhn, J. J. 202. C. Sich, S. Improta, D. J. Cowley, C. Guenet,
Champoux, and W. G. Hol, Science, 279,1504 J. P. Merly, M. Teufel, and V. Saudek, Eur.
(1998). J. Biochem., 267, 5342 (2000).
203. M. Huse, Y. G. Chen, J. Massague, and J.
185. M. Ruff, M. Gangloff, S. Eiler, S. Duclaud, J . M.
Kuriyan, Cell, 96,425 (1999).
Wurtz, and M. Dino, In press.
204. L. W. Schultz and J. Clardy, Bioorg. Med.
186. J. W. R. Schwabe, L. Chapman, J. T. Finch, D. Chem. Lett., 8, 1 (1998).
Rhodes, and D. Neuhaus, Structure (Lond), 1, 205. S. Itoh, M. T. Decenzo, D. J. Livingston, D. A.
187 (1993). Pearlman, and M. A. Navia, Bioorg. Med.
187. D. M. Tanenbaum, Y. Wang, S. P. Williams, Chem. Lett., 5, 1983 (1995).
and P. B. Sigler, Proc. Nat. Acad. Sci. USA, 95, 206. J. Liang, J. Choi, and J. Clardy, Acta Crystal-
5998 (1998). logr., Sect. D, 55, 736 (1999).
188. A. M. Brzozowski, A. C. W. Pike, Z. Dauter, 207. C. C. S. Deivanayagam, M. Carson, A. Tho-
R. E. Hubbard, T. Bonn, O. Engstrom, L. takura, S. V. L. Narayana, and C. S. Choda-
Ohman, G. L. Greene, J.-A. Gustaffson, and M. varapu, Acta Crystallogr., Sect. D, 56, 266
Carlquist, Nature, 389, 753 (1997). (2000).
208. D. A. Holt, J. I. Luengo, D. S. Yamashita, H.-J. 227. J. Ren, R. M. Esnouf, A. L. Hopkins, E. Y.

Oh, A. L. Konialian, H.-K. Yen, L. W. Rozamus, Jones, I. Kirby, J. Keeling, C. K. Ross, B. A.
M. Brandt, M. J. Bossard, M. A. Levy, D. S. Larder, D. I. Stuart, and D. K. Stammers, Proc.
Eggleston, T. J. Stout, J. Liang, L. W. Schultz, Nat. Acad. Sci. USA, 95, 9518 (1998).
and J. Clardy, J. Am. Chem. Soc., 115, 9925 228. Y. Hsiou, J. Ding, K. Das, A. D. Clark, P. L.
(1993). Boyer, P. Lewi, P. A. J. Janssen, J.-P. Kleim,
209. G. D. VanDuyne, R. F. Standaert, P. A. Kar- M. Roesner, S. H. Hughes, and E. Arnold, J.
plus, S. L. Schreiber, and J. Clardy, Science, Mol. Biol., 309,437 (2001).
252, 839 (1991). 229. Y. Hsiou, K. Das, J. Ding, A. D. Clark Jr.,
210. G. D. Vanduyne, R. F. Standaert, S. L. Schrei- J. P. Kleim, M. Rosner, I. Winkler, G. Riess,
ber, and J. Clardy, J. Am. Chem. Soc., 113, S. H. Hughes, and E. Arnold, J. Mol. Biol., 284,
7433 (1991). 313 (1998).
211. K. M. Fox, J. A. Dias, and P. Vanroey, Mol. 230. K. Das, J. Ding, Y. Hsiou, A. D. Clark Jr., H.
Endocrinol., 15, 378 (2001). Moereels, L. Koymans, K. Andries, R. Pauwels,
212. P. Storici, G. Capitani, D. Debiase, M. Moser, P. A. Janssen, P. L. Boyer, P. Clark, R. H.
R. A. John, J. N. Jansonius, and T. Schirmer, Smith Jr., M. B. Kroeger Smith, C. J.
Biochemistry, 38,8628 (1999). Michejda, S. H. Hughes and E. Arnold, J. Mol.
213. D. T. Gewirth and P. B. Sigler, Nat. Struct. Biol., 264, 1085 (1996).
Biol., 2,386 (1995). 231. M. Hogberg, C. Sahlberg, P. Engelhardt, R.
214. B. F. Luisi, W. X. Xu, Z. Otwinowski, L. P. Noreen, J. Kangasmetsa, N. G. Johansson, B.
Freedman, K. R. Yamamoto, and P. B. Sigler, Oberg, L. Vrang, H. Zhang, B. L. Sahlberg, T.
Nature, 352,497 (1991). Unge, S. Lovgren, K. Fridborg, and K. Back-
bro, J. Med. Chem., 43,304 (2000).
215. N. Kunishima, Y. Shimada, Y. Tsuji, T. Sato,
M. Yamamoto, T. Kumasaka, S. Nakanishi, H. 232. J. Lindberg, S. Sigurdsson, S. Lowgren, H. O.
Jingami, and K. Morikawa, Nature, 407, 971 Andersson, C. Sahlberg, R. Noreen, K. Frid-
(2000). borg, H. Zhang, and T. Unge, Eur. J. Biochem.,
269, 1670 (2002).
216. O. Epp, R. Ladenstein, and A. Wendel, Eur.
J. Biochem., 133,51 (1983) 233. J. Jaeger, T. Restle, and T. A. Steitz, Embo J.,
17,4535 (1998).
217. M. Aritomi, N. Kunishima, T. Okamoto, R. Ku-
roki, Y. Ota, and K. Morikawa, Nature, 401, 234. A. L. Hopkins, J. Ren, H. Tanaka, B. Baba, M.
713 (1999). Okamato, D. I. Stuart, and D. K. Stammers,
218. B. Lovejoy, D. Cascio, and D. Eisenberg, J.
J. Med. Chem., 42,4500 (1999).
Mol. Biol., 234, 640 (1993). 235. J. Ding, K. Das, H. Yu, S. G. Sarafianos, A. D.
Clark Jr., A. Jacobo-Molina, C. Tantillo,
219. T. Zink, A. Ross, K. Luers, C. Ciesler, R. Ru-
S. H. Hughes, and E. Arnold, J. Mol. Biol., 284,
dolph, and T. A. Holak, Biochemistry, 33, 8453
(1994). 1095 (1998).
220. C. P. Hill, T. D. Osslund, and D. Eisenberg, 236. S. G. Sarafianos, K. Das, C. Tantillo, A. D. Clark
Proc. Natl. Acad. Sci. USA, 90, 5167 (1993). Jr., J. Ding, J. Whitcomb, P. L. Boyer, S. H.
Hughes, and E. Arnold, Embo J., 20, 1449
221. D. Rozwarski, K. Diederichs, R. Hecht, T. (2001).
Boone, and P. A. Karplus, Proteins, 26, 304
(1996). 237. D. W. Rodgers, S. J. Gamblin, B. A. Harris, S.
Ray, J. S. Culp, B. Hellmig, D. J. Woolf, C.
222. T. Clackson, M. H. Ultsch, J. A. Wells, and Debouck, and S. C. Harrison, Proc. Natl. Acad.
A. M. Devos, J. Mol. Biol., 277, 1111 (1998). Sci. USA, 92, 1222 (1995).
223. S. Atwell, M. Ultsch, A. M. Devos, and J. A.
238. J. Ding, K. Das, C. Tantillo, W. Zhang, A. D.
Wells, Science, 278, 1125 (1997). Clark Jr., S. Jessen, X. Lu, Y. Hsiou, A. Ja-
224. M. Sundstrom, T. Lundqvist, J. Rodin, L. B. cobo-Molina, K. Andries, R. Pauwels, H. Moer-
Giebel, D. Milligan, and G. Norstedt, J. Biol. eels, L. Koymans, P. A. J. Janssen, R. H.
Chem., 271, 32197 (1996). Smith Jr., M. K. Koepke, C. J. Michejda,
225. A. M. Devos, M. Ultsch, and A. A. Kossiakoff, S. H. Hughes, and E. Arnold, Structure (Lond),
Science, 255, 306 (1992). 3, 365 (1995).
226. Y. Hsiou, J. Ding, K. Das, A. D. Clark Jr., 239. J. Ding, K. Das, H. Moereels, L. Koymans, K.
S. H. Hughes, and E. Arnold, Structure, 4, 853 Andries, P. A. J. Janssen, S. H. Hughes, and E.
(1996). Arnold, Nat. Struct. Biol., 2, 407 (1995).
240. J. Ren, J. Milton, K. L. Weaver, S. A. Short, L. L. Kelley, A. M. Mildner, J. B. Moon, J. E.

D. I. Stuart, and D. K. Stammers, Structure, 8, Mott, V. T. Mutchler, C. S. Tomich, K. D. Wa-
1089 (2000). tenpaugh, and V. H. Wiley, Structure (Lond),
241. J. Ren, C. Nichols, L. Bird, P. Chamberlain, K. 6, 923 (1998).
Weaver, S. Short, D. I. Stuart, and D. K. Stam- 258. J. O. Lee, L. A. Bankston, M. A. Arnaout, and
mers, J. Mol. Biol., 312, 795 (2001). R. C. Liddington, Structure (Lond), 3, 1333
242. S. G. Sarafianos, K. Das, C. Tantillo, A. D. Clark (1995).
Jr., J. Ding, J. Whitcomb, P. L. Boyer, S. H. 259. J. O. Lee, P. Rieu, M. A. Arnaout, and R. Lid-
Hughes, and E. Arnold, Proc. Nat. Acad. Sci. dington, Cell, 80, 631 (1995).
USA, 96,10027 (1999). 260. J. Bella, P. R. Kolatkar, C. W. Marlor, J. M.
243. J. Wang, S. J. Smerdon, J. Jager, L. A. Kohl- Greve, and M. G. Rossmann, Proc. Nat. Acad.
staedt, P. A. Rice, J. M. Friedman, T. A. Steitz, Sci. USA, 95,4140 (1998).
Proc. Natl. Acad. Sci. USA, 91, 7242 (1994). 261. J. M. Casasnovas, T. Stehle, J. H. Liu, J. H.
244. M. D. Sintchak, M. A. Fleming, O. Futer, S. A. Wang, and T. A. Springer, Proc. Nat. Acad. Sci.
Raybuck, S. P. Chambers, P. R. Caron, M. A. USA, 95,4134 (1998).
Murcko, and K. P. Wilson, Cell, 85,921 (1996). 262. P. R. Kolatkar, J. Bella, N. H. Olson, C. Bator,
245. T. D. Colby, K. Vanderveen, M. Strickler, G. D. T. S. Baker, and M. G. Rossmann, Embo J., 18,
Markham, and B. M. Goldstein, Proc. Nat. 6249 (1999).
Acad. Sci. USA, 96, 3531 (1999). 263. W. Klaus, B. Gsell, A. M. Labhardt, B. Wipf,
246. L. G. Laajoki, G. L. Francis, J. C. Wallace, J. A. and H. Senn, J. Mol. Biol., 274,661 (1997).
Carver, and M. A. Keniry, J. Biol. Chem., 275, 264. R. Radhakrishnan, L. J. Walter, A. Hruza, P.
10009 (2000). Reichert, P. P. Trotta, T. L. Nagabhushan, and
247. A. Sato, S. Nishimura, T. Ohkubo, Y. Kyogoku, M. R. Walter, Structure, 4, 1453 (1996).
S. Koyama, M. Kobayashi, T. Yasuda, and Y. 265. D. J. Thiel, M.-H. Ledu, R. L. Walter, A.
Kobayashi, Int. J. Pept. Protein Res., 41, 433 D'Arcy, C. Chene, M. Fountoulakis, G. Ga-
(1993). rotta, F. K. Winkler, and S. E. Ealick, Struc-
248. F. F. Vajdos, M. Ultsch, M. L. Schaffer, K. D. ture Fold Des., 8,927 (2000).
Deshayes, J. Liu, N. J. Skelton, and A. M. De- 266. M. Randal and A. A. Kossiakoff, Structure, 9,
vos, Biochemistry, 40, 11022 (2001). 155 (2001).
249. E. Dewolf, R. Gill, S. Geddes, J. Pitts, A. 267. A. Landar, B. Curry, M. H. Parker, R. Digia-
Wollmer, and J. Grotzinger, Protein Sci., 5, como, S. R. Indelicato, T. L. Nagabhushan, G.
2193 (1996). Rizzi, and M. R. Walter, J. Mol. Biol., 299, 169
250. R. M. Cooke, T. S. Harvey, and I. D. Campbell, (2000).
Biochemistry, 30, 5484 (1991). 268. S. E. Ealick, W. J. Cook, S. Vijay-Kumar, M.
251. T. P. J. Garrett, N. M. Mckern, M. Lou, M. J. Carson, T. L. Nagabhushan, P. P. Trotta, and
Frenkel, J. D. Bentley, G. O. Lovrecz, T. C. C. E. Bugg, Science, 252, 698 (1991).
Elleman, L. J. Cosgrove, and C. W. Ward, Na- 269. B. J. Graves, M. H. Hatada, W. A. Hendrick-
ture, 394,395 (1998). son, J. K. Miller, V. S. Madison, and Y. Satow,
252. K. Parang, J. H. Till, A. J. Ablooglu, R. A. Ko- Biochemistry, 29,2679 (1990).
hanski, S. R. Hubbard, and P. A. Cole, Nat. 270. G. P. A. Vigers, D. J. Dripps, C. K. Edwards,
Struct. Biol., 8, 37 (2001). and B. J. Brandhuber, J. Biol. Chem., 275, 36927
253. J. H. Till, A. J. Ablooglu, M. Frankel, S. M. (2000).
Bishop, R. A. Kohanski, and S. R. Hubbard, 271. H. Schreuder, C. Tardif, S. Trump-Kallmeyer,
J. Biol. Chem., 276, 10049 (2001). A. Soffientini, E. Sarubbi, A. Akeson, T. Bow-
254. S. R. Hubbard, Embo J., 16, 5572 (1997). lin, S. Yanofsky, and R. W. Barrett, Nature,
255. S. R. Hubbard, L. Wei, L. Ellis, and W. A. Hen- 386, 194 (1997).
drickson, Nature, 372,746 (1994). 272. G. P. Vigers, L. J. Anderson, P. Caffes, and
256. A. M. Torres, B. E. Forbes, S. E. Aplin, J. C. B. J. Brandhuber, Nature, 386, 190 (1997).
Wallace, G. L. Francis, and R. S. Norton, J. 273. A. Zdanov, C. Schalk-Hihi, S. Menon, K. W.
Mol. Biol., 248, 385 (1995). Moore and A. Wlodawer, J. Mol. Biol., 268, 460
257. E. T. Baldwin, R. W. Sarver, G. L. Bryant Jr., (1997).
K. A. Curry, M. B. Fairbanks, B. C. Finzel, 274. A. Zdanov, C. Schalk-Hihi, and A. Wlodawer,
R. L. Garlick, R. L. Heinrikson, N. C. Horton, Protein Sci., 5, 1955 (1996).
275. A. Zdanov, C. Schalk-Hihi, A. Gustchina, M. 296. W. Somers, M. Stahl, and J. S. Seehra, Embo
Tsang, J . Weatherbee, and A. Wlodawer, J., 16, 989 (1997).
Structure, 3,591 (1995). 297. K. Rajarathnam, I. Clark-Lewis, and B. D.
276. K. Josephson, N. J. Logsdon, and M. R. Walter, Sykes, Biochemistry, 34, 12983 (1995).
Immunity, 15, 35 (2001). 298. C. Eigenbrot, H. B. Lowman, L. Chee, and
277. M. R. Walter and T. L. Nagabhushan, Bio- D. R. Artis, Proteins, 27, 556 (1997).
chemistry, 34, 12118 (1995). 299. N. J. Skelton, C. Quan, D. Reilly, and H. Low-
278. C. Yoon, S. C. Johnston, J. Tang, M. Stahl, J. F. man, Structure Fold Des., 7, 157 (1999).
Tobin, and W. S. Somers, Embo J., 19, 3530 300. N. Gerber, H. Lowman, D. R. Artis, and C.
(2000). Eigenbrot, Proteins: Struct., Funct., Genet.,
279. E. Z. Eisenmesser, D. A. Horita, A. S. Altieri, 38,361 (2000).
and R. A. Byrd, J. Mol. Biol., 310,231 (2001). 301. H. Sticht, M. Auer, B. Schmitt, J. Besemer, M.
280. H. R. Mott, B. S. Baines, R. M. Hall, R. M. Horcher, T. Kirsch, J. D. Lindley, and P.
Cooke, P. C. Driscoll, M. P. Weir, and I. D. Roesch, Eur. J. Biochem., 235, 26 (1996).
Campbell, J. Mol. Biol., 248, 979 (1995). 302. E. T. Baldwin, I. T. Weber, R. St. Charles, J.-C.
281. D. B. McKay, Science, 257,412 (1992). Xuan, E. Appella, M. Yamada, K. Matsushima,
282. Y. Feng, B. K. Klein, and C. A. Mcwherter, J. B. F. P. Edwards, G. M. Clore, A. M. Gronen-
Mol. Biol., 259, 524 (1996). born, and A. Wlodawer, Proc. Nat. Acad. Sci.
283. T. Mueller, F. Oehlenschlaeger, and M. Bueh- USA, 88, 502 (1991).
ner, J. Mol. Biol., 247, 360 (1995). 303. G. M. Clore, E. Appella, M. Yamada, K. Mat-
284. T. Hage, W. Sebald, and P. Reinemer, Cell, 97, sushima, and A. M. Gronenborn, Biochemis-
271 (1999). try, 29,1689 (1990).
285. M. Hulsmeyer, C. Scheufler, and M. K. Dreyer, 304. M. M. G. M. Thunnissen, P. N. Nordlund, and
Acta Crystallogr., Sect. D, 57, 1334 (2001). J. Z. Haeggstrom, Nat. Struct. Biol., 8, 131
(2001).
286. C. Redfield, L. J. Smith, J. Boyd, G. M. P. Law-
rence, R. G. Edwards, C. J. Gershater, R. A. G. 305. X. Weng, H. Luecke, I. S. Song, D. S. Kang,
Smith, and C. M. Dobson, J. Mol. Biol., 238,23 S-H. Kim, and R. Huber, Protein Sci., 2, 448
(1994). (1993).
287. R. Powers, D. S. Garrett, C. J. March, E. A. 306. A. Rosengarth, V. Gerke, and H. Luecke, J.
Frieden, A. M. Gronenborn, and G. M. Clore, Mol. Biol., 306,489 (2001).
Science, 256, 1673 (1992). 307. M. Tegoni, S. Spinelli, M. Verhoeyen, P. Davis,
288. T. Mueller, T. Dieckmann, W. Sebald, and H. and C. Cambillau, J. Mol. Biol., 289, 1375
Oschkinat, J. Mol. Biol., 237, 423 (1994). (1999).
289. T. Mueller, T. Dieckmann, W. Sebald, and H. 308. H. Wu, J. W. Lustbader, Y. Liu, R. E. Canfield,
Oschkinat, J. Mol. Biol., 237,423 (1994). and W. A. Hendrickson, Structure, 2, 545
(1994).
290. L. J. Smith, C. Redfield, J. Boyd, G. M. P. Law-
rence, R. G. Edwards, R. A. G. Smith, and C. M. 309. A. J. Lapthorn, D. C. Harris, A. Littlejohn,
Dobson, J. Mol. Biol., 224,899 (1992). J. W. Lustbader, R. E. Canfield, K. J. Machin,
F. J. Morgan, and N. W. Isaacs, Nature, 369,
291. M. R. Walter, W. J. Cook, B. G. Zhao, R. Cam- 455 (1994).
eron Jr., S. E. Ealick, R. L. Walter Jr., P.
Reichert, T. L. Nagabhushan, P. P. Trotta, and 310. A. Bohm, J. Pandit, J. Jancarik, R. Halenbeck,
C. E. Bugg, J. Biol. Chem., 267,20371 (1992). K. Koths, and S.-H. Kim, Science, 258, 1358
(1992).
292. A. Wlodawer, A. Pavlovsky, and A. Gustchina,
311. M. J. Jedrzejas, S. Singh, W. J . Brouillette,
Febs Lett., 309,59 (1992).
W. G. Laver, G. M. Air, and M. Luo, J. Mol.
293. R. Powers, D. S. Garrett, C. J. March, E. A. Biol., 267,584 (1997).
Frieden, A. M. Gronenborn, and G. M. Clore, 312. N. R. Taylor, A. Cleasby, O. Singh, T. Skarzyn-
Biochemistry, 32,6744 (1993). ski, A. J. Wonacott, P. W. Smith, S. L. Sollis,
294. M. V. Milburn, A. M. Hassell, M. H. Lambert, P. D. Howes, P. C. Cherry, R. Bethell, P. Col-
S. R. Jordan, A. E. I. Proudfoot, P. Graber, and man, and J. Varghese, J. Med. Chem., 41,798
T. N. C. Wells, Nature, 363, 172 (1993). (1998).
295. G. Y. Xu, H. A. Yu, J. Hong, M. Stahl, T. Mc- 313. M. J. Jedrzejas, S. Singh, W. J . Brouillette,
donagh, L. E. Kay, and D. A. Cumming, J. Mol. W. G. Laver, G. M. Air, and M. Luo, Biochem-
Biol., 268,468 (1997). istry, 34, 3144 (1995).
314. W. P. Burmeister, B. Henrissat, C. Bosso, S. 333. J. M. Gulbis, M. Zhou, S. Mann, and R. Mac-
Cusack, and R. W. H. Ruigrok, Structure, 1,19 kinnon, Science, 289, 123 (2000).
(1993). 334. D. L. Minor Jr., Y.-F. Lin, B. C. Mobley, A. Ave-
315. J. B. Finley, V. R. Atigadda, F. Duarte, J. J. lar, Y. N. Jan, L. Y. Jan, and J. M. Berger, Cell,
Zhao, W. J. Brouillette, G. M. Air, and M. Luo, 102, 657 (2000).
J. Mol. Biol., 293, 1107 (1999). 335. J. L. Oberfield, J. L. Collins, C. P. Holmes,
316. C. L. White, M. N. Janakiraman, W. G. Laver, D. M. Goreham, J. P. Cooper, J. E. Cobb, J. M.
C. Philippon, A. Vasella, G. M. Air, and M. Luo, Lenhard, E. A. Hull-Ryde, C. P. Mohr, S. G.
J. Mol. Biol., 245,623 (1995). Blanchard, D. J. Parks, L. B. Moore, J. M. Leh-
317. W. P. Burmeister, R. W. H. Ruigrok, and S. mann, K. Plunket, A. B. Miller, M. V. Milburn,
Cusack, Embo J., 11, 49 (1992). S. A. Kliewer, and T. M. Willson, Proc. Nat.
Acad. Sci. USA, 96,6102 (1999).
318. S. A. Monks, G. Karagianis, G. J. Howlett, and
R. S. Norton, J. Biomol. NMR, 8,379 (1996). 336. R. T. Nolte, G. B. Wisely, S. Westin, J. E. Cobb,
M. H. Lambert, R. Kurokawa, M. G. Rosenfeld,
319. R. Bader, A. Bettio, A. G. Beck-Sickinger, and
T. M. Willson, C. K. Glass, and M. V. Milburn,
O. Zerbe, J. Mol. Biol., 305, 307 (2001).
Nature, 395, 137 (1998).
320. C. Cabrele, M. Langer, R. Bader, H. A. Wie-
337. R. T. Gampe Jr., V. G. Montana, M. H. Lambert,
land, H. N. Doods, O. Zerbe, and A. G. Beck-
A. B. Miller, R. K. Bledsoe, M. V. Milburn, S. A.
Sickinger, J. Biol. Chem., 275,36043 (2000).
Kliewer, T. M. Willson, and H. E. Xu, Mol. Cell,
321. G. Seidel, W. Schaefer, A. Esswein, E. Hof- 5, 545 (2000).
mann, and P. Roesch, In Press.
338. J. Uppenberg, C. Svensson, M. Jaki, G. Bertils-
322. Z. Chen, P. Xu, J.-R. Barbier, G. Willick, and F. son, L. Jendeberg, and A. Berkenstam, J. Biol.
Ni, Biochemistry, 39, 12766 (2000). Chem., 273,31108 (1998).
323. U. C. Marx, K. Adermann, P. Bayer, W.-G. 339. S. P. Williams and P. B. Sigler, Nature, 393,
Forssmann, and P. Roesch, Biochem. Biophys. 392 (1998).
Res. Comm., 267,213 (2000). 340. W. Somers, M. Ultsch, A. M. Devos, and A. A.
324. L. Jin, S. L. Briggs, S. Chandrasekhar, N. Y. Kossiakoff, Nature, 372,478 (1994).
Chirgadze, D. K. Clawson, R. W. Schevitz, D. L. 341. P. A. Elkins, H. W. Christinger, Y. Sandowski,
Smiley, A. H. Tashjian, and F. Zhang, J. Biol. E. Sakal, A. Gertler, A. M. Devos, and A. A.
Chem., 275,27238 (2000). Kossiakoff, Nut. Struct. Biol., 7, 808 (2000).
325. U. C. Mam, S. Austermann, P. Bayer, K. Ader- 342. F. Rastinejad, T. Wagner, Q. Zhao, and S. Kho-
mann, A. Ejcharth. Sticht, S. Walter, F.-X. rasanizadeh, Embo J.,19, 1045 (2000).
Schmid, R. Jaenicke, W.-G. Forssmann and P.
343. B. P. Klaholz, A. Mitschler, M. Belema, C. Zusi,
Roesch, J. Biol. Chem., 270, 15194 (1995).
and D. Moras, Proc. Nut. Acad. Sci. USA, 97,
326. U. C. Marx, Strukturen VerschiedenerParathor- 6322 (2000).
monfragmente in Loesung, University of Bay- 344. J. P. Renaud, N. Rochel, M. Ruff, V. Vivat, P.
reuth Thesis, Bayreuth, 1996. Chambon, H. Gronemeyer, and D. Moras, Na-
327. G. Y. Xu, T. Mcdonagh, H. A. Yu, E. A. Nalef- ture, 378, 681 (1995).
ski, J. D. Clark, and D. A. Cumming, J. Mol. 345. B. P. Klaholz, J. P. Renaud, A. Mitschler, C.
Biol., 280,485 (1998). Zusi, P. Chambon, H. Gronemeyer, and D. Mo-
328. 0. Perisic, S. Fong, D. E. Lynch, M. Bycroft, ras, Nut. Struct. Biol., 5, 199 (1998).
and R. L. Williams, J. Biol. Chem., 273, 1596 346. W. Bourguet, V. Vivat, J. M. Wurtz, P. Cham-
(1998). bon, H. Gronemeyer, and D. Moras, Mol. Cell,
329. A. Dessen, J. Tang, H. Schmidt, M. Stahl, J. D. 5, 289 (2000).
Clark, J. Seehra, and W. S. Somers, Cell, 97, 347. B. P. Klaholz, A. Mitschler, and D. Moras, J.
349 (1999). Mol. Biol., 302, 155 (2000).
330. A. Kreusch, P. J. Pfaffinger, C. F. Stevens, and 348. R. M. A. Knegtel, M. Katahira, J. G. Schilthuis,
S. Choe, Nature, 392,945 (1998). A. M. J. J. Bonvin, R. Boelens, D. Eib, P. T.
331. S. J. Cushman, M. H. Nanao, A. W. Jahng, D. Vandersaag, and R. Kaptein, J. Biomol. NMR,
Derubeis, S. Choe, and P. J. Pfaffinger, Nut. 3, l(1993).
Struct. Biol., 7,403 (2000). 349. W. Bourguet, M. Ruff, P. Chambon, H. Grone-
332. K. A. Bixby, M. H. Nanao, N. V. Shen, A. meyer, and D. Moras, Nature, 375,377 (1995).
Kreusch, H. Bellamy, P. J. Pfaffinger, and S. 350. F. Rastinejad, T. Perlmann, R. M. Evans, and
Choe, Nut. Struct. Biol., 6 , 3 8 (1998). P. B. Sigler, Nature, 375, 203 (1995).
351. S. M. Holmbeck, M. P. Foster, D. R. Casimiro, 365. J. H. Naismith, T. Q. Devine, B. Brandhuber,

D. S. Sem, H. J. Dyson, and P. E. Wright, J. and S. R. Sprang, J. Biol. Chem., 270, 13303
Mol. Biol., 281,271 (1998). (1995).
352. R. T. Gampejr., V. G. Montana, M. H. Lambert, 366. J. H. Naismith, T. Q. Devine, H. Khono, and
G. B. Wisely, M. V. Milburn, and H. E. Xu, S. R. Sprang, Structure, 4, 1251 (1996).
Genes Dev., 14, 2229 (2000). 367. D. W. Banner, A. D'Arcy, W. Janes, R. Gentz,
H-J. Schoenfeld, C. Broger, H. Loetscher, and
353. P. F. Egea, A. Mitschler, N. Rochel, M. Ruff, P. W. Lesslauer, Cell, 73,431 (1993).
Chambon, and D. Moras, Embo J., 19, 2592
368. G. Tocchini-Valentini, N. Rochel, J. M. Wurtz,
(2000).
A. Mitschler, and D. Moras, Proc. Nat. Acad.
354. Q. Zhao, S. A. Chasse, S. Devarakonda, M. L. Sci. USA, 98, 5491 (2001).
Sierk, and B. A. Rastinejad, J. Mol. Biol., 296, 369. N. Rochel, J. M. Wurtz, A. Mitschler, B. Kla-
509 (2000). holz, and D. Moras, Mol. Cell, 5, 173 (2000).
355. D. R. Hall, J. M. Hadden, G. A. Leonard, S. 370. S. Vos, R. J. Parry, M. R. Burns, J. Dejersey,
Bailey, M. Neu, M. Winn, and P. F. Lindley, and J. L. Martin, J.Mol. Biol., 282,875 (1998).
Acta Crystallogr., Sect. D, 58, 70 (2002). 371. S. Vos and J. De Jersey, Biochemistry, 36,4125
356. Z. Zhang, R. Zhang, A. Joachimiak, J. Schless- (1997).
inger, and X. Kong, Proc. Nat. Acad. Sci. USA, 372. S. Vos, R. J. Parry, M. R. Burns, J. Dejersey,
97, 7732 (2000). and J. L. Martin, J.Mol. Biol., 282,875 (1998).
357. X. Jiang, 0. Gurel, E. A. Mendiaz, G. W. 373. H. A. Lewis, Structure (Camb), 9, 527-537
Steams, C. L. Clogston, H. S. Lu, T. D. Oss- (2001).
lund, R. S. Syed, K. E. Langley, and W. A. Hen- 374. C. Freiberg, Drug Discov. Today, 6, S72-S80
drickson, Embo J.,19,3192 (2000). (2001).
358. J. N. Charnpness, M. S. Bennett, F. Wien, R. 375. J. M. Sauder, J. W. Arthur, and R. L. Dun-
Visse, W. C. Summers, P. Herdewijn, E. De- brack, Proteins, 40, 6-22 (2000).
clercq, T. Ostrowski, R. L. Jarvest, and M. R. 376. S. W. Muchmore, Nature, 381,335-341 (1996).
Sanderson, Proteins, 32, 350 (1998). 377. K. A. Denessiouk, A. I. Denesyuk, J. V. Leh-
359. M. S. Bennett, F. Wien, J. N. Champness, T. tonen, T. Korpela, and M. S. Johnson, Pro-
Batuwangala, T. Rutherford, W. C. Summers, teins, 35, 250-261 (1999).
H. Sun, G. Wright, and M. R. Sanderson, Febs 378. M. Teplova, Protein Sci., 9,2557-2566 (2000).
Lett., 443, 121 (1999). 379. T. J. Boggon, W. S. Shan, S. Santagata, S..C.
360. J. N. Champness, M. S. Bennett, F. Wien, R. Myers, and L. Shapiro, Science, 286,
Visse, W. C. Summers, P. Herdewijn, E. De- 2119-2125 (1999).
clerq, T. Ostrowski, R. L. Jarvest, and M. R. 380. P. W. Kleyn, Cell, 85,281-290 (1996).
Sanderson, Proteins Struct. Funct. Genet., 32,
381. K. Noben-Trauth, J. K. Naggert, M. A. North,
350 (1998). and P. M. Nishina, Nature, 380, 534-538
361. K. Wild, T. Bohner, G. Folkers, and G. E. (1996).
Schulz, Protein Sci., 6,2097 (1997). 382. S. Santagata, Science, 292,2041-2050 (2001).
362. J . Vogt, R. Perozzo, A. Pautsch, A. Prota, P. 383. W. Eisenreich, Chem. Biol., 5, R221-R233
Schelling, B. Pilger, G. Folkers, L. Scapozza, (1998).
and G. E. Schulz, Proteins Struct. Funct. 384. R. Sanchez, Nat. Struct. Biol., 7, 986-990
Genet., 41, 545 (2000). (2000).
363. C. Wurth, U. Kessler, J . Vogt, G. E. Schulz, G. 385. A. Sali, Nat. Struct. Biol., 5,1029-1032 (1998).
Folkers, and L. Scapozza, Protein Sci., 10, 63 386. A. Fiser, R. K. Do, and A. Sali, Protein Sci., 9,
(2001). 1753-1773 (2000).
364. A. Prota, J. Vogt, B. Pilger, R. Perozzo, C. 387. K. T. Simons, C. Strauss, and D. Baker, J.Mol.
Wurth, V. Marquez, P. Russ, G. E. Schulz, G. Biol., 306, 1191-1199 (2001).
Folkers, and L. Scapozza, Biochemistry, 39, 388. R. Sanchez, Nucleic Acids Res., 28, 250-253
9597 (2000). (2000).
NMR and Drug Discovery
DAVID J. CRAIK
RICHARD J. CLARK
Institute for Molecular Bioscience
Australian Research Council Special Research
Centre for Functional and Applied Genomics
University of Queensland
Brisbane, Australia
Contents
1 Introduction
1.1 Overview of Drug Development
1.2 Scope of Chapter
1.3 Principles of NMR Spectroscopy
1.4 Instrumentation
1.5 Applications of NMR in Drug Design and Discovery
2 Ligand-Based Design
2.1 Structure Elucidation
2.1.1 Structure Elucidation of Natural Products
2.1.2 Structure Determination of Bioactive Peptides
2.1.2.1 NMR Structure of Ziconotide: A Novel Treatment for Pain
2.1.2.2 Endothelin as a Lead in Ligand-Based Design
2.1.3 Instrumental Advances and their Impact on Structure Elucidation
2.2 Conformational Analysis
2.3 Charge State
2.4 Tautomeric Equilibria
2.5 Ligand Dynamics: Line-Shape and Relaxation Data
2.6 Pharmacophore Modeling: Conformations of a Set of Ligands
2.7 Limitations of Analog-Based Design
2.8 Conformation of Bound Ligands: Transferred NOEs
3 Receptor-Based Design
3.1 Macromolecular Structure Determination
3.1.1 Overview of Approach
3.1.2 Sample Requirements and Assignment Protocols
3.1.3 Recent Developments
3.1.4 Dynamics
3.1.5 Nucleic Acid Structures
3.1.6 Challenges for the Future: Membrane-Bound Proteins
3.2 Macromolecule-Ligand Interactions
3.2.1 Overview
3.2.2 Influence of Kinetics and NMR Timescales
3.2.2.1 Slow Exchange
3.2.2.2 Fast Exchange
3.2.2.3 Intermediate Exchange
3.2.3 NMR Techniques
3.2.3.1 Chemical-Shift Mapping
3.2.3.2 NMR Titrations
3.2.3.3 Isotope Editing and Filtering
3.2.3.4 NOE Docking
3.2.4 Selected Examples
3.2.4.1 DNA-Binding Drugs
3.2.4.2 Immunophilins: Studies of FK506 Analog Binding to FKBP
3.2.4.3 Matrix Metalloproteinases
3.2.4.4 Dihydrofolate Reductase
3.2.4.5 HIV Protease
4 NMR Screening
4.1 Methods
4.1.1 Chemical-Shift Perturbation
4.1.2 Magnetization Transfer Experiments
4.1.3 Molecular Diffusion
4.1.4 Relaxation
4.1.5 NOE
4.1.6 Spin Labels
4.2 Practical Considerations
4.2.1 Screening Approach
4.2.2 Library Design
4.2.2.1 Ligand Properties
4.2.2.2 Mixture Design
4.2.3 Hardware and Automation
5 Conclusions
6 Acknowledgments

Burger's Medicinal Chemistry and Drug Discovery
Sixth Edition, Volume 1: Drug Discovery
Edited by Donald J. Abraham
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
1 INTRODUCTION

NMR spectroscopy has been widely used as a front-line tool in the pharmaceutical industry for several decades. In the past, the main use of NMR was in the structural characterization of organic molecules synthesized in the course of medicinal chemistry programs. Indeed, medicinal chemists have long regarded NMR as the premier tool to be used in the structure characterization process, to confirm the identity of intermediates or to determine the conformation of lead molecules. Over the last decade major developments in both instrumentation and methods have resulted in this traditional use of NMR in the pharmaceutical industry being augmented by a range of exciting new applications. Two of the most important of these are the use of NMR in structure-based drug design and in screening for drug discovery. Both applications differ from the traditional use of NMR in that now the macromolecular binding partner of the medicinal compound is included in the mixture to be analyzed; that is, contemporary applications of NMR in drug discovery are predominantly focused on the interaction between drug molecules and their macromolecular targets.

The aim of this chapter is to describe how NMR spectroscopy is used in modern drug discovery. The term discovery is used generically throughout to include processes that involve rational drug design as well as those that involve discovery through NMR screening. The latter is a relatively recent development and refers to the use of NMR as a tool to screen a compound library, to identify a molecule or molecules that bind to a chosen macromolecular target. Of course, the distinction between "design" and "discovery" is often quite blurred. This is nowhere more evident than in the recently developed SAR-by-NMR approach (1), in which the discovery of several weakly bound ligands from a screening program is intimately linked to a design process to chemically join them. SAR-by-NMR represents an exciting new technique for lead generation and is described in more detail later in this chapter.

Drug design/discovery represents only the first stage in the whole drug development process. As is clear from the other chapters in this volume, there are many other steps that need to be made once a lead molecule has been designed or discovered. Although other stages of the process, including lead optimization, toxicity studies, preclinical investigations, and clinical monitoring, do not fall within the scope of this chapter, it is worth mentioning that NMR spectroscopy contributes significantly across the whole spectrum of drug development, right through into the clinical domain. For example, NMR spectroscopy has been applied for the detection of drug metabolites in biological fluids, and magnetic resonance imaging, which is based on the fundamental principles of NMR, plays an important role in clinical investigations. It is increasingly being used to monitor the functional outcomes of drug therapy. We briefly address these broader applications of NMR before returning to the main topic of NMR in drug discovery.

Figure 12.1. Overview of the drug development process and summary of various types of NMR experiments that contribute at different stages.

1.1 Overview of Drug Development

To give an overview of the breadth of applications of NMR, Fig. 12.1 summarizes the drug development process and indicates the role of NMR at various stages. Drug development is an iterative process and can be simplified by representing it with two interconnected cycles of activity. Cycle A involves the design or discovery of an initial lead followed by its synthesis and bioassay. Based on the initial assay results there may be several loops around this cycle before commencing the in vivo studies represented in Cycle B. At this stage consideration of bioavailability, metabolism, and pharmacokinetic profiles must be made and this may involve synthetic modifications of the
lead molecules to improve their druglike properties. Again, several loops around Cycle B may be necessary before one or more development candidates are identified. Ultimately one or two of these development candidates are identified for progression through clinical trials.

As indicated in Fig. 12.1, it is convenient to envisage five broad categories of NMR experiments that may contribute to this overall drug development process.

1. Small molecule, or ligand-based, NMR. This involves studies of drugs and drug leads, typically organic molecules with a molecular weight <500 Da, but also including small proteins of up to a few kDa. These studies may be used to characterize natural products or synthetic drug leads, or to determine their conformation.
2. Macromolecular NMR. This involves studies of the macromolecular targets of drugs, typically to determine their three-dimensional structure and/or the nature of their complexes with ligands.
3. NMR screening. This involves the use of NMR to identify lead molecules that bind to a macromolecular target. These studies typically involve both small molecules and macromolecules and seek to detect the presence of binding interactions between them.
4. Metabolic NMR. This involves studies of endogenous molecules whose levels may be modified by drug treatment, or studies of the metabolites of drugs themselves.
5. NMR imaging. Such studies provide anatomical information in an animal model or human patient. This includes, for example, monitoring the size of plaques or tumors in the brains of Alzheimer's or cancer patients, respectively, during drug therapy.

It is clear from these descriptions that NMR covers a wide range of applications in the pharmaceutical industry, although for the remainder of this chapter we will focus on NMR in the drug design/discovery phase of drug development, that is, on categories 1-3 of the preceding list. Together, the studies in categories 1 and 2 may be classified as structure-based design, whereas category 3 relates to drug discovery.

1.2 Scope of Chapter

Our aim is to give a broad overview on the use of NMR as a tool in structure-based design and in screening approaches to drug discovery. The chapter also contains a description of the relevant NMR methods, which are highlighted by illustrative examples. We briefly describe the instrumentation required for such studies and emerging trends in the field are discussed. This includes developments in the field of drug discovery in the postgenomic era that are likely to have an impact on the way in which NMR is used, as seen for example by the recent interest in structural genomics programs. NMR instrument developments are also described. For example, recent advances in cryoprobe technology promise to dramatically increase the sensitivity of NMR spectroscopy and increase its application across the pharmaceutical industry. Finally, a section outlining some of the practical considerations in structure-based design and screening is included. Future directions for the field are mentioned throughout the discussion.

There have been a number of reviews that describe applications of NMR in drug discovery or screening and the reader is referred to these for additional information (2-15). Recent books covering aspects of NMR in drug design are also available (16, 17).

It is assumed that most readers will be familiar with the basic principles of NMR. However, for completeness and to define some of the terms that will be used in this chapter it is useful to give a brief overview of the principles. Excellent texts are available to provide more detail (18, 19).

1.3 Principles of NMR Spectroscopy

The underlying basis of NMR is that when nuclei with a nonzero spin quantum number are placed in a magnetic field they take up one of a discrete number of quantized states. The application of radiofrequency (rf) energy produces transitions between these states. The energy changes associated with these transitions are detected as small voltages induced in
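a receiver coil that are subsequently amplified, digitized, and processed to yield spectra, as illustrated in Fig. 12.2. The most commonly studied NMR-active nucleus is the proton, ¹H, but in modern NMR experiments ²H, ¹³C, and ¹⁵N nuclei are also very important. For these heteronuclei it is common to isotopically enrich the sample because of their low natural abundance. This is particularly important for studies of proteins, as will become apparent later in this chapter. Occasionally, other nuclei find specialist applications. For example, in fluorine-containing drugs it is possible to use sensitive ¹⁹F-NMR signals to monitor interaction with target proteins, as described later in this chapter.

Figure 12.2. Overview of the principles of NMR spectroscopy. Polarization of nuclear spins by a magnetic field is perturbed by application of a radiofrequency (rf) pulse. The resultant signal is Fourier transformed, to yield a spectrum reflecting the number and environments of nuclei in the sample.

In modern spectrometers the rf energy is supplied in the form of short pulses (typically, ~10 μs) that simultaneously excite all nuclei of a given isotope type (e.g., all protons or all ¹³C nuclei). Nuclei of a given isotope that are in different chemical environments by virtue of their atomic locations in the molecule have slightly different resonance frequencies and lead to different oscillating voltages in the receiver coil. The resultant combined signal, termed a free induction decay (FID), is Fourier transformed to give a spectrum that is basically a plot of peak intensity vs. frequency, with one peak for each chemically distinct nucleus. These features are schematically illustrated in Fig. 12.2. The frequency axis is termed the chemical shift because it reflects the local chemical environment of each nucleus. The range of chemical environments of nuclei in a molecule is such that chemical shifts range up to only a few hundred parts per million (ppm) of the base resonance frequency for ¹³C and ¹⁵N. For ¹H the range is smaller still, covering only about 10 ppm. Despite this small range, chemical shifts provide valuable diagnostic information on the environment of the nucleus giving rise to the signal.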
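The path from FID to spectrum described above can be illustrated numerically. The following Python sketch is not part of the original chapter: it assumes the NumPy library, and the function names, the two proton offsets (1000 and 1500 Hz), their 1:2 populations, the relaxation time, and the 500-MHz spectrometer frequency are all hypothetical values chosen for illustration. It builds an idealized FID, Fourier transforms it, and expresses the peak positions in ppm.

```python
import numpy as np

def simulate_fid(offsets_hz, populations, t2_s, dwell_s, n_points):
    """Idealized FID: a sum of exponentially decaying complex oscillations."""
    t = np.arange(n_points) * dwell_s
    fid = np.zeros(n_points, dtype=complex)
    for offset, population in zip(offsets_hz, populations):
        fid += population * np.exp(2j * np.pi * offset * t) * np.exp(-t / t2_s)
    return fid

def fid_to_spectrum(fid, dwell_s):
    """Fourier transform the FID; return the frequency axis (Hz) and real spectrum."""
    spectrum = np.fft.fftshift(np.fft.fft(fid)).real
    freq_hz = np.fft.fftshift(np.fft.fftfreq(len(fid), d=dwell_s))
    return freq_hz, spectrum

def pick_peaks(freq_hz, spectrum, threshold_fraction=0.1):
    """Return (frequency, intensity) of local maxima above a fraction of the tallest peak."""
    threshold = threshold_fraction * spectrum.max()
    peaks = []
    for i in range(1, len(spectrum) - 1):
        is_local_max = spectrum[i] > spectrum[i - 1] and spectrum[i] > spectrum[i + 1]
        if is_local_max and spectrum[i] > threshold:
            peaks.append((float(freq_hz[i]), float(spectrum[i])))
    return peaks

def hz_to_ppm(offset_hz, spectrometer_mhz):
    """Chemical shift in ppm = frequency offset (Hz) / spectrometer frequency (MHz)."""
    return offset_hz / spectrometer_mhz

# Hypothetical example: two proton environments in a 1:2 ratio on a 500-MHz instrument.
SPECTROMETER_MHZ = 500.0
fid = simulate_fid([1000.0, 1500.0], [1.0, 2.0], t2_s=0.1, dwell_s=1e-4, n_points=4000)
freq_hz, spectrum = fid_to_spectrum(fid, dwell_s=1e-4)
peaks = pick_peaks(freq_hz, spectrum)
for freq, intensity in peaks:
    shift_ppm = hz_to_ppm(freq, SPECTROMETER_MHZ)
    print(f"{shift_ppm:.2f} ppm, relative intensity {intensity / peaks[0][1]:.2f}")
```

Under these assumptions the two peaks should appear near 2 and 3 ppm with heights in a ratio close to 1:2, reflecting the relative numbers of nuclei. A real spectrum would additionally require phasing, apodization, and referencing, which are omitted here.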
The chemical shift is an extremely important NMR parameter but there are many other parameters that can be discerned from NMR spectra. Indeed, NMR is unique among many forms of spectroscopy in that there are so many parameters associated with a spectrum other than just peak intensity and frequency. These include coupling constants, which provide information on local conformations and also on molecular connectivities; nuclear Overhauser effects (NOEs), which provide information on internuclear distances; and relaxation parameters, which provide information on molecular dynamics. Table 12.1 summarizes the main NMR parameters that may be measured and highlights their applications in the drug discovery process.

Table 12.1 NMR Parameters and Their Applications in Drug Design/Discovery

Parameter                                         Information Provided Relevant to Drug Design
Chemical shift                                    Reflects local chemical environment; provides a fingerprint marker of structure (particularly in HSQC spectra)
Coupling constants                                Conformational analysis, establishing molecular connectivity
Nuclear Overhauser effect                         Determining interproton distances, three-dimensional structures
Relaxation times                                  Molecular dynamics
Line-shape                                        Detecting and quantifying chemical exchange processes
Peak intensities                                  Reflect relative number of nuclei, molecular symmetry
Amide exchange rates / temperature coefficients   Hydrogen bonding or solvent exposure of amide protons

The following sections of this chapter provide specific examples of how these various parameters are useful in the drug discovery process. Before doing this, though, it is useful to consider some of the limitations of one-dimensional (1D) NMR spectroscopy, particularly when the detected nucleus is ¹H, as is most commonly the case. With one signal coming from each chemically distinct proton and with those signals spread only over 10 ppm, it is clear that spectral overlap can potentially be a major problem for anything but the simplest of molecules. The development of higher field NMR spectrometers, which effectively provide greater dispersion in the frequency dimension, has contributed significantly to overcoming this limitation and increasing the application of NMR for studying pharmaceutically relevant molecules. In addition to such instrumental developments, methodological advances have also played a key role in extending the use of NMR. Multidimensional NMR methods have revolutionized biomolecular NMR spectroscopy by removing the limitations of a single frequency dimension, leading to the development of 2D, 3D, and 4D spectra.

A simple way of illustrating multidimensional NMR is through reference to heteronuclear correlation spectroscopy, in which two or more separate frequency dimensions are correlated with one another. For example, a particularly valuable 2D experiment is ¹H-¹⁵N heteronuclear single quantum correlation (HSQC) spectroscopy, in which the resultant spectrum has two frequency axes, corresponding to ¹H and ¹⁵N frequency dimensions, and one intensity axis. Analogous ¹H-¹³C HSQC spectra are also widely used. Such spectra are normally represented with the intensity axis in contour form so that they may be drawn in two dimensions as a set of contour peaks. Spectral peaks occur for pairs of ¹⁵N/¹H or ¹³C/¹H nuclei that are directly bonded to one another, and with each frequency being characteristic for the local chemical environment they represent a relatively simple, but highly characteristic fingerprint of the sample. Figure 12.3 shows the relationship between 1D and 2D spectra for the immunosuppressive drug cyclosporin, and includes a region of both the ¹H/¹⁵N and ¹H/¹³C HSQC spectra. In HSQC spectra overlap problems are alleviated because, even if two protons have the same chemical shift and would hence be overlapped in a 1D spectrum, chances are that the respective heteronuclear signals will not be overlapped, allowing the signals to be resolved in the 2D spectrum. HSQC spectra are widely used in NMR-based drug screening and we will return to them later.

Multidimensional NMR spectra are not restricted to cases where the separate frequency axes encode signals from different nuclear types. Indeed, much of the early work on the development of 2D NMR was performed on cases where both axes involved ¹H chemical shifts. The main value in such spectra comes from the information content in cross peaks between pairs of protons. In COSY-type spectra (COSY = Correlation SpectroscopY) cross peaks occur only between protons that are scalar coupled (i.e., within 2 or 3 bonds) to each other, whereas in NOESY (NOE Spectroscopy) spectra cross peaks occur for protons that are physically close in space (<5 Å apart). A combination of these two types of 2D spectra may be used to assign the NMR signals of small proteins and provides sufficient information on internuclear distances to calculate three-dimensional structures. Figure 12.3 includes a panel showing the COSY spectrum of cyclosporin and highlights the relationships between 1D ¹H-NMR spectra and corresponding 2D homonuclear (COSY) and heteronuclear (HSQC) spectra.
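The difference between through-bond (COSY) and through-space (NOESY) cross peaks can be made concrete with a small sketch. The Python fragment below is illustrative only and is not from the original chapter: the proton labels, coordinates, and bond counts are hypothetical, and the criteria are simply the rules of thumb quoted above (scalar coupling within 2 or 3 bonds; NOE contacts for protons less than about 5 Å apart).

```python
from itertools import combinations
from math import dist

# Hypothetical proton coordinates in angstroms (not taken from a real structure).
COORDS = {
    "HN": (0.0, 3.0, 0.0),
    "HA": (0.0, 0.0, 0.0),
    "HB": (2.5, 0.0, 0.0),
    "HD": (9.0, 0.0, 0.0),
}

# Hypothetical number of covalent bonds separating each pair of protons.
BONDS_APART = {
    frozenset(("HN", "HA")): 3,
    frozenset(("HA", "HB")): 3,
    frozenset(("HN", "HB")): 4,
    frozenset(("HB", "HD")): 4,
    frozenset(("HA", "HD")): 5,
    frozenset(("HN", "HD")): 7,
}

def cosy_pairs(bonds_apart, max_bonds=3):
    """Proton pairs expected to give COSY cross peaks (scalar coupled)."""
    return sorted(tuple(sorted(pair)) for pair, n in bonds_apart.items() if n <= max_bonds)

def noesy_pairs(coords, cutoff_angstrom=5.0):
    """Proton pairs expected to give NOESY cross peaks (close in space)."""
    return sorted(
        tuple(sorted((a, b)))
        for a, b in combinations(coords, 2)
        if dist(coords[a], coords[b]) < cutoff_angstrom
    )

cosy = cosy_pairs(BONDS_APART)
noesy = noesy_pairs(COORDS)
# Pairs seen only in NOESY carry purely spatial (conformational) information.
noe_only = [pair for pair in noesy if pair not in cosy]
print("COSY:", cosy)
print("NOESY:", noesy)
print("NOESY only:", noe_only)
```

In this toy case the HB/HN pair is four bonds apart but less than 5 Å apart in space, so it would be expected in the NOESY spectrum and not in the COSY spectrum. Contacts of this kind are what supply the distance information used in structure calculations.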
Figure 12.3. A schematic representation of the (a) 1D ¹H; (b) 2D DQF-COSY; (c) ¹⁵N/¹H-HSQC; and (d) ¹³C/¹H-HSQC spectra of the immunosuppressive agent cyclosporin. Example resonances/correlations from residues 6 and 7 have been highlighted to illustrate the assignment process.

Homonuclear 2D spectra are generally applicable for the study of proteins up to only approximately 80 amino acids in size. For larger proteins the increased number of signals leads to overlap problems and, in addition, COSY-type spectra suffer from poor sensitivity when the signal linewidths are of the same order as or larger than ¹H, ¹H scalar coupling constants. Such limitations are reduced by use of spectra of higher dimensionality (i.e., 3D or 4D spectra) that are based on correlations involving heteronuclear rather than homonuclear coupling constants. Such spectra are important in the structure determination process for larger proteins and are typically recorded for samples that incorporate uniform labeling with ¹⁵N, or both ¹³C and ¹⁵N nuclei. Multidimensional spectra that involve irradiation of ¹H, ¹³C, and ¹⁵N nuclei are referred to as triple resonance spectra.

The details of how multidimensional spectra are obtained are beyond the scope of this chapter, but it suffices to say that, like most other modern NMR experiments, they involve irradiation of the sample with a set of rf pulses of defined length, frequency, and phase, with specific interpulse delays. The pulse programs for such experiments are commonly provided with the spectrometer as part of a standard library of experiments and may easily be run by novice users after input of an appropriate set of parameters to define the relevant spectral widths and type of experiment required.

The above discussion provides a basic overview of some of the methods important in modern NMR spectroscopy. Before examining specific applications in drug discovery it is useful to describe the instrumental requirements for such studies.

1.4 Instrumentation

NMR spectrometers comprise a powerful and homogeneous magnet, a radiofrequency console for generating appropriate rf pulses, a probe for applying this rf energy to the sample and receiving the resultant signals, and a computer console for controlling the experiments and acquiring the resultant data. These features are summarized in Fig. 12.4. Spectrometers are normally specified in terms of the resonant frequency of protons at the given magnetic field (e.g., 500 MHz corresponds to a magnetic field of 11.7 Tesla). Both sensitivity and dispersion of signals increase with increasing magnetic field.

Figure 12.4. Block diagram of a modern NMR spectrometer. These systems use superconducting magnets that are based on a solenoid of a suitable alloy (e.g., niobium/titanium or niobium/tin) immersed in a dewar of liquid helium. The extremely low temperature of the magnet itself (4.2 K) is well insulated from the sample chamber in the center of the magnet bore. The probe in which the sample is housed usually incorporates accurate temperature control over the range typically of 4 to 40°C for biological samples. The rf coil in the probe is connected in turn to a preamplifier, receiver circuitry, analog-to-digital converter (ADC), and a computer for data collection.
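The correspondence between proton frequency and field strength quoted above (500 MHz at 11.7 T) follows from the Larmor relation, in which the resonance frequency is proportional to the applied field. The short Python sketch below is an illustration and is not taken from the original chapter: the gyromagnetic constants are approximate standard literature values, and the function names are hypothetical. It also shows why dispersion improves with field, since 1 ppm corresponds to more Hz on a higher field instrument.

```python
# Approximate |gamma| / (2*pi) in MHz per tesla. These are standard literature
# values supplied for illustration; they do not appear in the chapter itself.
GAMMA_BAR_MHZ_PER_T = {"1H": 42.577, "2H": 6.536, "13C": 10.708, "15N": 4.316}

def field_tesla(proton_mhz):
    """Magnetic field (T) of a spectrometer specified by its proton frequency (MHz)."""
    return proton_mhz / GAMMA_BAR_MHZ_PER_T["1H"]

def larmor_mhz(nucleus, proton_mhz):
    """Resonance frequency (MHz) of a nucleus on a spectrometer of given proton frequency."""
    return GAMMA_BAR_MHZ_PER_T[nucleus] * field_tesla(proton_mhz)

def hz_per_ppm(nucleus, proton_mhz):
    """Dispersion: 1 ppm of a resonance frequency of X MHz equals X Hz."""
    return larmor_mhz(nucleus, proton_mhz)

for proton_mhz in (500.0, 800.0, 900.0):
    print(
        f"{proton_mhz:.0f} MHz: field {field_tesla(proton_mhz):.2f} T, "
        f"13C at {larmor_mhz('13C', proton_mhz):.1f} MHz, "
        f"15N at {larmor_mhz('15N', proton_mhz):.1f} MHz, "
        f"10 ppm of 1H spans {10 * hz_per_ppm('1H', proton_mhz):.0f} Hz"
    )
```

On these assumptions the 10-ppm proton window spans about 5000 Hz at 500 MHz but about 8000 Hz at 800 MHz, whereas scalar couplings and many linewidth contributions stay fixed in Hz, which is one way to see the gain in dispersion at higher field.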
of defined length, frequency, and phase, with There have been some major break-
specific interpulse delays. The pulse programs throughs in both NMR instrumentation and
for such experiments are commonly provided methodology over the last decade that have
with the spectrometer as part of a standard greatly increased the utility of NMR for drug
library of experiments and may easily be run discovery applications. These are summarized
by novice users after input of an appropriate in Table 12.2, which also includes some of the
set of parameters to define the relevant spec- earlier milestones in the development of
tral widths and type of experiment required. NMR. Most notable among recent innovations
1 Introduction 515
Table 12.2 Milestones in the Development of NMR Spectroscopy

Year Development Nature
1970 FT NMR Instrumental
1975 Superconducting magnets Instrumental
1980 2D NMR Methodological
1985 Protein structure determination Methodological
1990 Isotope labeling/multidimensional NMR Methodological
1990 Pulsed field gradients InstrumentaVmethodological
1995 NMR screening Methodological
1997 TROSY Methodological
1998 LC-NMRLCMS-NMR Instrumental
2000 Cryoprobes Instrumental
are the use of pulsed-field gradient methods currently available but, at the time of writing,
for improving spectral quality and allowing only a few have been installed. Numerous 800-
new types of experiments to be performed, MHz systems dedicated to structure-based de-
transverse relaxation-optimized spectroscopy sign have been installed in pharmaceutical
(TROSY) methods (20) for increasing the size laboratories. The high field instruments pro-
of macromolecules that can be examined, and vide another advantage in that TROSY exper-
cryoprobes for enhancing sensitivity. The de- iments (20) can be used to produce a marked
velopment of cryoprobes has resulted in the improvement in spectral quality for larger
biggest single gain in sensitivity over recent proteins. Such developments promise to push
years, effectively giving 500-MHz spectrome- higher the size of proteins whose structure can
ters the sensitivity of 800-MHz spectrometers be determined by NMR.
(although without the gain in resolution!). For NMR drug screening programs, the ba-
The enhanced sensitivity is obtained by cool- sic requirement of a spectrometer of 500 MHz
ing the receiver coil and associated circuitry to or greater remains, but in addition, an inter-
near liquid helium temperatures, thereby re- face that allows the spectrometer to sample a
ducing thermal noise. There were consider- library of compounds of potential binding li-
able technical barriers to be overcome in de- gands needs to be present. This may be done
veloping such probes because of the large either by use of a discrete sample changer or a
difference in temperature between the re- flow-type system. Flow systems have the po-
ceiver coils and the sample, which are only a tential advantage of increased throughput but
few millimeters apart. These barriers have have the potential disadvantage of precipita-
now been overcome and cryoprobes are being tion of protein samples. In practice this ap-
installed in a large number of laboratories. pears not to have been a major problem and
They are also becoming available for higher both types of systems are in use in the phar-
field systems (800 MHz), thus providing fur- maceutical industry. Sample changer systems
ther sensitivity gains. currently have the advantage that they may be
Although the basic configurations of in- adapted for use with cryoprobe technology
struments tailored for structure-based design (currently unavailable for flow systems). Cryo-
or for NMR drug screening are similar, there probes allow dramatically enhanced sensitiv-
are some minor differences. For structure- ity gains, which bring particular advantages to
based design applications a relatively high the study of macromolecule-ligand interac-
field spectrometer is required (>500 MHz), tions used in screening programs (21).
usually equipped with three or four radiofre- Pulsed-field gradients have become inte-
quency channels for the simultaneous irradi- gral to most modern NMR spectrometers and
ation of 'H, 13C, 15N, and in some cases 'H are routinely used both for structure determi-
nuclei. The greatest sensitivity and dispersion nation and screening experiments. Another
are obtained with the highest possible mag- recent development has been the interface of
netic field. Instruments of up to 900 MHz are NMR spectrometers with other instrumenta-
Figure 12.5. A summary of the relationship between NMR screening and structure-based design. Diagram labels: NMR ligand, lead molecule, therapeutic drug, NMR screening, compound library enhancement, structure-based design. (Adapted from Ref. 15.)
tion such as liquid chromatography (LC) and/or mass spectrometry (MS). The applications of these instrumental developments to drug discovery have been recently reviewed (8, 13, 22).

1.5 Applications of NMR in Drug Design and Discovery

Our focus here is on the use of NMR in the discovery and design phase of drug development. The major role of NMR in the design process comes about by its exquisite ability to provide structural information, whereas the major role of NMR in discovery comes through its use as a screening tool to detect the binding of novel ligands to macromolecular targets. As already noted, the latter application is a relatively recent development but has created much interest in the pharmaceutical industry and promises to significantly enhance applications of NMR in this industry. The impact of the methodology is already becoming evident even at this early stage, with several SAR-by-NMR-derived leads currently in clinical development. As already noted, though, the discovery and design phases are often intimately connected, with lead molecules discovered in screening programs routinely being optimized by use of structure-based design approaches (Fig. 12.5).

In the context of this chapter structure-based design refers to the process of determining the three-dimensional structure of a lead molecule or macromolecular target, or determining the structure of the macromolecule-ligand complex, and using this information to design new drugs. The questions that may be asked when embarking on structure-based design projects are:

• What are the solution and bound conformations of the ligand?
• What is its charge/tautomeric state?
• Which functional groups bind to the receptor and what charge state are they in?
• What is the structure of the receptor?
• Which parts interact with the ligand?
• What is the geometry of the ligand-receptor complex?
• What are the kinetics of binding and are there dynamic motions of ligand, receptor, or the complex?

Table 12.3 summarizes these and other questions and indicates the type of NMR approaches that can provide answers. Remaining sections of this chapter are organized around the headings identified in Table 12.3.

In considering these questions it is convenient to distinguish between ligand-based design, where the structural focus is on the small lead molecule, and receptor-based design, where the aim is to determine the structure of the macromolecular target. The NMR methods used in ligand-based design have been well established for many years, based on the use of NMR by organic and natural product chemists for more than four decades. However, there have been some important recent advances in NMR methods such as the use of pulsed field gradients, and in the combination of NMR
Table 12.3 Information on Ligands, Macromolecules, and Their Complexes Sought in Structure-Based Design and Relevant NMR Technologies Used to Derive This Information

Target                         Information                         NMR Technology
Ligand                         Solution conformation               1D/2D NMR
                               Charge/tautomeric state             Chemical shift/titrations
                               Solution dynamics                   Line-shape/relaxation analysis
                               Pharmacophore models                All of the above, and TrNOE, of multiple ligands
                               Bound ligand conformation           TrNOE
Macromolecule                  3D structure                        2D/3D/4D NMR
                               Macromolecular dynamics             Relaxation time measurements
                               Structure of articulated            TROSY
                               macromolecules (e.g., multimeric
                               or membrane-bound receptors)
Ligand-macromolecular complex  Stoichiometry of complex            Chemical shift/titration
                               Kinetics of binding                 Line width, titration analysis
                               Location of interacting sites       HSQC, isotope editing
                               Orientation of bound ligand         NOE docking
                               Bound ligand conformation           TrNOE
                               Structure of complex                3D/4D NMR
                               Dynamics of complex                 Relaxation time measurements
with other technologies such as LC and MS that promise to enhance applications in this field (13). The use of NMR to determine the three-dimensional structures of macromolecules is a newer field, commencing only in around 1985, and is one that is still rapidly evolving. NMR screening is a still newer approach, developed since around 1996. Ligand-based and receptor-based design are examined in Sections 2 and 3, respectively, and screening-based approaches are examined in Section 4.

2 LIGAND-BASED DESIGN

Many naturally occurring molecules have potent bioactivity that renders them useful leads in the drug design process. These may be naturally occurring hormones, neurotransmitters, or other endogenous molecules, or they may be bioactive molecules from plants or microorganisms. Furthermore, screening programs on synthetic compound libraries frequently result in the discovery of bioactive molecules that then become starting points in drug design. The general aim of ligand- or analog-based design is to determine the structure and conformation of a known bioactive molecule and then mimic this conformation in a designed lead compound, with the aim of improving the activity or druglike properties. The following sections examine various aspects of ligand-based design and illustrate them with examples.

2.1 Structure Elucidation

If the bioactive molecule is a synthetic product, its structure may be rapidly deduced by a simple comparison of NMR parameters (often combined with MS) of the product relative to those of the known precursor, to see whether the desired chemical transformation has taken place. If the bioactive compound is an unknown molecule discovered in an active fraction in bioassay-guided screening, then the first step is to elucidate its structure. Typical molecules that form the basis of such natural products-based drug discovery studies include "organic" natural products as well as small peptides and proteins. The approaches to structure elucidation for natural products and peptides/proteins are a little different from each other and are described in turn.

2.1.1 Structure Elucidation of Natural Products. In the case of nonpeptidic natural products the main structural focus initially is to elucidate the carbon framework. This nor-
Figure 12.6. Illustration of the HMBC correlations (arrows) used to assign the positions of two of the quaternary methyl groups in taxol.
mally involves a combination of 1D 1H and 13C-NMR, followed by homonuclear (DQF-COSY, TOCSY, ROESY, or NOESY) and heteronuclear (HSQC, HMBC) 2D experiments. Heteronuclear multiple bond correlation (HMBC) spectra are particularly valuable because they assist in tracing the backbone of the molecule. Such spectra display cross peaks between a 13C nucleus and protons connected within two or three bonds and, in doing so, provide valuable information on molecular connectivity. Figure 12.6 shows typical HMBC correlations seen for selected regions of taxol, a plant-derived natural product that is currently a leading treatment for breast and ovarian cancers. Although the structure of taxol itself was originally deduced from a combination of X-ray crystallography on a degradation product and a range of 1H and 13C spectra in the 1970s, before HMBC spectra had been invented, HMBC spectra have been widely used for studies of the many taxol derivatives that have been examined in the last decade.

Elucidation of the carbon framework of natural products often yields substantial information about the three-dimensional structure at the same time, but if there are remaining questions on the stereochemistry of chiral centers or other factors affecting the three-dimensional structure, these can usually be resolved from NOESY spectra and/or an analysis of coupling constants. We will return to the taxol example later in Section 2.2 when describing conformational analysis.

2.1.2 Structure Determination of Bioactive Peptides. In contrast to the process described for organic molecules, the structure elucidation of peptide-based natural products involves two distinct steps: (1) the elucidation of the primary structure (amino acid sequence) followed by (2) a determination of secondary/tertiary structure. The primary structure determination is routinely done through Edman sequencing, or more recently by MS-MS methods. NMR plays a key role in the elucidation of the secondary and tertiary structure of peptides, mainly based on 2D homonuclear NMR spectroscopy. A combination of DQF-COSY and TOCSY (Total Correlation Spectroscopy) spectra is used to assign spin systems to amino acid types and then NOESY spectra are used to sequentially assign the resonances to individual protons in the peptide (23). The three-dimensional structure is then determined by deriving a series of internuclear distance restraints from the NOESY spectrum and using them in a simulated annealing algorithm to calculate a family of structures consistent with them.

Because the structure determination of peptides and proteins represents a very important contribution of NMR to the drug development process, it is informative to describe the process in more detail. To do this we will use the recently developed peptide-based drug MVIIA as an example.

2.1.2.1 NMR Structure of Ziconotide: A Novel Treatment for Pain. MVIIA, now known as Ziconotide, is a 25-amino acid peptide originally discovered from the venom of the marine cone snail, Conus magus. Like other ω-conotoxins it is a potent blocker of N-type calcium channels, giving it a wide range of potential therapeutic applications. When delivered intrathecally (i.e., through spinal infu-
sion), it is approximately 1000 times more potent than morphine as an analgesic and has great potential for the treatment of intractable cancer pain (24). Figure 12.7 shows the peptide sequence and illustrates selected regions of the TOCSY and NOESY spectra.

As seen in Fig. 12.7, the TOCSY experiment is useful for classifying spin systems by amino acid type, with typically the most useful region being the "skewers" emanating from individual NH shifts (~7-10 ppm). For each NH proton in the peptide a series of cross peaks to the α, β, and other side-chain protons is observed and these patterns define the spin system as belonging to a particular type of amino acid. Note, however, that there is some degeneracy in the resultant patterns. The NH side-chain pattern is truncated if there is a break of more than three bonds between protons within the spin system. This means, for example, that the skewers for aromatic residues extend only as far as the β-protons and they therefore appear similar to other "AMX" residues such as Cys, Ser, Asp, or Asn. Nevertheless, the ability to assign signals to either individual amino acid types or to the AMX group is a useful starting point in the assignment. However, such spectra provide no information about the sequential location of an amino acid if it is not unique in the sequence. These sequential assignments are obtained from the NOESY spectrum, as illustrated in the sequential walk shown in the middle panel of Fig. 12.7. The aim of the sequential assignment process is to locate adjacent amino acid spin systems, principally through a cross peak between the αH proton of one residue (i) and the NH of the following residue (i + 1), often denoted as dαN(i, i + 1). Additional support for the assignment is usually also sought in dβN(i, i + 1) and dNN(i, i + 1) correlations. At the early stages of an assignment it is impossible to be certain whether a particular cross peak is a sequential or longer range cross peak; however, as the assignment procedure progresses, ambiguities become resolved. The assignment process is generally highly convergent, in that once a series of correct assignments is made the number of choices for remaining cross peaks diminishes, in principle making their assignment easier.

Because peptides are polymers of amino acid units, the repeated NH, Hα, and side-chain protons tend to fall in characteristic chemical shift ranges that can be useful in looking for patterns to identify amino acid types. Table 12.4 shows typical chemical shifts for each of the 20 common amino acids when located in a "random-coil" environment (23, 25, 26). It is important to stress that these shifts can vary quite considerably in structured proteins (by up to several ppm) and are more useful for pattern recognition purposes than for exact identification of a particular residue. In the case of the Hα protons, the differences between the actual shifts in a structured protein and these random-coil values have an additional important use, in that they provide an indication of the local secondary structure. Intuitively, the further a chemical shift is from a random-coil value, the more likely it is attributed to that residue's being in a structured environment.

After the assignment is complete it is possible to derive substantial information about the secondary structure from an analysis of chemical shifts, coupling constants, and NOEs, even before the three-dimensional structure calculations are commenced. Figure 12.8 shows a typical summary of the relevant NMR information, again using the data for MVIIA as an example (27, 28). Trends in these data provide a general indication of major elements of secondary structure. For example, a series of strong dαN(i, i + 1), relative to dNN(i, i + 1) NOEs often indicates an extended or β-type structure, whereas strong dNN(i, i + 1) NOEs indicate local helical structure or turns. Large JαN coupling constants (>8.5 Hz) are associated with extended structure and small ones (<5 Hz) with helical structure. Similarly, deviations of chemical shifts from random-coil values, often represented in terms of "chemical shift indices" (29), indicate extended (positive values) or helical structure (negative values).

An additional useful parameter is the exchange rate of amide protons after dissolution of the sample in D2O. Slowly exchanging amide protons indicate protection from solvent and possible involvement in intramolecular hydrogen bonds associated with elements of secondary structure. All of the NMR and slow exchange data can be consolidated to give
Horizontal axis: δ (ppm), 10.0 to 7.5. Peptide sequence (residues 1-25): CKGKGAKCSRLMYDCCTGSCRSGKC
Figure 12.7. Schematic representations of 2D-NMR spectra of the conotoxin MVIIA. (a) The fingerprint region of the TOCSY spectrum with selected spin systems marked. (b) Fingerprint region of the NOESY spectrum showing two (K2-A6 and L11-Y13) sequential walks. (c) NH-NH region of the NOESY spectrum showing correlations between the NH protons of D14 and C15; C16 and T17; and S22 and G23.
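The sequential walks in Fig. 12.7(b) can be pictured as a chaining problem: each dαN(i, i + 1) cross peak pairs the αH shift of one spin system with the NH shift of the next. The Python sketch below illustrates that logic only. The spin-system labels, shift values, and matching tolerance are hypothetical, and a real assignment also has to cope with spectral overlap and longer range cross peaks.

```python
# Minimal sketch of a sequential walk using d_alphaN(i, i+1) NOESY cross peaks.
# All shifts, labels, and the tolerance are hypothetical illustration values.

def sequential_links(spin_systems, noesy_peaks, tol=0.02):
    """Map each spin system to the systems that may follow it in the sequence.

    spin_systems: dict label -> (nh_ppm, ha_ppm), e.g. taken from a TOCSY spectrum
    noesy_peaks:  list of (nh_ppm, ha_ppm) cross-peak positions
    """
    links = {}
    for a, (_, ha_a) in spin_systems.items():
        for b, (nh_b, _) in spin_systems.items():
            if a == b:
                continue  # intraresidue NH-Ha peaks carry no sequential information
            if any(abs(nh - nh_b) <= tol and abs(ha - ha_a) <= tol
                   for nh, ha in noesy_peaks):
                links.setdefault(a, []).append(b)
    return links


def walk(links, start):
    """Follow unambiguous links from `start` until the chain branches or ends."""
    chain = [start]
    while len(links.get(chain[-1], [])) == 1:
        nxt = links[chain[-1]][0]
        if nxt in chain:
            break  # avoid looping on a mis-assigned peak
        chain.append(nxt)
    return chain


spin_systems = {"Lys": (8.40, 4.30), "Gly": (8.90, 3.95), "Ala": (8.10, 4.25)}
noesy_peaks = [(8.90, 4.30), (8.10, 3.95)]  # Ha(Lys)-NH(Gly), Ha(Gly)-NH(Ala)
chain = walk(sequential_links(spin_systems, noesy_peaks), "Lys")
print(chain)
```

Stopping the walk whenever a spin system has more than one candidate successor mirrors the point made in the text that ambiguous cross peaks are left open until other assignments narrow the choices.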
Table 12.4 1H Chemical Shifts for the 20 Common Amino Acid Residues in Random-Coil Peptides (a)

Residue   NH   αH   βH   Others
Ala
Arg
Asn
Asp
Cys
Gln
Glu
Gly
His
Ile
Leu
Lys
Met
Phe
Pro
Ser
Thr
Trp
Tyr
Val

(a) The backbone shifts (αH and NH, ppm) are from Wishart et al. (26). The remaining shifts are from Wüthrich 1986 (23).
an accurate representation of secondary structure, as indicated in the lower panel of Figure 12.8. In the case of MVIIA a triple-stranded β-sheet may be deduced on the basis of the local NOE, coupling, chemical shift, and amide-exchange NMR data.

Once all peaks in the 2D spectra have been assigned, cross peaks in the NOESY spectrum are used to derive a series of interproton distance restraints. These are then used in a simulated annealing algorithm to calculate a family of 3D structures consistent with the input restraints. Fig. 12.9 shows two commonly used methods of representing such NMR-derived structures, either as a stereoview of the superimposed family of structures or as a ribbon diagram, in which elements of secondary structure are highlighted. For the latter rep-
Sequence axis: C K G K G A K C S R L M Y D C C T G S C R S G K C
Figure 12.8. A summary of the NMR data observed for MVIIA. (a) Hα-NH sequential NOEs. (b) NH-NH sequential NOEs. (c) Hβ-NH sequential NOEs. (f-h) Other short-range NOEs. The thickness of the bar indicates the strength of the observed NOE (weak, medium, or strong). (d) Three-bond NH-Hα coupling data, where upward-pointing arrows indicate a large coupling (>8 Hz) and downward-pointing arrows indicate a small coupling (<5 Hz). (e) H/D exchange data, where a filled circle represents a slow exchanging NH. (i) Chemical shift index (CSI) data. The CSI uses a scoring system that compares Hα shifts to random-coil chemical shifts. A sequence of consecutive +1 scores is indicative of β-structure, whereas a sequence of consecutive -1 scores suggests helical structure. (j) The β-sheet of MVIIA. Double-headed arrows indicate observed NOEs and broken lines indicate proposed H-bonds.
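The scoring described for panels (d) and (i) of Fig. 12.8 is simple enough to sketch in a few lines of Python. The version below is an illustration under stated assumptions, not the published procedure: the 0.1 ppm threshold, the minimum run length of three residues, and every shift value in the example are assumed for demonstration, and the random-coil references are supplied by the caller rather than taken from Table 12.4.

```python
# Sketch of chemical-shift-index scoring for H-alpha shifts.
# The 0.1 ppm threshold, the run length, and all shifts are assumed values.

def csi_scores(observed, random_coil, threshold=0.1):
    """Return +1, 0, or -1 per residue from observed minus random-coil shifts."""
    scores = []
    for obs, ref in zip(observed, random_coil):
        delta = obs - ref
        if delta > threshold:
            scores.append(1)    # downfield of random coil: beta-like
        elif delta < -threshold:
            scores.append(-1)   # upfield of random coil: helix-like
        else:
            scores.append(0)
    return scores


def runs(scores, value, min_length=3):
    """Find (start, end) index pairs where `value` repeats at least min_length times."""
    found, start = [], None
    for i, s in enumerate(list(scores) + [None]):  # sentinel closes a trailing run
        if s == value and start is None:
            start = i
        elif s != value and start is not None:
            if i - start >= min_length:
                found.append((start, i - 1))
            start = None
    return found


def classify_coupling(j_hz):
    """Apply the large (>8 Hz) / small (<5 Hz) coupling rule quoted in the caption."""
    if j_hz > 8.0:
        return "extended"
    if j_hz < 5.0:
        return "helical"
    return "ambiguous"


observed = [4.90, 4.85, 4.95, 4.40, 4.05, 4.00, 3.95]     # hypothetical Ha shifts
random_coil = [4.40, 4.35, 4.50, 4.38, 4.35, 4.40, 4.30]  # hypothetical references
scores = csi_scores(observed, random_coil)
beta_runs = runs(scores, 1)
helix_runs = runs(scores, -1)
```

Looking for runs rather than single residues reflects the caption's emphasis on sequences of consecutive scores; an isolated +1 or -1 says little on its own.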
Figure 12.9. (a) A stereoview of the superimposed backbone structures of the 20 lowest energy conformations for MVIIA (27). (b) Ribbon diagram of MVIIA.
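Structure families such as the one in Fig. 12.9 are calculated from interproton distance restraints derived from NOESY cross peaks. A common way to obtain such restraints, sketched below in Python, is to scale each cross-peak intensity against a reference pair of known separation using the r^-6 dependence of the NOE, and then bin the result into strong, medium, or weak classes. The reference distance, the intensities, and the class upper bounds (2.7, 3.5, and 5.0 Å) are assumed, conventional-looking values chosen for illustration; they are not taken from the MVIIA study.

```python
# Sketch: converting NOESY cross-peak intensities to distance restraints.
# Uses the isolated-spin-pair approximation (intensity proportional to r**-6).
# The reference distance, intensities, and class bounds are assumed values.

def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate a distance (same units as ref_distance) from I proportional to r**-6."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def restraint_class(distance, bounds=((2.7, "strong"), (3.5, "medium"), (5.0, "weak"))):
    """Bin a distance into a strong/medium/weak class and return its upper bound."""
    for upper, label in bounds:
        if distance <= upper:
            return label, upper
    return None  # beyond the range treated as observable under these assumed bounds


ref_distance = 1.75  # assumed fixed geminal H-H separation, in angstroms
d = noe_distance(intensity=20.0, ref_intensity=640.0, ref_distance=ref_distance)
label = restraint_class(d)
print(round(d, 2), label)
```

Using only an upper bound per class, rather than an exact distance, is a deliberately loose treatment: it leaves room for effects such as spin diffusion and local motion that make the simple r^-6 estimate unreliable.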
resentation the lowest energy or average member of the ensemble is often chosen as representative of the structure. It is important, however, to examine the full ensemble to gain a complete understanding of the structure. Regions of disorder in the ensemble can be indicative of a lack of sufficient distance restraints, perhaps attributable to overlap or assignment errors, or may be related to local flexibility.

In the case of MVIIA the peptide itself is being clinically developed as the active drug for administration through the intrathecal (spinal infusion) route. However, in general, peptides have a range of potential disadvantages as drugs, including poor bioavailability and susceptibility to proteolytic breakdown. Thus, for many cases involving peptide-based leads the structural information of the type described above might be used as a starting point to design smaller constrained peptides or nonpeptidic mimics. This is the case, for example, in the development of endothelin antagonists described below.

2.1.2.2 Endothelin as a Lead in Ligand-Based Design. Endothelin (ET), shown in Fig. 12.10, is a 21-amino acid endothelial-derived constricting factor that has gained prominence as a pharmacological lead molecule. Interest in the peptide arose because of its potent renal, pulmonary, and neuroendocrine activities. Endothelin and its isoforms have been implicated in a wide variety of disease states including ischemia, cerebral vasospasm, stroke, renal failure, hypertension, and heart failure (30). It exerts its pharmacological effect by acting on specific G-protein-coupled receptors. In mammalian species two receptors, ETA and ETB, have been cloned; both are widely distributed in human tissue and are distinguished by different responses to various ET isoforms.

The NMR-derived three-dimensional structure of ET-1 consists of several distinct regions, including a random-coil N-terminus, a β-turn over residues 5-8, followed by a short helical region and a flexible C-terminal tail (as summarized in Ref. 31). The presence of the flexible tail in solution is not surprising, as may be imagined from the primary sequence shown in Fig. 12.10. Although solution structures of ET and its analogs (32-46) determined by NMR have been valuable in defining the gross conformation of these molecules, the flexibility of the tail in solution makes it difficult to extrapolate to the bound state. Indeed, an X-ray structure of ET-1 shows a C-terminal tail quite different from the random-coil arrangement seen in solution (46). The bound conformation may be different again.

There is clearly an advantage to having lead molecules with reduced flexibility, given that their solution conformation will intrinsically provide a better reflection of the bound conformation. In addition, the development of a more rigid drug will reduce unfavorable entropic contributions to binding energy. Indeed, a range of small cyclic peptides that are ETA- or ETB-selective antagonists have been discovered and provide valuable leads to the development of potential therapeutics. NMR studies have been instrumental in determining their solution conformations. For exam-
Figure 12.10. (a) Primary sequence and disulfide connectivities of endothelin-1 (ET-1). (b) Primary structure of the cyclic endothelin antagonist BE18257B and (c) a family of 36 NMR structures, which demonstrate the well-defined nature of the cyclic peptide backbone. Residue labels in the figure include Leu 4 and D-Trp 5.
ple, the rather well defined solution conformation (47) of the ETA-selective antagonist BE18257B (shown in Fig. 12.10) contrasts with the flexibility of the tail region of ET that this peptide is thought to mimic. The discovery and development of these molecules illustrate the principle that cyclic peptides are often more suitable than linear peptides as lead ligands in drug design. In addition to their better-defined and less-flexible conformations than those of their linear counterparts, they generally have improved bioavailability and resistance to protease attack.

We shall return to endothelin as a lead in drug design, in relation to a nonpeptidic antagonist. The underlying theme illustrated by the endothelin example is that ligand-based design often proceeds from initial studies of flexible endogenous molecules (particularly peptides) to constrained mimics (e.g., cyclic peptides) and often culminates in the development of nonpeptidic drug leads. NMR assists by defining the structures of the lead and subsequent molecules.

2.1.3 Instrumental Advances and their Impact on Structure Elucidation. Over the last few years there have been several exciting instrumental developments that promise to dramatically expand the role NMR will play in the drug discovery process. These relate to the combination of NMR with other technologies such as LC and/or MS and the use of NMR to directly monitor reactions carried out on solid-phase resins (8, 13, 22). The latter promises to indirectly enhance drug discovery programs by improving the monitoring and hence efficiency of solid-phase combinatorial synthesis. Effectively, resin-based syntheses can be monitored at successive stages without the need to cleave intermediate products from the resin.

As already mentioned, the additional sensitivity brought about by cryoprobe technology promises to enhance a wide range of NMR applications, but will be particularly important in natural products-based drug discovery. In many cases only limited amounts of pure compounds are isolated from natural products extracts and sensitivity has been a major limit-
Figure 12.11. Illustration of the Karplus relationship between three-bond scalar coupling constants and the dihedral angle of the intervening bond. The relationship is indicated for the φ torsion angle of the H2 and H3 protons within the rigid core of taxol and related derivatives. See Fig. 12.6 for the structure of taxol.
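The Karplus relationship shown in Fig. 12.11 is usually written in the form 3J(θ) = A cos²θ + B cos θ + C. The Python sketch below evaluates that form, scans for the dihedral angles compatible with a measured coupling, and computes a population-weighted average over conformers. The coefficients A = 7.76, B = -1.10, and C = 1.40 Hz are illustrative H-C-C-H values assumed for this example; they are not taken from the chapter and may differ from the parameterization behind the figure.

```python
# Sketch of a Karplus-type analysis for a vicinal H-C-C-H coupling.
# Coefficients a, b, c (Hz) are assumed, illustrative values.
import math


def karplus(theta_deg, a=7.76, b=-1.10, c=1.40):
    """Three-bond coupling (Hz) predicted for a dihedral angle in degrees."""
    cos_t = math.cos(math.radians(theta_deg))
    return a * cos_t ** 2 + b * cos_t + c


def angles_for_coupling(j_obs, tol=0.25):
    """Whole-degree angles in 0..180 whose predicted J lies within tol of j_obs."""
    return [theta for theta in range(0, 181) if abs(karplus(theta) - j_obs) <= tol]


def averaged_coupling(conformers):
    """Population-weighted mean J for (population, angle_deg) pairs."""
    total = sum(pop for pop, _ in conformers)
    return sum(pop * karplus(theta) for pop, theta in conformers) / total


j_120 = karplus(120)
j_140 = karplus(140)
candidates = angles_for_coupling(5.0)
j_mixed = averaged_coupling([(0.5, 60), (0.5, 180)])
print(round(j_120, 2), round(j_140, 2), candidates, round(j_mixed, 2))
```

With these assumed coefficients, angles of 120° and 140° give couplings of about 3.9 Hz and 6.8 Hz. A single coupling of 5 Hz is matched by angles both below and above 90°, which shows why one coupling constant alone does not define the dihedral angle, and the averaged value for a two-conformer mixture corresponds to neither contributing geometry.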
ing factor on structure elucidation. LC/MS/NMR systems will greatly improve the efficiency of such analyses by minimizing the need for separate sample-handling steps for the different analytical technologies.

2.2 Conformational Analysis

Usually only 1D or 2D NMR methods are required to determine the solution conformation of bioactive ligands. Useful tools include analysis of chemical shifts, coupling constants, and NOEs. An assumption inherent in the application of such studies to drug design is that the solution conformation will be maintained on binding to the receptor. This is justified in the case of relatively rigid ligands. However, for potentially flexible ligands the possibility of changes in conformation on binding must be considered, as noted above for the case of endothelin.

Coupling constants and NOEs are the main NMR parameters used in determining the solution conformations of drug leads. NOEs provide information about through-space proximity. Three-bond vicinal coupling constants are particularly valuable because their dependency on the intervening dihedral angle through the Karplus relationship allows local geometry to be determined. This is illustrated in Fig. 12.11 for taxol. Although there are several vicinal coupling constants in this molecule (Fig. 12.6), only one, 3J(H2,H3), occurs in a region of the molecule that is expected to be conformationally rigid and thus suitable for conformational determination by use of coupling constants. In taxol and a range of analogs this coupling is in the range 4-7 Hz, consistent with partially eclipsed dihedral angles of approximately 120-140° for this ring-constrained structure. This is in good agreement with the X-ray structure of a taxol analog, where the angle is 120°. Note that, in general, such a Karplus analysis does not give a unique solution unless several coupling constants sampling the same dihedral angle are present and is reliant on the assumption that the molecule exists only in a single conformation in solution. Although it is generally believed that
Figure 12.12. Chemical shift changes of the β-protons of Asp14 in MVIIA illustrating the lack of titration of the adjacent carboxyl group, indicating its involvement in a salt bridge. By contrast, the shift of a control random-coil peptide varies with an apparent pKa value of 3.7, as expected for an uncomplexed carboxyl moiety in peptides.
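Titration curves like those in Fig. 12.12 are commonly analyzed with a fast-exchange, single-site model in which the observed shift is the population-weighted average of the protonated and deprotonated shifts: δobs = (δHA + δA·10^(pH - pKa)) / (1 + 10^(pH - pKa)). The Python sketch below fits the pKa by a simple grid search and flags curves that barely move. The limiting shifts, the 0.05 ppm flatness cutoff, and the data points are synthetic values assumed for illustration, generated here with the pKa of 3.7 quoted for the control peptide.

```python
# Sketch: extracting a pKa from chemical shift versus pH (fast-exchange model).
# Limiting shifts, the flatness cutoff, and all data points are synthetic.

def shift_at_ph(ph, pka, d_acid, d_base):
    """Population-weighted shift for a single titrating group."""
    ratio = 10.0 ** (ph - pka)  # [A-]/[HA] from the Henderson-Hasselbalch equation
    return (d_acid + d_base * ratio) / (1.0 + ratio)


def fit_pka(ph_values, shifts, d_acid, d_base, lo=0.0, hi=14.0, step=0.01):
    """Grid-search the pKa that minimizes the squared error against the data."""
    best_pka, best_err = lo, float("inf")
    for i in range(int(round((hi - lo) / step)) + 1):
        pka = lo + i * step
        err = sum((shift_at_ph(ph, pka, d_acid, d_base) - obs) ** 2
                  for ph, obs in zip(ph_values, shifts))
        if err < best_err:
            best_pka, best_err = pka, err
    return best_pka


def titrates(shifts, min_change=0.05):
    """True if the shift moves by more than min_change ppm across the pH range."""
    return max(shifts) - min(shifts) > min_change


ph_values = [2.0 + 0.5 * i for i in range(11)]                  # pH 2.0 to 7.0
control = [shift_at_ph(ph, 3.7, 2.90, 2.65) for ph in ph_values]  # synthetic curve
flat = [2.80] * len(ph_values)                                  # non-titrating case
fitted = fit_pka(ph_values, control, 2.90, 2.65)
print(round(fitted, 2), titrates(control), titrates(flat))
```

A curve that stays flat across the sampled range, as reported for Asp14, cannot be fitted meaningfully; it only places a bound on the pKa, which is why the flatness check is kept separate from the fit.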
this is the case for the core of taxol, recent relaxation data (48) described in Section 2.5 suggest that this conclusion may need to be reexamined.

In addition to studies of the taxol core, there have been a large number of studies of the conformations of the side chains of taxol and it appears that these are flexible and that the molecule may adopt both extended and folded conformations of the side chains. In a case like this the observed vicinal coupling constants are a weighted average of those from the participating conformers.

2.3 Charge State

An advantage of NMR over other structural techniques such as X-ray crystallography is that it has the potential to provide information not only on structure but also on the electronic properties of molecules. Many drug leads contain ionizable groups and a determination of their charge state in solution and/or at the bound site is important in the design of analogs. Simple plots of chemical shifts as a function of pH for nuclei near these ionizable groups provide a convenient way of determining the pKa value and hence charge state. This is illustrated for ziconotide in Fig. 12.12, where it was suspected that one of the ionizable groups in the molecule, Asp14, may be involved in a stabilizing salt-bridge interaction (28). This was confirmed by noting that the pKa value for this residue is lowered considerably relative to the usual value for Asp. The β-proton chemical shifts were essentially independent of pH over the range 3-7 (indicating a pKa < 3), whereas those of a control, random-coil peptide titrated as expected over this range.

2.4 Tautomeric Equilibria

Tautomerization is a relatively common feature of drug molecules that is amenable to analysis through the use of chemical shifts or coupling constants as probes. This was recently demonstrated in a study of some nonpeptide endothelin analogs (49). Starting from the modestly active compound (1) (Table
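[Scheme: pH-dependent equilibrium between the closed butenolide form (acid) and the open keto-acid form (base).]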
12.5), derived by screening a compound library for ETA antagonists, the nanomolar inhibitor (2) was developed. Further optimization through examination of electronic and structural requirements led to the subnanomolar inhibitor (3), which was subsequently put forward for evaluation in a number of preclinical disease models for stroke.

These molecules display keto-enol tautomerization, as illustrated in the accompanying structures. The open form keto-acid salts and the closed form butenolides exist in a pH-dependent equilibrium in solution, and at physiological pH both forms exist. In principle, the biological activity could reside in either or both forms.

The extent of tautomerization was established by evaluation of NMR spectra as a function of pH, from 2.65-9.05 (49). At acidic pH, compound (2) exists essentially in the closed butenolide form. As the pH is slowly raised by addition of NaOD, the spectrum begins to exhibit properties associated with the open form keto-acid, and at basic pH the compound is essentially all in the open form. The coupling pattern shown by the benzylic protons is a particularly characteristic marker of the tautomeric process. At acidic pH the benzylic protons exhibit an AB quartet pattern consistent with the ring-closed structure. As the pH is raised this pattern coalesces to a singlet, broad at neutral pH and sharp at basic pH, as would be expected with the open form keto-acid structure. After the pH was basic, addition of DCl to acidify the solution caused
Table 12.5 Substitution Pattern and Receptor-Binding Affinity of Nonpeptidic Endothelin Antagonists (a)

Compound        R1
(1) PD012527    Cl      H                           430     27000
(2) PD155080    OCH3    H                           >0.4    4550
(3) PD156707    OCH3    3,4,5-OCH3                  0.3     780
(4)             OCH3    3,5-OCH3,4-O(CH2),SO3Na     0.38    1600

(a) From Refs. 49 and 50.
the spectrum to return to its original appearance, consistent with a reversible tautomerization process.

Identical biological results were obtained with the salt and closed butenolide form in all pharmacological assays, reflecting equilibration at physiological pH. This made it difficult to identify the biologically active form from these experiments alone, although methylation of the OH group in compounds 1-3 resulted in a loss of activity. Because these analogs cannot tautomerize to form open keto-acids, it seems likely that the open form is responsible for activity.

In addition to its impact on the biologically active form, the tautomeric process has profound implications for formulation of drug candidates, as illustrated in some recent further development work on compound (3) (50). Although it is easy to synthesize and isolate water-soluble salts of the keto-acids, once they are placed in aqueous solution the tautomeric equilibrium determines how much of each form is present. Indeed, if the closed butenolide tautomer is sufficiently water insoluble, it can precipitate out of solution and the equilibrium can drive the complete precipitation of the compound. Although (3) has good oral activity, its intravenous use is limited by the insolubility of the closed-form butenolide tautomer without the use of a specific and complex buffered formulation. Thus in recent work a series of water-soluble butenolides was developed (50) to overcome this limitation for parenteral use. This culminated in the development of (4) (Table 12.5), currently in preclinical evaluation.

This description of the development of (4) provides a good illustration of the fact that the availability of an active molecule is not the end of the drug development pathway, and that formulation considerations can be critical. In this case NMR played a significant role in understanding tautomeric processes that had a direct bearing on solubility and hence formulation.

2.5 Ligand Dynamics: Line-Shape and Relaxation Data

It is increasingly being recognized that the solution molecular dynamics of drugs may have an important role in modulating biological activity (48, 51-54). For example, dynamics may influence entropic contributions to the free energy of binding. In general the more flexible a ligand is, the more unfavorable will be the loss in entropy on binding, assuming a relatively rigid bound state of the ligand. However, in some cases flexibility of a ligand may be a positive factor. This applies, for example, if a degree of flexibility is required to allow a ligand access to a buried active site, or if activation of a receptor requires a conformational change mediated by ligand binding (9). Therefore, a knowledge of the flexibility of lead molecules is an important supplement to the structural and electronic information available from NMR.

The two major NMR methods for obtaining information on ligand flexibility are line-shape analysis and relaxation measurements (usually 13C or 15N T1, T2, or heteronuclear NOE measurements). In general terms, the former is sensitive to motions on the milli- to microsecond timescale and the latter to nanosecond timescales. To some extent, structure calculations on peptide-based lead molecules can also give an indication of regions of flexibility from an examination of local regions of disorder among a family of calculated structures. Caution must be exercised because other factors can contribute to disorder, although in many cases there is a connection between disorder in a structural ensemble and molecular flexibility (55). A recent example concerns the solution structures of three isomers of the α-conotoxin GI (56). Attempts to increase structural diversity through the engineering of nonnative disulfide bonds showed that nonnative isomers were not only different in conformation but were also considerably more flexible than the native isomer and had reduced activity.

In an example that illustrates the application of NMR relaxation measurements for studying ligand flexibility, Kessler and colleagues (57) investigated the role of disulfide bonds in the α-amylase inhibitor tendamistat. This small protein contains two disulfide bonds (C11-C27 and C45-C73) and opening of the latter is known to reduce the melting temperature of the protein (i.e., reduce its stability), but in this case does not affect its α-amylase inhibitor function. The latter observation
Table 12.6 13C-NMR Chemical Shift and Relaxation Data for Thyroxine

                                                      Theoretical(b)
                                Experimental(a)   Isotropic Motion   Two-State Internal Motion
Position  Chemical Shift (ppm)  T1 (s)   NOE      T1 (s)   NOE       T1 (s)   NOE

(a) Measured relaxation data at 75 MHz and 305 K (52).
(b) Theoretical best-fit values based on the indicated models for molecular motion.
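The "Isotropic Motion" columns of Table 12.6 refer to a model in which the molecule tumbles as a rigid body. As a rough illustration of what such a model predicts, the following Python sketch evaluates the standard dipolar expressions for 13C T1 and the heteronuclear NOE of a carbon relaxed by directly bonded protons. It is a minimal sketch and not the fitting procedure of Ref. 52: the function names, the C-H distance of 1.09 Å, and the neglect of all relaxation mechanisms other than the 13C-1H dipolar interaction are assumptions made here for illustration. The 75 MHz carbon frequency is taken from the table footnote.

```python
import math

# Physical constants (SI units)
MU0_OVER_4PI = 1e-7            # T m / A
HBAR = 1.054571817e-34         # J s
GAMMA_H = 2.6752218744e8       # 1H gyromagnetic ratio, rad s^-1 T^-1
GAMMA_C = 6.728284e7           # 13C gyromagnetic ratio, rad s^-1 T^-1


def spectral_density(omega, tau_c):
    """Spectral density for rigid isotropic tumbling with correlation time tau_c."""
    return 0.4 * tau_c / (1.0 + (omega * tau_c) ** 2)


def isotropic_13c_relaxation(tau_c, carbon_mhz=75.0, r_ch=1.09e-10, n_h=1):
    """Return (T1 in s, NOE) for a 13C relaxed only by n_h bonded protons.

    Assumptions (hypothetical defaults): purely dipolar relaxation,
    rigid isotropic tumbling, C-H distance r_ch in metres.
    """
    omega_c = 2.0 * math.pi * carbon_mhz * 1e6
    omega_h = omega_c * GAMMA_H / GAMMA_C      # same magnetic field
    d = MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_C / r_ch ** 3

    j_diff = spectral_density(omega_h - omega_c, tau_c)
    j_c = spectral_density(omega_c, tau_c)
    j_sum = spectral_density(omega_h + omega_c, tau_c)

    total = j_diff + 3.0 * j_c + 6.0 * j_sum
    rate = n_h * (d ** 2 / 4.0) * total        # 1/T1 in s^-1
    noe = 1.0 + (GAMMA_H / GAMMA_C) * (6.0 * j_sum - j_diff) / total
    return 1.0 / rate, noe


if __name__ == "__main__":
    # Overall tumbling time quoted in the text for thyroxine: about 0.35 ns
    t1, noe = isotropic_13c_relaxation(0.35e-9)
    print(f"T1 = {t1:.3f} s, NOE = {noe:.2f}")
```

In this rigid-body picture T1 and the NOE are both fixed by the single parameter tau_c, which is why a model of this kind can fail to reproduce the two measurements simultaneously when internal motion is present.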
suggests that this disulfide bond may not affect either the structure or the dynamics of the molecule. To investigate this possibility heteronuclear NOE measurements were used to determine the effect of the selective removal of the single disulfide bond on the dynamics of the molecule. To assess structural changes, chemical shift differences, intrastrand NOE effects, and protected amide protons were measured for the disulfide-deficient variant C45A/C73A and wild-type tendamistat. Removal of the C45-C73 bond by the C45A/C73A mutation was found to have no influence on the β-barrel structure, apart from very local changes at the mutation sites. 13C and 15N relaxation data showed that the only region of significant internal mobility in either the wild type or variant protein was at the N-terminus. There was little difference in internal flexibility between the two proteins, consistent with their similar α-amylase inhibitory activity. In this case it appears that the role of the C45/C73 disulfide bond is more related to thermodynamic stability and the secretion efficiency of the protein rather than to limiting its flexibility.

The thyroid hormones, exemplified by thyroxine (5), provide an example of the use of both line-shape analysis and NMR relaxation time measurements to give an insight into the internal flexibility, and perhaps the mode of action, of pharmaceutically important molecules (52, 53, 58). The thyroid hormones act by binding to a nuclear receptor and appear to control receptor function by inducing a conformational change that directs the alignment of functionally critical secondary-structure elements of the receptor (59). Synthetic thyroxine is widely used for the treatment of thyroid disorders and indeed has been the second most widely prescribed drug in the United States over recent years.

Table 12.6 shows experimental 13C T1 and heteronuclear NOE data for thyroxine together with best-fit theoretical values for these parameters based on two different models of the motion of thyroxine in solution. A simple isotropic model, in which the drug is regarded as a rigid body tumbling in solution, is clearly unable to simultaneously fit the T1 and NOE data, whereas the two-state jump model, which incorporates a degree of internal mobility based on rapid flipping between conformational states, is better able to account for the experimental data. Although it is difficult to precisely define the nature of the internal motion, it is clear that rapid (nanosecond) internal motion is present, and this appears to involve small-amplitude torsions about the aromatic ring axes. The correlation time for overall tumbling of thyroxine was deduced to be approximately 0.35 ns, with the internal motion approximately 30-fold faster (52).

Studies of the 1H-NMR spectrum of thyroxine as a function of temperature had earlier
Figure 12.13. Schematic illustrations of motions of the outer ring of thyroxine. The dotted line through the outer ring shows the jump axis about which the ring rotates. (a) Ha is shown in the proximal position and is closer to the viewer than Hb because the torsion angle φ' is greater than 0°. This conformation corresponds to one of the two states of the two-state jump model and agrees with the "twist" of the outer ring observed in the crystal structure. (b) Rotation about the dotted line through the center of the outer ring moves Ha away from the viewer and brings Hb toward the viewer. This corresponds to the second state of the outer ring in the two-state jump model. (c) Hb is now in the proximal position and closer to the viewer than Ha. (d) Hb is in the proximal position and is now further from the viewer than Ha. Transition from a to b and from c to d involves small amplitude jumps on the nanosecond timescale and is detected by NMR relaxation measurements. Although not illustrated in the figure, the inner ring also exhibits this type of motion. Transitions a to c and b to d result in 180° flips of the outer ring and exchange of the environments of Ha and Hb. This ring flip occurs on a microsecond timescale and is detected by variable temperature line-shape studies. (Reprinted with permission from Ref. 52. Copyright 1996 American Chemical Society.)
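The variable-temperature line-shape studies mentioned in the caption convert a coalescence temperature and a low-temperature shift difference into an activation free energy. The Python sketch below shows one common way to do this. It is hedged accordingly: it assumes the textbook coalescence approximation for two equally populated, uncoupled sites (rate at coalescence = π·δν/√2) combined with the Eyring equation with a transmission coefficient of 1, and it is not claimed to be the exact expression used in Refs. 53 and 60. All function names are hypothetical.

```python
import math

R = 8.314462618        # gas constant, J mol^-1 K^-1
K_B = 1.380649e-23     # Boltzmann constant, J K^-1
H = 6.62607015e-34     # Planck constant, J s


def rate_at_coalescence(delta_nu_hz):
    """Exchange rate (s^-1) at coalescence for two equal, uncoupled sites."""
    return math.pi * delta_nu_hz / math.sqrt(2.0)


def eyring_barrier(rate, temperature):
    """Free energy of activation (J/mol) from a rate, via the Eyring equation."""
    return R * temperature * math.log(K_B * temperature / (H * rate))


def coalescence_barrier(t_c, delta_nu_hz):
    """Barrier (J/mol) from coalescence temperature t_c (K) and shift difference (Hz).

    Algebraically equivalent to R * t_c * (22.96 + ln(t_c / delta_nu_hz)).
    """
    return eyring_barrier(rate_at_coalescence(delta_nu_hz), t_c)


def eyring_rate(barrier, temperature):
    """Rate (s^-1) predicted by the Eyring equation for a barrier in J/mol."""
    return (K_B * temperature / H) * math.exp(-barrier / (R * temperature))


if __name__ == "__main__":
    # Illustrative inputs only; these are not measured values from the text.
    print(coalescence_barrier(t_c=200.0, delta_nu_hz=50.0) / 1000.0, "kJ/mol")
    # A barrier in the 36-38 kJ/mol range quoted for the thyroid hormones:
    print(eyring_rate(37e3, 298.0), "s^-1 at 298 K")
```

Under these assumptions a barrier of roughly 37 kJ/mol corresponds to a flip rate of the order of 10^6 s^-1 near room temperature, which is in line with the microsecond timescale described for the 180° ring flip.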
demonstrated the presence of additional larger amplitude, but slower ring flips (60). At low temperature two signals were seen for the H2' and H6' protons. These signals broadened with increasing temperature, then coalesced and sharpened as the temperature was further increased. This was attributed to exchange of the environments of the two protons brought about by 180° rotation of the "outer" ring of thyroxine. Substitution of the observed coalescence temperature (Tc) and the chemical shift difference of the two signals at low temperature (δν) allowed the free energy of activation for this slow ring flip process to be established from equation 12.1 (53, 60).

The derived barriers for several thyroid hormones are in the range 36-38 kJ/mol, which corresponds to large-amplitude ring flips on the milli- to microsecond timescale. From a combination of the relaxation data and the dynamic line-shape analysis data it was possible to propose a unified model that accounts for both the fast and slow internal motions, as summarized in Fig. 12.13.

In this model, both aromatic rings of the thyroid hormones jump rapidly between two energetically equivalent conformations on a
nanosecond timescale (a ⇌ b and c ⇌ d in Fig. 12.13). The half-angle of the jump varies, depending on the solvent, corresponding to an average displacement of about 90° between the two extreme jump positions. These separate states are not detectable on the chemical-shift timescale but lead to an average proximal environment for Ha and an average distal environment for Hb (attributed to rapid interchange between a and b in Fig. 12.13), which are seen in the low temperature spectra. However, these fast motions are detected by relaxation studies. Although the rate of this motion is rapid, its amplitude is not sufficient to average the environment of proximal and distal protons. Occasionally (about once every 1000 jumps) the outer ring jumps further than the nominal 90° range, exchanging the environments of the proximal and distal protons (a ⇌ c and b ⇌ d in Fig. 12.13). Although the actual rate of an individual ring flip is rapid, the effective rate of the process is on the microsecond timescale, because on average a large number of small amplitude jumps occur for every large amplitude ring flip. It is the exchanging of proximal and distal protons on the microsecond timescale that is detected by the variable-temperature line-shape studies.

The fact that thyroxine is apparently able to move so freely over a moderately large region of conformational space has implications for receptor binding. The crystal structure of the thyroid receptor ligand-binding domain complexed with the thyroid agonist 3,5-dimethyl-3'-isopropylthyronine (59) shows that the thyroid hormones bind at the center of the hydrophobic core of the ligand-binding domain and may play a structural role in the conformational changes that activate the receptor. The structures of the retinoid-X receptor ligand-binding domain (61) and the retinoic acid-retinoic acid receptor ligand-binding domain complex (62) indicate that significant conformational changes accompany ligand binding in those cases. The conformational flexibility exhibited by the thyroid hormones may also be required for binding. It has been suggested that the rapid "wiggling" of the aromatic rings could enable the hormone to work its way to the center of the ligand-binding domain as the protein reorders itself about the ligand and may in fact trigger receptor conformational changes (52).

As briefly mentioned earlier, taxol provides another example where relaxation time measurements provide an insight into dynamic processes. Although it is generally thought that the taxane core is rigid, 13C relaxation data suggest that a degree of flexibility (on the nanosecond timescale) may be present and may vary for different taxol analogs (48). In particular it appears that the removal of certain side chains may introduce additional flexibility into the core region that would not easily be predicted based on a simple inspection of the structure.

Another example of the application of line-shape analysis to ligand dynamics is described in Section 3.2 for the drug trimetrexate when bound to dihydrofolate reductase (DHFR). From that example and earlier studies on DHFR (63-65), it is clear that the techniques described above can equally be applied to ligands when bound to their receptor. In some cases significant but highly specific mobility appears to be present at the bound site.

2.6 Pharmacophore Modeling: Conformations of a Set of Ligands

Determination of the conformations of a range of ligands that all act at the same receptor site can provide significantly more information than just a single ligand structure. With a sufficiently broad range of ligands, it is often possible to generate a pharmacophore model of the receptor site, deduced from conserved structural features and the conformations of the ligands. This has been done recently, for example, for the ω-conotoxins, the broad class of conotoxins to which Ziconotide (or MVIIA, mentioned above) belongs. From structural studies of a range of ω-conotoxins and from literature data on various mutants with altered binding affinities, it was determined that only a localized region of the surface of these molecules is involved in receptor binding (66). This allowed a pharmacophore model of putative receptor-binding pockets to be developed.

The advantage of such a pharmacophore model is that smaller, nonpeptide molecules that might have improved stability and bioavailability over their peptidic counterparts
can, in principle, be designed. The NMR approach used in such pharmacophore modeling often involves a combination of many of the techniques already described. By determining information about structure and electronic properties for a range of different ligands, all acting at the same receptor site, it is often possible to infer information about the binding site, even if direct structural studies of this site are not possible.

2.7 Limitations of Analog-Based Design

Although a determination of the structure of bioactive molecules is of key importance, there are distinct limitations on the use of solution structures for drug design. In particular, unless the molecule is rigid there is no certainty that the solution conformation is the same as the bioactive bound conformation. For this reason there has been a shift over recent years to approaches in which information about the bound state is obtained. The other approach has been to probe the bound conformation by making a range of constrained analogs of a flexible lead molecule, as illustrated earlier for endothelin.

The most direct way of determining the conformation of a drug lead is to determine the full three-dimensional structure of its receptor complex. This has now been achieved in a significant number of cases but represents a substantial undertaking, as described in later sections of this chapter. A simpler approach that has also been applied is to use transferred NOE methods, as described below. This approach fits at the interface of ligand-based design and receptor-based design. It fits with the former because no knowledge of the receptor structure is required, but it also fits with the latter because it requires the macromolecule of interest to be included in the mixture to be analyzed. It is appropriate therefore to introduce the topic here but also to discuss it further in Section 3.

2.8 Conformation of Bound Ligands: Transferred NOEs

In ligand-based drug design it is not necessary to know the structure of the receptor, or even the location of the binding site, although the conformation of the ligand bound to the receptor is crucial. It is clearly better if this can be measured directly rather than be inferred from the conformation of the free ligand. In certain circumstances this information on the bound conformation can be obtained from the transferred NOE (TrNOE) technique (67, 68). This method takes advantage of the fact that NOEs build up more rapidly in a ligand-macromolecule complex than they do in free ligand, and given appropriate exchange conditions for a mixture of ligand and macromolecule (typically satisfied for KD 2 M-'1, then signals from a free ligand may be used to determine the bound conformation. The theory of the technique was reviewed previously (69) and recent developments that minimize potential artifacts from spin diffusion have been described (5). Because it is not necessary to monitor signals from the macromolecule in this technique, it is usually present in substoichiometric amounts, thus requiring only minimal amounts of what is sometimes the more expensive component of ligand-macromolecule complexes. In addition, the molecular weight restrictions inherent in full 3D-structure determinations of complexes are ameliorated and the conformations of ligands bound to very large macromolecules may be determined. For example, the technique was recently used to determine the structure of an antibiotic bound to the ribosome (70). A range of other applications including enzyme-substrate, protein-carbohydrate, and protein-peptide interactions have recently been summarized (5).

In addition to its application as a tool for determining bound conformations of ligands, the TrNOE method has also been used recently as a screening aid for the identification of ligands from mixtures that bind to a protein of interest. This application is addressed in more detail later in this chapter.

3 RECEPTOR-BASED DESIGN

Receptor-based design refers to the process of determining the three-dimensional structure of a macromolecular target and using this information to design ligands to interact with it. In general there have been few cases where the structure of a macromolecule or receptor
alone has been successfully used to design, de novo, a ligand to interact with that receptor. However, such an approach is likely to become more common with improved computer-based approaches to molecular design in the future (71). Currently, the most common approach is to study a ligand-macromolecule complex and to initiate the design process based on the interaction of the lead ligand with the macromolecule.

Although the structure of the macromolecule alone is of less interest than that of the complex, in many cases a determination of the structure of the complex follows from earlier studies on the unbound macromolecule. It is thus useful to describe the approaches to structure determination of macromolecular targets. This is followed by a discussion of the dynamic aspects of protein structures in Section 3.1.4, before addressing the main topic of macromolecule-ligand interactions in Section 3.2.

3.1 Macromolecular Structure Determination

The two major techniques for determining three-dimensional structures of proteins or nucleic acids are X-ray crystallography and NMR spectroscopy. The crystallographic approach to structure determination is described elsewhere in this volume and here the focus is on NMR. NMR has been used to determine the structures of proteins only for about the last 15 years, with the first NMR structure determination being made in 1985. NMR has a number of advantages over X-ray crystallography, including the fact that the requirement that the protein needs to be crystallized is avoided, and that the dynamic information available from NMR studies complements the structural information. A major disadvantage of NMR spectroscopy, though, is that it is currently limited to the determination of structures of <35 kDa. With the development of new NMR techniques, such as TROSY (20), this seems certain to increase significantly over coming years, although the fact remains that among all structures currently deposited in the protein database the average size of NMR structures is about 8 kDa (72), substantially smaller than the average size of protein structures determined by X-ray crystallography. Despite this limitation, NMR has made major inroads into the macromolecular structure determination process, and currently approximately one-fifth of all new structures deposited in the protein database have been determined by NMR spectroscopy.

3.1.1 Overview of Approach. The basis for structure determination by NMR is that, by determining a large number of distance restraints between pairs of protons, it is possible to reconstruct a three-dimensional image of the molecule. These distance restraints are derived primarily from nuclear Overhauser effect (NOE) measurements, which detect distances up to about 5 Å. Over recent years such distance restraints have been supplemented by a range of other restraints, including dihedral angle restraints derived from coupling constant measurements and orientation restraints derived from residual dipolar couplings. These restraints are input into a simulated annealing algorithm, which is used to calculate a family of structures consistent with the restraints.

NMR is unique in that it can provide detailed and specific information on molecular dynamics in addition to structural information. The use of relaxation time measurements allows the relative mobility of individual atomic positions within a macromolecule to be determined. The dynamic information obtained includes not only the rates or frequencies of internal motions but also their amplitudes. Such amplitudes are often expressed by order parameters. Not surprisingly, it is observed in many cases that the termini of proteins are more flexible than internal regions. More interestingly, NMR has provided a number of examples where internal loops in proteins have been shown to have dynamics that may be associated with their function. A good example of this is HIV protease, where NMR studies have identified reduced order parameters in the flap region of the molecule that may reflect flexibility to allow entry of substrates or inhibitors into the active site.

In summary, a major strength of NMR is that a global picture not only of the structure but also of the dynamics of the macromolecular target is obtained. Further, NMR provides information on ionization states of titratable
groups and other electronic features within macromolecules that may have an impact on ligand binding and function.

3.1.2 Sample Requirements and Assignment Protocols. Structure determination by NMR typically requires 500 µL of a 1-2 mM solution of the protein of interest. It is important that the macromolecule does not aggregate because this causes spectral broadening and may preclude assignment. The sample should preferably be stable in solution over the extended period of time required to collect the range of NMR experiments (73-75) needed for assignment and structure determination. Individual experiments may last from a few hours to several days, with several weeks of data acquisition required for studies of larger proteins. The particular set of NMR experiments required for NMR structure determination depends on the size of the protein. It suffices to say that for smaller proteins (≤7 kDa) it is usually possible to determine the structures, mainly using 2D NMR, without the need for isotopic labeling, by use of procedures described in detail above for Ziconotide. For proteins in the range 7-14 kDa, 15N-labeling and a combination of 2D/3D NMR experiments is usually sufficient, whereas for larger proteins 13C/15N labeling and 3D or 4D NMR is more or less mandatory. For proteins at the top end of the currently accessible range (25-35 kDa), there are additional advantages associated with partial deuteration of the protein.

3.1.3 Recent Developments. A number of recently developed methods offer the potential for improving the quality of NMR structures and for increasing the size of proteins that can be examined. In particular, the use of residual dipolar couplings and of anisotropic contributions to relaxation provides new kinds of restraints that promise to lead to more accurate NMR structures (74, 76). As already mentioned, the TROSY method (20) exploits relaxation phenomena to produce spectra with narrow lines and promises to significantly expand the size of protein targets that can be examined by NMR, from the current limit of about 35 kDa to perhaps >100 kDa.

Another development that is likely to have a significant impact is the increasing number of structural genomics programs being developed. The demands arising from such programs will no doubt stimulate new methods for the large-scale production of labeled proteins (77, 78), and for speeding up the rate of structure determination by both NMR and crystallography.

3.1.4 Dynamics. Proteins exhibit a range of internal motions, from the millisecond to nanosecond timescale, and a full understanding of how small drugs might interact with such a "moving target" requires more than just the time-averaged macromolecular structure. Thus, over recent years much effort has been directed toward defining motions within proteins.

The most commonly applied approach has been to use 13C or 15N relaxation parameters such as T1, T2, and the heteronuclear NOE to derive correlation times for overall motion, together with rates and amplitudes of internal motions (79). Although the precise interpretation of the NMR relaxation data in terms of motional parameters remains dependent on the appropriateness of the motional model chosen, the results from many studies on the dynamics of proteins are sufficiently clear to confirm that nanosecond timescale motions in proteins are common. The functional significance of motions on the nanosecond timescale remains unclear and so far there have been few cases where significant differences in motions on this timescale between ligand-free and ligand-bound forms of proteins have been measured. It will be interesting to assess the functional significance of such motions as more data become available. However, slower motions have been correlated with function in a number of proteins, with a good example being HIV protease, described in more detail in Section 4.2.

Relaxation measurements require a considerable investment of spectrometer time and in some cases it may be possible to derive basic information about molecular dynamics from the structural ensemble alone. Although regions of disorder can reflect factors other than dynamics, a recent analysis (55) suggests that ill-defined regions in structural ensembles often do reflect slow, large-amplitude motions. Even if relaxation measurements are
done, it is often not necessary to undertake extensive analysis to derive correlation times, given that trends are often apparent from the raw experimental data. For example, in the case of tendamistat, described above, it is clear directly from the heteronuclear NOE data that significant internal mobility is present at the N-terminus.

3.1.5 Nucleic Acid Structures. Most of the discussion on macromolecular targets so far has focused on proteins. DNA represents another valuable target in drug design. Most studies in which DNA is the target are done using short model oligonucleotides to mimic the binding region of DNA. The regular repeating nature of DNA structures makes this a more successful approach than similar attempts to dissect out binding regions of receptor proteins, where often the whole protein must be present to maintain a viable binding site. Similar comments apply for RNA, where improvements in synthetic methods have led to an increasing number of structure determinations over recent years. The principles involved in structure determination of nucleic acid targets are similar to those of proteins, but in practice nucleic acid structures are somewhat more difficult to solve.

3.1.6 Challenges for the Future: Membrane-Bound Proteins. The majority of targets for currently known drugs are membrane-bound receptors, yet this represents the class of proteins for which least structural information is known. Membrane proteins are notoriously difficult to characterize at a structural level because they are difficult to crystallize, thus inhibiting X-ray crystallographic studies, and are both too large and too difficult to reconstitute in suitable media for NMR studies. Nevertheless, solid-state NMR methods are beginning to show promise that eventually such targets may be structurally characterized (80). Rotational resonance solid-state NMR measurements, for example, allow precise distances to be measured in membrane-bound proteins (81).

3.2 Macromolecule-Ligand Interactions

Macromolecule-ligand interactions are integral to a wide range of biological processes, including hormone, neurotransmitter or drug binding, antigen recognition, and enzyme-substrate interactions. Fundamental to each of these interactions is the recognition by a ligand of a unique binding site on the macromolecule. Through an understanding of the specific interactions involved it may be possible to design or discover analogous ligands with altered binding properties that might inhibit the biochemical function of the macromolecule in a highly specific manner. The study of macromolecule-ligand interactions thus forms the cornerstone of most structure-based drug design applications. The macromolecule of interest may be a protein or a nucleic acid, although the majority of drug design applications have focused on protein-ligand interactions. For this reason we will refer mainly to protein-ligand interactions in the following discussion, but will include some examples of drug-DNA interactions.

3.2.1 Overview. There are several important aspects of macromolecule-ligand interactions that have a bearing on structure-based design. The simplest question that might be asked is "what is the strength of the binding interaction?," whereas the most detailed task would be to precisely define the atomic coordinates of the complete protein-ligand complex. In between these extremes there are many other questions important to the drug design process; these include questions about the binding stoichiometry and kinetics, the conformation of the bound ligand, and about the nature of functional group interactions between the protein and bound ligand. These and other important questions were introduced briefly in Table 12.3 and are examined in more detail later in this section. Before doing this it is first necessary to consider NMR timescales because the ability of NMR methods to answer questions about macromolecule-ligand complexes depends critically on the kinetics of the binding interaction. Section 3.2.2 thus describes how various NMR parameters depend on binding kinetics and in particular how fast- and slow-exchange conditions affect the interpretation of NMR data.

Having identified the exchange regime, the task then becomes to decide which NMR parameters can be used to answer the questions
Table 12.7 NMR Parameters and Their Changes on Binding

Parameter            Difference          Typical Magnitude(a)
Chemical shift       ν_L - ν_ML          0-1000 s^-1
                     ν_M - ν_ML          0-500 s^-1
Coupling constant    J_L - J_ML          0-12 s^-1
                     J_M - J_ML          0-12 s^-1
Relaxation rate(b)   1/T_L - 1/T_ML      0-50 s^-1 (for T1, larger for T2)
                     1/T_M - 1/T_ML      0-10 s^-1

(a) Ranges are approximate only and larger effects may be seen in some cases.
(b) 1/T refers to either 1/T1 or 1/T2.
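The magnitudes in Table 12.7 can be compared directly with an exchange rate to decide which regime a given parameter falls into. The short Python sketch below illustrates that comparison. The function name and the `margin` factor that decides how far apart the two numbers must be before the regime is called fast or slow are hypothetical choices made here for illustration; the boundary between regimes is gradual in practice.

```python
def exchange_regime(k_ex, delta, margin=5.0):
    """Classify exchange as 'fast', 'intermediate', or 'slow'.

    k_ex   : exchange rate between free and bound states (s^-1)
    delta  : difference in the NMR parameter between the two states (s^-1)
    margin : assumed factor separating the regimes (illustrative only)
    """
    if k_ex <= 0 or delta < 0:
        raise ValueError("k_ex must be positive and delta non-negative")
    if delta == 0:
        return "fast"          # no difference to resolve
    ratio = k_ex / delta
    if ratio >= margin:
        return "fast"
    if ratio <= 1.0 / margin:
        return "slow"
    return "intermediate"


if __name__ == "__main__":
    k_ex = 100.0  # s^-1, used here as a representative exchange rate
    # Upper ends of the ligand ranges listed in Table 12.7
    for name, delta in [("chemical shift", 1000.0),
                        ("coupling constant", 12.0),
                        ("relaxation rate", 50.0)]:
        print(f"{name:18s} {exchange_regime(k_ex, delta)}")
```

With these inputs the largest chemical-shift differences fall in slow exchange while the coupling constants are in fast exchange, which mirrors the point that the chemical shift usually sets the relevant "NMR timescale".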
posed above about the complex. Many of the NMR parameters that were described earlier for deriving information about ligands are also applicable to studies of complexes. These include chemical shifts, NOEs, and relaxation parameters. However, the presence of two interacting partners means that there are some differences in the way such parameters are measured and this has led to the development of several techniques that are particularly important for the study of macromolecule-ligand interactions, including chemical-shift mapping, isotope editing, and various NMR titrations. Section 3.2.3 describes these techniques. Finally, illustrative examples of the application of these techniques to specific drug design problems are given in Section 3.2.4.

3.2.2 Influence of Kinetics and NMR Timescales. Macromolecule-ligand interactions are characterized by an equilibrium reaction that potentially has a wide range of affinities and rates:

$$\mathrm{M} + \mathrm{L} \underset{k_{\mathrm{off}}}{\overset{k_{\mathrm{on}}}{\rightleftharpoons}} \mathrm{ML}$$

The rate constant for the forward reaction is referred to as the on rate (k_on), whereas dissociation of the complex is characterized by the reverse rate constant, k_off. The equilibrium constant for this interaction, represented in terms of the dissociation constant of the complex K_D, reflects a balance of the on and off rates, as shown in Equation 12.2:

$$K_D = \frac{[\mathrm{M}][\mathrm{L}]}{[\mathrm{ML}]} = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}} \qquad (12.2)$$

For many protein-ligand interactions k_on is of the order of 10^8 M^-1 s^-1, and is typically quite similar for different ligands. The observation that K_D values may vary over a wide range, typically from millimolar to nanomolar (i.e., K_D = 10^-3 to 10^-9 M) for cases of interest, is a reflection of a variation in k_off for different ligands. Consideration of the k_on value above and the range of K_D values noted suggests a range in k_off from 10^-1 to 10^5 s^-1. The lifetime of the bound complex (1/k_off) may thus vary from much less than a millisecond to tens of seconds (10^-5 to 10 s based on the above off rates). The exchange rate for the second-order binding process is given by (82):

$$k_{\mathrm{ex}} = k_{\mathrm{off}}\left(1 + \frac{p_{ML}}{p_{L}}\right)$$

where p_ML and p_L are the mole fractions of bound and free ligand, respectively.

The appearance of an NMR spectrum of a protein-ligand complex is dependent on the rate of chemical interchange between free and bound states. In particular, the effects of exchange on an individual NMR parameter (e.g., chemical shift, coupling constant, or relaxation rate) depend on the relative magnitude of the exchange rate and the difference in the NMR parameter between the two states. The cases where the rate of interchange is greater than, about equal to, or less than, the parameter difference are referred to as fast, intermediate, and slow exchange, respectively, as indicated in Table 12.7.

Table 12.7 shows that the changes in chemical shifts on ligand binding (for signals either from the ligand or from the macromolecule) are in general greater than those for coupling constants or relaxation rates. Given that 100 s^-1 might represent a typical exchange rate between free and bound states, it is clear that
Bound
Intermediate
- 1 Figure 12.14. Schematic illus-
h tration of the effects of slow, in-
termediate, and fast exchange
on the appearance of peaks in
NMR spectra of macromolecule-
ligand complexes. In the slow ex-
change case separate peaks are
seen for free and bound forms.
Note the broader peak for the
bound ligand because it now
adopts the correlation time of .
the macromolecule. In the fast
exchange case only an averaged
Fast peak is observed.
individual NMR signals may be found in either This reflects the sensitivity of relaxation pa-
slow, fast, or intermediate exchange on the rameters to molecular mobility: a ligand un-
chemical-shift timescale, but it is more likely dergoes a greater relative change in mobility
that couplings or relaxation parameters will on binding than does a protein, given that the
be in fast exchange. Thus, in most cases where relative increase in molecular weight in the
the term "NMR timescale" is used in the liter- complex is much greater for the ligand than
ature, it refers to the chemical-shift timescale. for the protein.
The table also emphasizes that there are two The exchange regime (slow, intermediate,
types of signals that can be monitored, those or fast) determines how a spectrum of a pro-
from the ligand and those from the macromol- tein-ligand mixture changes during a titra-
ecule. In general, the typical magnitude of tion, or as a function of temperature. Figure
changes to chemical shifts or couplings of ei- 12.14 schematically illustrates the various ex-
ther type of signal on binding are similar, al- change regimes for macromolecule-ligand
though the changes to ligand signals may be binding interactions. Slow exchange, corre-
larger than those from the macromolecule. sponding to tight binding, is potentially the
However, changes to relaxation parameters most useful regime, given that much detailed
for signals from ligands are much more likely information on the nature of a complex can be
to be greater than those for protein signals. deduced in this case. Nevertheless, fast ex-
change also allows valuable kinetic and ther-

modynamic parameters to be derived. The
analysis is more complex for intermediate ex-
change and few quantitative studies are at-
tempted for this situation.
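The relationship between off rate and spectral appearance can be explored numerically. The following is a minimal sketch, not a method taken from the text: it assumes a simple two-site (Bloch-McConnell) exchange model, and the function names, linewidths, populations, and the 50-Hz shift difference are all illustrative assumptions. It converts K_D to k_off using the typical k_on quoted above, then simulates the lineshape at several off rates.

```python
import math
import numpy as np

K_ON_TYPICAL = 1e8  # M^-1 s^-1, the order of magnitude quoted in the text


def off_rate(k_d, k_on=K_ON_TYPICAL):
    """k_off (s^-1) from K_D (M), using K_D = k_off / k_on."""
    return k_d * k_on


def two_site_spectrum(freqs_hz, nu_free, nu_bound, p_bound, k_off,
                      lw_free=2.0, lw_bound=10.0):
    """Absorption lineshape for two-site exchange (assumed Bloch-McConnell model).

    Frequencies and natural linewidths are in Hz; k_off is in s^-1.
    """
    p_free = 1.0 - p_bound
    k_bf = k_off                      # bound -> free
    k_fb = k_off * p_bound / p_free   # free -> bound, from detailed balance
    pops = np.array([p_free, p_bound])
    r2 = np.pi * np.array([lw_free, lw_bound])            # R2 = pi * linewidth
    omega = 2.0 * np.pi * np.array([nu_free, nu_bound])   # site frequencies, rad/s
    exchange = np.array([[-k_fb, k_bf],
                         [k_fb, -k_bf]])
    spectrum = np.empty(len(freqs_hz))
    for i, nu in enumerate(freqs_hz):
        # Solve (R2 + i(omega_obs - omega_site) - K) x = p and sum over sites.
        a = np.diag(r2 + 1j * (2.0 * np.pi * nu - omega)) - exchange
        spectrum[i] = np.real(np.ones(2) @ np.linalg.solve(a, pops))
    return spectrum


def count_peaks(spectrum, rel_threshold=0.1):
    """Count local maxima that exceed a fraction of the tallest point."""
    s = np.asarray(spectrum)
    mid = s[1:-1]
    is_max = (mid > s[:-2]) & (mid > s[2:]) & (mid > rel_threshold * s.max())
    return int(is_max.sum())


if __name__ == "__main__":
    for k_d in (1e-3, 1e-6, 1e-9):
        k_off = off_rate(k_d)
        print(f"K_D = {k_d:g} M: k_off = {k_off:g} s^-1, lifetime = {1.0 / k_off:g} s")

    freqs = np.arange(-50.0, 100.25, 0.25)
    for k_off in (1.0, 300.0, 1e5):
        s = two_site_spectrum(freqs, 0.0, 50.0, 0.5, k_off)
        print(f"k_off = {k_off:g} s^-1: {count_peaks(s)} peak(s), "
              f"maximum at {freqs[np.argmax(s)]:.1f} Hz")
```

With equal populations and a 50-Hz separation, the slowest off rate should give two resolved peaks and the fastest a single averaged peak; the middle value is included only to show the broadened intermediate case.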
3.2.2.1 Slow Exchange. This situation applies when the rate of exchange is much slower than the difference in chemical shifts between the two states (i.e., k ≪ |ν_F − ν_B|), where we now change to a nomenclature using subscripts to refer to the bound (B) and free (F) states. It should be understood that the signals may derive from either ligand or macromolecule, so although B is always related to the ML complex, F might refer to either free ligand (L) or free macromolecule (M). In this situation separate peaks are potentially observable for both free and bound states at their respective chemical shifts. Whether such signals are actually observed depends on the mole ratio of the individual species in a titration, and on the signals not being obscured by overlap or broadening.
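How the populations of free and bound species depend on mole ratio can be sketched with the standard quadratic solution for 1:1 binding. This is an illustrative calculation under assumed values (1 mM macromolecule, and K_D of 1 nM or 1 mM); the function names are hypothetical and do not come from the text.

```python
import math


def bound_concentration(m_total, l_total, k_d):
    """[ML] in M for 1:1 binding, from the quadratic binding equation."""
    b = m_total + l_total + k_d
    return 0.5 * (b - math.sqrt(b * b - 4.0 * m_total * l_total))


def species(m_total, l_total, k_d):
    """Concentrations (M) of complex, free macromolecule, and free ligand."""
    ml = bound_concentration(m_total, l_total, k_d)
    return {
        "bound": ml,
        "free_macromolecule": m_total - ml,
        "free_ligand": l_total - ml,
    }


if __name__ == "__main__":
    m_total = 1e-3  # 1 mM macromolecule (assumed, typical NMR concentration)
    for k_d in (1e-9, 1e-3):
        print(f"K_D = {k_d:g} M")
        for ratio in (0.25, 0.5, 1.0, 1.5, 2.0):
            s = species(m_total, ratio * m_total, k_d)
            print(f"  L:M = {ratio:4.2f}  bound = {s['bound']:.3e}  "
                  f"free M = {s['free_macromolecule']:.3e}  "
                  f"free L = {s['free_ligand']:.3e}")
```

For the tight-binding case, free ligand is essentially absent until the stoichiometric ratio is passed, which is the condition under which a separate free-ligand peak would be expected to appear in a slow-exchange titration.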
Addition of a ligand to a solution of a protein results in the appearance of new signals attributed to bound protein resonances, with a concurrent decrease in the intensity of the free protein resonances, reflecting the decreased proportion of free protein during the titration. Once a stoichiometric mole ratio is achieved (usually 1:1, but sometimes 2:1 or higher if multiple binding sites are present on the protein), peaks from free ligand appear with increasing intensity as the excess of free ligand increases.

From such a titration it is possible to determine the stoichiometry of the complex, together with the chemical shifts of the bound states of the ligand and protein. In 1D NMR spectra, overlap of peaks makes it difficult to monitor more than a few resonances from either species and such studies are most readily done when there is a well-resolved signal on one of the interacting species. Selective isotope labels have been used in the past for such studies but it is now more common to use uniform ¹⁵N- or ¹³C-labeling of the protein and detect the chemical shifts in 2D HSQC spectra. It is often more difficult to label the ligand but in some cases the presence of rare nuclei such as ¹⁹F can be used to advantage. A good example is the binding of the inhibitor 4-fluorobenzenesulfonamide to the enzyme carbonic anhydrase (83). Figure 12.15 shows ¹⁹F spectra of the enzyme-inhibitor complex at various mole ratios. The broadened peak for the bound ligand has a chemical shift of approximately 6 ppm and is in slow exchange with the peak from free ligand at 0 ppm. The stoichiometry of the complex in this case is 2:1, so that no signal from free ligand is visible until more than 2 moles of inhibitor are present. Addition of increasing amounts of ligand results in an increase in the free ligand signal, but no change in the bound ligand signal.

Figure 12.15. ¹⁹F NMR spectra at 282 MHz of the 4-fluorobenzenesulfonamide-carbonic-anhydrase-1 system at various ratios of enzyme to inhibitor, as indicated on the traces. The peak at -6 ppm is caused by bound inhibitor. The enzyme concentration was 1 mM at pH 7.2 in D₂O at 25°C. (Reprinted with permission from Ref. 83; © 1998, American Chemical Society.)

Determination of the binding constant from slow exchange spectra is not usually attempted. Generally for slow-exchange conditions to exist in the first place, the binding is submicromolar in affinity and non-NMR methods are more suitable for determining affinities in these cases. NMR studies are done
at millimolar concentrations, making it difficult to determine K_D with any accuracy for tight binding systems.

In principle, kinetic information on the complex can be obtained from slow-exchange spectra, as seen from the expressions for T₂ for free and bound ligand signals:

$$\frac{1}{T_{2,\mathrm{F}}^{\mathrm{obs}}} = \frac{1}{T_{2,\mathrm{F}}} + k_{\mathrm{off}}\,\frac{p_{\mathrm{B}}}{p_{\mathrm{F}}} \tag{12.4}$$

$$\frac{1}{T_{2,\mathrm{B}}^{\mathrm{obs}}} = \frac{1}{T_{2,\mathrm{B}}} + k_{\mathrm{off}} \tag{12.5}$$

Because the linewidth (LW) of a peak is related to T₂ by

$$\mathrm{LW} = \frac{1}{\pi T_2} \tag{12.6}$$

measurements of linewidth during a titration can be used to derive k_off. Equations 12.4 and 12.5 show that, although the linewidth of the signal from bound ligand is independent of concentration, that of the free ligand decreases as more ligand is added. A plot of linewidth vs. ligand/macromolecule mole ratio allows k_off to be determined, as illustrated for example in a study of the ³¹P linewidths for the 2'-phosphate of NADP⁺ binding to dihydrofolate reductase (84).

Although the determination of off rates is of significance in assessing the stability of the complex, the major interest in more recent studies of complexes in the slow-exchange limit has centered on determining the complete geometry of the complex through the use of intra- and intermolecular NOEs. A recent, but already classic, example of this approach is illustrated by the binding of immunosuppressant peptides such as cyclosporin and FK506 to their receptors. These types of examples are discussed in more detail in Section 3.2.4.2.

3.2.2.2 Fast Exchange. When exchange between free and bound states is very fast, observed NMR parameters are a simple weighted average of those from the two contributing states, illustrated by Equation 12.7 for chemical shifts and Equation 12.8 for linewidths.

$$\nu_{\mathrm{obs}} = p_{\mathrm{F}}\,\nu_{\mathrm{F}} + p_{\mathrm{B}}\,\nu_{\mathrm{B}} \tag{12.7}$$

$$\frac{1}{T_{2}^{\mathrm{obs}}} = \frac{p_{\mathrm{F}}}{T_{2,\mathrm{F}}} + \frac{p_{\mathrm{B}}}{T_{2,\mathrm{B}}} \tag{12.8}$$

These equations show that, in the fast-exchange limit, addition of a ligand to a protein solution will cause a progressive change in chemical shift. Signals from the protein initially reflect the free state, but as ligand is added the population of bound protein increases and the observed signals move toward those of the bound state. Similarly, when ligand signals are first detected they reflect predominantly the bound state, but with increasing amounts of ligand they move toward the chemical shift of the free state. By regression analysis to Equation 12.7, taking into account the dependency of the mole fractions on K_D by the standard quadratic binding equation, it is possible to obtain estimates of both K_D and the bound shift. The procedure works best for rather weakly binding ligands (e.g., millimolar dissociation constants) (85).

When exchange is somewhat slower, but still within the fast-exchange limit, there is an exchange contribution to linewidth, as shown in Equation 12.9:

$$\frac{1}{T_{2}^{\mathrm{obs}}} = \frac{p_{\mathrm{F}}}{T_{2,\mathrm{F}}} + \frac{p_{\mathrm{B}}}{T_{2,\mathrm{B}}} + \frac{4\pi^{2}\,p_{\mathrm{B}}\,p_{\mathrm{F}}^{2}\,(\nu_{\mathrm{F}} - \nu_{\mathrm{B}})^{2}}{k_{\mathrm{off}}} \tag{12.9}$$

In this case a maximum in the broadening of ligand or protein peaks occurs during the titration at a mole ratio of approximately 0.3, as illustrated below.

The spectral changes that occur in the fast-exchange regime can conveniently be illustrated by studies on the binding of a series of terephthalamide ligands to an oligonucleotide model of DNA. The ligands, referred to as L(NO₂), L(NH₂), and L(Gly), were synthesized as precursors for potential anticancer agents (86). To establish whether they bind in the minor groove of AT-rich DNA, a series of NMR titration experiments was undertaken. Figure 12.16 shows an expansion of the aliphatic region of a series of ¹H-NMR spectra of 0.5 mM of the oligonucleotide d(GGTAATTACC)₂, to which increasing amounts of L(NH₂) were added (86). The spectra cover mole ratios of ligand to DNA duplex
ranging from 0:1 to 2.6:1. Although the spectra are complicated by overlap in some regions, it is clear that addition of the ligand causes significant changes to the DNA peaks. A typical example is seen for the T6 methyl peak, for which addition of ligand causes both an upfield shift and broadening of the peak at certain stages of the titration. The chemical shift moves monotonically with ligand concentration up to a mole ratio of 1:1 and then reaches a plateau, remaining constant as larger amounts of ligand are added. Broadening of the peak reaches a maximum at a ligand:DNA mole ratio of approximately 0.3. Both observations are consistent with there being moderately fast exchange on the chemical-shift timescale between the free and ligand-bound forms of the DNA in solution. In this case, the observed spectral peaks reflect neither the free nor the bound form of DNA, but are averaged signals.

Ligand peaks are also in fast exchange, as seen with the L(NH₂) methyl peak, which first appears at a ligand:DNA ratio of 1.36:1 as a shoulder on the overlapped T3 and T7 methyl peaks at approximately 1.27 ppm. This peak is not initially visible in spectra at low ligand:DNA mole ratios because of the small population of bound species and the overlapping DNA peaks. It moves upfield with increasing ligand concentration and, again, represents an averaged peak intermediate in chemical shift between free and bound forms, reflecting fast-exchange kinetics. Eventually, the chemical shift of this signal approaches that of the free ligand at 1.1 ppm, measured in a separate experiment with a solution of ligand alone.

Figure 12.16. Expanded regions from 300-MHz ¹H-NMR spectra for complexes between L(NH₂) and d(GGTAATTACC)₂, recorded at 10°C. The two small peaks at 1.12 and 1.14 ppm arise from an impurity. Increasing ligand concentration causes an upfield shift of the T6 methyl resonance (a), and causes the T7 and T3 resonances to become overlapped at later stages of the titration (c). Peak (b) is an averaged resonance from the ligand methyl groups intermediate in shift between the bound and free forms of the ligand. (Reprinted with permission from Ref. 86.)

In fast-exchange cases such as this it is possible to obtain an estimate of the dissociation constant for the complex (K_D) and the bound chemical shift (ν_B) of DNA resonances by fitting the observed chemical shift changes as a function of ligand concentration to Equation 12.7 (85). The parameters that best fit the experimental data for the T6 methyl peak were K_D = 1.2 × 10^[?] M and (ν_F − ν_B) = 46 Hz. Limitations on the accuracy of K_D values derived in this way were described previously (85).

To further define the thermodynamic constants associated with binding, the linewidth data were also quantitatively examined by use of Equations 12.6 and 12.9. In the case of moderately fast exchange, a maximum linewidth is predicted at a ligand:DNA mole ratio of 0.33 (82, 85), and this was indeed observed in the current case. Derived binding parameters were K_D ≤ 1.0 × 10^[?] M, k_off = 250 s⁻¹, (ν_F − ν_B) = 49 Hz, and LW_B = 12 Hz, consistent with the values derived from the analysis of chemical shifts. Subsequent studies with the related ligand L(NO₂) showed binding similar to that of L(NH₂). However, a third ligand, L(Gly), was found to bind somewhat more tightly, with some signals in the intermediate exchange regime.

3.2.2.3 Intermediate Exchange. In this regime the rate of exchange between bound and free states is comparable to the differences in NMR parameters associated with the exchange. The spectral peaks often become very broad and analysis is difficult. This is the case, for example, for L(Gly). In the methyl region of the spectra shown in Fig.
12.17, the T7 CH₃ signal moves upfield and the T3 CH₃ signal moves slightly downfield with increasing ligand concentration, as seen previously for L(NH₂) and L(NO₂). However, in contrast to the case for the other ligands, the characteristic broadening of peaks at intermediate ratios is non-Lorentzian, suggesting kinetics in the intermediate exchange regime. The T6 CH₃ peak does not shift in the characteristic fast-exchange manner but, instead, a new broad resonance appears close to the expected position of the bound T6 CH₃ chemical shift on the first addition of ligand, and increases in intensity with increasing ligand concentration. This observation is consistent with the ligand being in slow to intermediate exchange between the free and bound forms, with k_off ≈ (ν_F − ν_B). Based on the magnitude of ν_F − ν_B for this resonance, k_off for L(Gly) is estimated to be 50 s⁻¹, which is significantly slower than that for L(NO₂) and L(NH₂).

Figure 12.17. Expansions from the 600-MHz ¹H-NMR spectra for complexes formed between L(Gly) and d(GGTAATTACC)₂ showing the methyl resonances. The two small peaks at 1.12 and 1.14 ppm are attributed to an impurity. The complex nature of the T6 methyl resonance at ligand:DNA ratios less than 1:1 (a), and the manner in which signal intensity increases at about 1.15 ppm at ligand:DNA ratios greater than 1:1 (b), are indicative of intermediate exchange. (Reprinted with permission from Ref. 86.)

At a ligand:DNA ratio of approximately 1:1, the ratio of the integrals of the T6 methyl peak and the overlapped T3 and T7 methyl peaks is about 1:6. The expected value is 1:2, which indicates that the bound ligand methyl peak (4 × CH₃) is overlapped with the T7 and T3 methyl peaks, as observed with L(NH₂) and L(NO₂). When the ligand:DNA ratio is increased beyond a 1:1 ratio, a new peak appears at about 1.15 ppm and increases in intensity as
the ligand concentration is increased. This new peak corresponds to the methyl peak of the free ligand and its appearance in this manner is consistent with slow exchange on the chemical-shift timescale. To confirm this, spectra of a 2:1 mixture of L(Gly) and d(GGTAATTACC)₂ were acquired at different temperatures (86), as illustrated in Fig. 12.18.

At low temperatures, signals at 1.15 and 1.30 ppm (overlapped with the T7 CH₃ and T3 CH₃ peaks) attributable to the methyl groups from the free ligand and bound ligand, respectively, are distinguishable. As the temperature is increased, a broad peak appears between these two signals (at ~1.22 ppm). At the lower temperatures k_off ≤ (ν_F − ν_B), so that the methyl resonances of the ligand have complex characteristics reflecting slow-intermediate exchange. At higher temperatures, k_off ≥ (ν_F − ν_B), so the signal appears as a fast-exchanged average between the free and bound resonances. From a qualitative analysis of the spectra, k_off for L(Gly) was estimated to be 50-60 s⁻¹ at 283 K.

Figure 12.18. Expansions of ¹H-NMR spectra of a 2:1 mixture of L(Gly) and d(GGTAATTACC)₂ acquired at different temperatures. (Reprinted with permission from Ref. 86.)

The fact that some peaks (e.g., the oligonucleotide T7 and T3 methyl signals) exhibit fast exchange, whereas others in the same spectrum of the same complex exhibit slow-intermediate characteristics, is a reflection of the different (ν_F − ν_B) values for different peaks. This emphasizes the point made earlier that the "exchange regime" is a relative expression and depends not only on the rate of exchange, but also the size of the chemical shift differences involved. In summary, the observations suggest that k_off for binding of L(Gly) to the oligonucleotide duplex is much slower than that for the other two derivatives. This provides an illustration of the value of NMR as a quick method for comparing the binding of different ligands and for confirming ligand-binding hypotheses.

The change in binding kinetics may be rationalized by considering the different structure of the L(Gly) ligand relative to the other ligands. It was anticipated that, upon binding to the minor groove, the terephthalamides would adopt a conformation in which the substituent on the central ring would form part of the convex edge of the ligands and therefore be directed toward the "mouth" of the groove. Given this binding arrangement, the ligand L(Gly) would have a positively charged alkylamine group positioned to interact with the negatively charged phosphate groups of the DNA backbone. The L(Gly) derivative also has a bulkier substituent than that of the other ligands and this is also consistent with some differences in its binding.

3.2.3 NMR Techniques. NMR is a particularly versatile tool for the analysis of protein-ligand interactions. As well as being able to observe different nuclei, measurements may be made of a range of different NMR parameters, including chemical shifts, linewidths, coupling constants, and relaxation parameters. In addition, there are several specific NMR techniques that have been applied for the measurement of these parameters. The techniques that are particularly valuable for the study of macromolecule-ligand interactions are described in the following sections.

3.2.3.1 Chemical-Shift Mapping. Chemical shifts are exquisitely sensitive markers of the local charge state and environment. Although it is not possible to construct an accurate model of a binding site from a knowledge of the chemical shifts of a bound ligand, a qualitative interpretation of changes in chemical shifts of the macromolecule on binding provides significant insight into the location of the binding site. Traditionally, such studies were done using 1D NMR but are now increasingly done by 2D HSQC spectra. By simultaneously obtaining information on chemical shifts for a large number of sites in a macromolecule and seeing which ones change when a ligand binds, and which ones do not, it is possible to deduce the location of the binding site. This procedure is referred to as chemical-shift mapping. A prerequisite of the approach
is that the chemical shifts have been assigned. Chemical-shift mapping by use of HSQC spectra is widely used in NMR screening approaches and we will defer a more detailed discussion on it until Section 4.

The relative simplicity of how chemical shift information localizes binding sites may be illustrated by continuing with the example introduced above of terephthalamide binding to DNA. Figure 12.19 shows that, upon binding of the terephthalamides L(NH₂) and L(NO₂) to d(GGTAATTACC)₂, the DNA protons on the four base pairs between A5 and A8 are perturbed to a much larger degree than protons in the rest of the sequence. It is thus likely that these four residues form the binding site.

Figure 12.19. Chemical-shift perturbations of DNA protons upon ligand binding. The lighter and darker columns represent shifts attributed to L(NO₂) and L(NH₂) derivatives, respectively.

A more detailed analysis allows the binding site to be further localized to the minor, rather than to the major, groove in the region of these bases. A4, A5, and A8 are the only residues containing easily detectable minor groove protons (H2). These resonances, which originate from the floor of the minor groove, are shifted downfield with ligand binding, whereas most other resonances are shifted upfield. This observation is consistent with the ligands binding in the minor groove and has been reported for other minor groove binders such as Hoechst 33258 (87) and SN-6999 (88, 89), where adenine H2 protons on the floor of the groove experience deshielding ring current effects. However, significant chemical shift changes were also observed for some major groove protons. This illustrates the general point that sometimes allosteric effects can cause changes at sites not directly involved in binding. In the case of DNA, binding perturbations in the major groove have also been observed for other established minor groove binders such as distamycin (90), netropsin (91), and Hoechst 33258 (92). Based on NOE and crystallographic data, it was concluded that the effects were caused by distortions of the B-DNA duplex, including changes in the "base roll" of residues within the binding site, upon complexation. Electronic effects arising from the close proximity of charged groups on the ligand to neighboring nucleotides were also found to perturb major groove resonances.

In the case of the terephthalamides a comparison of the minor and major groove perturbations for a particular residue shows that the minor groove protons are affected to a much greater extent. This is particularly evident for A8, where the H2 proton shifts by approximately 0.25 ppm and the H8 proton is not affected (Fig. 12.19). It is difficult to conceive of a binding mode in the major groove that would account for such a large effect on the minor groove A8 H2 resonance without a simultaneous effect on the major groove protons of T7 and A8. The observed 1:1 stoichiometry of the
complex excludes the possibility that the ligand binds to the major and minor groove at the same time. It is more likely, therefore, that binding in the minor groove causes distortion of the DNA structure so that perturbations are observed for the major groove protons of A5 and T6, but not neighboring nucleotides.

Other examples of the use of chemical-shift mapping to locate binding sites have been reported for ligands binding to a range of drug targets, including immunophilins, matrix metalloproteases, and DHFR. Some of these examples are described in more detail in section 3.2.5.

3.2.3.2 NMR Titrations. There are a number of advantages in undertaking a titration of ligand against macromolecule or vice versa rather than just examining the final complex. These include introducing the possibility of distinguishing signals from the individual components on the basis of intensities at intermediate stages of the titration in the slow-exchange case and obtaining kinetic and thermodynamic parameters associated with the interaction in the fast-exchange case. Such titrations may be done using either 1D or 2D spectra and are very useful for establishing the exchange regime of the complex, as described in Section 2.1. A variety of parameters may be monitored in the titration, although the two most common are chemical shifts and linewidths. Examples of such titrations are given in Figs. 12.16 and 12.17.

3.2.3.3 Isotope Editing and Filtering. Isotope editing provides a powerful way of distinguishing between the components in a complex without the need for a titration. It is one of the most useful tools for the study of macromolecule-ligand complexes, and indeed the background NMR technology that underpins isotope editing was developed specifically for the study of complexes. The principle of the approach is illustrated in Fig. 12.20 and is based on the use of isotopes to select for signals from either the ligand or macromolecule, or signals exclusively linking both of them.

Figure 12.20. Isotope editing and filtering can be used to select signals from either the ligand or the protein. (a) Normal protein and ligand with no filtering or editing. (b) Selection of the ligand signals by ²H labeling of the protein. (c) Selection of protein or ligand signals by ¹³C and/or ¹⁵N labeling/editing. (d) Removal of protein or ligand signals by ¹³C or ¹⁵N filtering.

The conceptually simplest approach is to uniformly deuterate the macromolecule, thereby removing its signals from ¹H-detected NMR spectra, and allowing signals from only the ligand to be observed. This substantially simplifies the spectrum and allows, for example, the bound conformation of a ligand to be determined from NOESY data recorded in D₂O. By rerunning the spectrum in H₂O, additional NOEs to exchangeable amide protons on the protein may be detected, thereby providing information on contacts between ligand and protein. Alternatively, ¹⁵N or ¹³C signals may be introduced selectively into either the ligand or protein and editing techniques used to select only signals attached to these labels and their proximate protons. This was used in the first example of an isotope-edited study, in this case to examine the binding of a ¹⁵N-labeled peptide-based inhibitor to pepsin (93).

Potentially, the most useful approach involves uniform labeling of one of the components with either ¹⁵N or ¹³C and leaving the other component unlabeled. It is then possible to edit the spectrum by selecting for interactions (either through bond or through space) that connect protons that are both one-bond coupled to ¹⁵N or ¹³C. Alternatively, the spectrum may be filtered to specifically remove such signals, thereby selecting only signals involving protons coupled to ¹⁴N or ¹²C (i.e., on the unlabeled component). It is generally easier to uniformly label the protein rather than the ligand, and editing methods are highly efficient, thus making it easy to visualize just the protein. However, because ligand signals are often of interest, filtering experiments play a valuable role in visualizing them. Unfortunately, filtering experiments are more susceptible to artifacts than are editing experiments, although there have been recent advances in reducing artifacts (94).

Another possibility is to use half-edited/half-filtered 2D experiments to detect NOEs that specifically involve interactions between protons attached to ¹⁵N or ¹³C and those that are not. This approach is used, for example, to detect intermolecular NOEs between a labeled protein and an unlabeled ligand. Examples of isotope editing/filtering are given in section 3.2.4.

3.2.3.4 NOE Docking. In many cases the study of a complex may follow a previous structure determination of the isolated macromolecule and in that case it may be possible to determine much information about a complex by obtaining a relatively small number of NOEs linking the ligand and macromolecule.
Gradwell and Feeney (95) recently analyzed factors important in such NOE docking experiments. In their analysis, a high resolution X-ray structure of a protein-ligand complex was used to simulate loose distance restraints of varying degrees of quality that might typically be estimated from experimental NOE intensities. These simulated data were used to examine the effect of the number, distribution, and representation of the experimental constraints on the precision and accuracy of the calculated structures. A standard simulated annealing protocol was used, as well as a more novel method based on rigid-body dynamics. The results showed some parallels with those from similar studies on complete protein NMR structure determinations, but it was found that more constraints per torsion angle are required to define docked structures of similar quality. This is because the conformation and orientation of the ligand are defined only by NOEs and not by covalent attachment, as is the case for amino acid side chains in a protein structure. The effectiveness of different NOE-constraint averaging methods was explored and the benefits of using "R⁻⁶ averaging" rather than "center averaging" with small sets of NOE constraints were demonstrated. With these considerations in mind it appears that NOE docking can be a very cost-efficient procedure for defining the environment, orientation, and conformation of ligands.
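The difference between the two averaging schemes can be illustrated with a short calculation. This sketch assumes the definitions commonly used in restraint-based structure calculation software, which the text does not spell out: "R⁻⁶ averaging" is taken as the inverse sixth root of the mean of r⁻⁶ over a group of equivalent protons, and "center averaging" as the distance to the geometric center of the group. The function names and coordinates are hypothetical.

```python
import numpy as np


def r6_average_distance(partner, protons):
    """Effective distance (<r^-6>)^(-1/6) from a partner atom to a proton group.

    Assumed definition of "R^-6 averaging"; coordinates are in angstroms.
    """
    offsets = np.asarray(protons, dtype=float) - np.asarray(partner, dtype=float)
    distances = np.linalg.norm(offsets, axis=1)
    return float(np.mean(distances ** -6.0) ** (-1.0 / 6.0))


def center_average_distance(partner, protons):
    """Distance from a partner atom to the geometric center of a proton group.

    Assumed definition of "center averaging"; coordinates are in angstroms.
    """
    centroid = np.mean(np.asarray(protons, dtype=float), axis=0)
    return float(np.linalg.norm(centroid - np.asarray(partner, dtype=float)))


if __name__ == "__main__":
    partner = [0.0, 0.0, 0.0]
    # Two equivalent protons at 2 and 4 angstroms from the partner (illustrative).
    group = [[2.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
    print("R^-6 averaged distance:", round(r6_average_distance(partner, group), 3))
    print("center averaged distance:", round(center_average_distance(partner, group), 3))
```

Because the NOE scales as r⁻⁶, the R⁻⁶ average is dominated by the closest proton in the group, whereas the center average treats the group as a single pseudo-atom. The two measures can therefore differ appreciably when a restraint involves protons at unequal distances.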
3.2.4 Selected Examples. Applications of the various NMR techniques described are now illustrated with selected examples. The examples have been chosen to give a broad perspective on the types of NMR experiments that can be done and the types of information they provide. Specifically, the first example covers the case of drug-nucleic acid binding and focuses on more traditional NMR experiments, involving relatively standard homonuclear methods. The second example covers binding of moderately large ligands to immunophilins and highlights modern isotope editing techniques. The third example, covering ligand binding to a matrix metalloproteinase, also highlights the importance of these techniques and shows how relatively simple spectra involving ¹⁹F-containing ligands can be very informative. The fourth example describes ligand binding to DHFR, one of the most extensively studied systems by NMR, and illustrates the derivation of a broad range of kinetic and geometric information on intermolecular complexes. The final example, on HIV protease, describes how NMR complements X-ray studies and provides information on dynamic motions within complexes.

3.2.4.1 DNA-Binding Drugs. The NMR approaches that have been used to examine the interactions of minor groove binding drugs with DNA can be illustrated with studies on the bisbenzimidazole-based compound Hoechst 33258 (9). It has been used widely as a fluorescent cytological DNA stain and is also active as an anthelmintic agent. It has activity against intraperitoneally implanted L1210 and P388 leukemias in mice (96).

Footprinting studies (96) have shown that sequences of four AT base pairs are a prerequisite for strong binding to DNA, consistent with similar observations for other structurally related molecules such as distamycin and netropsin (87, 90, 91, 97-99). The first structural studies of Hoechst 33258 complexed to short sequences of synthetic oligonucleotides were done using X-ray crystallographic methods (100-102). NMR and further X-ray studies followed (92, 103-107). Three of the X-ray studies (100, 101, 103) used the EcoRI sequence d(CGCGAATTCGCG)₂ and another (102) used the sequence d(CGCGATATCGCG)₂. Both sequences fulfil the requirement of at least four consecutive AT base pairs, and the resulting complexes showed similar modes of binding. In all of the X-ray studies, the Hoechst ligand was found to bind to the minor groove.

The NMR studies of complexes between Hoechst 33258 and oligonucleotide sequences provided complementary information to the crystal structure data (92, 103-106). Because the binding is reversible, the NMR data offer the opportunity to derive information about the kinetics of the interaction. As with the crystallographic studies, the oligonucleotide sequences were designed to contain runs of AT base pairs. Some NMR studies were performed with dodecanucleotide sequences used in crystallographic studies, including d(CGCGAATTCGCG)₂, which allowed a direct comparison with the crystallographic data. Experiments were also performed with sequences specifically designed to investigate different aspects of the interaction. The sequence d(CTTTTGCAAAAG)₂ was designed to offer two binding sites, and it was shown that two Hoechst molecules interacted with the DNA duplex in symmetry-related orientations at the 5'-TTTT-3' and 5'-AAAA-3' sites (92).

Figure 12.21. 1D ¹H-NMR spectra (recorded at 20°C) illustrating the thymine methyl region for the symmetrical ligand-free duplex (a) and for the 1:1 Hoechst:d(GGTAATTACC)₂ complex (b), which is no longer symmetrical because of the ligand binding. x corresponds to a small impurity peak. The DNA strands are numbered to the right of the spectra and the approximate location of the ligand is indicated by a black bar. (Reprinted with permission from Ref. 105. Copyright 1993, Blackwell Publishing Science.) [Strand numbering shown in the figure: in (a), both strands of the GGTAATTACC duplex are numbered 1-10; in (b), one strand is numbered 1-10 and the complementary strand 11-20. Horizontal axis: chemical shift (ppm).]

3.2.4.1.1 Stoichiometry and Kinetics. The starting point in studies of ligand-DNA complexes is usually a titration experiment to es-
tablish the nature and stoichiometrv " of the added to d(GGTAATTACC),, the free DNA
complex. Complexes between the ligand and signals completely disappear at a DNA:drug
DNA duplex are obtained by adding small ali- ratio of 1:1,and the number of new resonances
quots of ligand solution to a sample of the is twice the number of previously observed
DNA dudex* with one-dimensional 'H NMR free DNA resonances (Fig. 12.21). This is a
spectra acquired after each addition. The ef- common feature of com~lexeswith 1:l stoichi-
fects observed on the NMR spectrum after ometry and reflects a loss of the dyad symme-
each addition reveal whether an interaction is try of the duplex attributed to ligand binding.
taking place and allow the interaction to be Upon addition of Hoechst 33258 to
characterized as fast or slow exchange on the d(CTTTTCGAAAAG),, the free DNA signals
NMR timescale. The stoichiometry of the in- completely disappeared at a ratio of 2:l drug:
teraction can also be determined from the ti- DNA and there was no doubling of the number
tration. of DNA resonances in the spectrum (92).From
In general, the addition of Hoechst 33258 this, it could be concluded that two molecules
to the oligonucleotide duplexes causes a de- were bound per duplex in a manner that re-
crease in the intensity of free DNA resonances tained the dyad symmetry of the DNA duplex.
and a concomitant increase in the intensitv" of The binding was also determined to be coop-
new resonances, which appear in previously erative, in that no intermediate 1:l complex
unoccupied spectral regions. This is consistent was detected (92). The formation of a 1:l com-
with the free and bound forms of the DNA plex would have resulted in a very complicated
duplex being in slow exchange with each spectrum at intermediate 1igand:DNA ratios,
other. For example, when Hoechst 33258 is given that resonances arising from the free
Figure 12.22. Aromatic region of NOESY spectrum of a 1:1 mixture of Hoechst vs. d(CTTTTCGAAAAG)2, recorded with a 200-ms mixing time. Chemical exchange cross peaks between protons of the free DNA and the 2:1 Hoechst:DNA complex are labeled with their identifying base pair. Below the diagonal the H6 and H8 cross peaks are shown, whereas those of the adenine H2 resonances are highlighted in the upper portion of the figure and labeled with a subscript 2. (Reprinted with permission from Ref. 92. Copyright 1990, Oxford University Press.)
DNA, the 1:1 complex and the 2:1 complex, would have produced four times as many observable peaks relative to the free DNA species. At intermediate ligand concentration, however, only two sets of peaks arising from DNA molecules were detected. In the 2:1 complex only four thymine methyl resonances were detected (1.0-1.5 ppm), as expected for a symmetrical DNA duplex. These are all overlapped in the free DNA spectrum. In the 1:1 mixture, only signals from free DNA and from the 2:1 complex were detected.

The reversible nature of the Hoechst:DNA interaction is illustrated by the observation of chemical-exchange cross peaks in NOESY spectra of mixtures of free and complexed oligonucleotides (92, 104). This may be seen in the NOESY spectrum of a mixture of free and complexed d(CTTTTCGAAAAG)2, shown in Fig. 12.22, in which many chemical exchange cross peaks are observed between resonances arising from the free and bound oligonucleotide. In a NOESY spectrum acquired at lower temperature, the intensity of these chemical-exchange cross peaks is significantly reduced, indicating that the exchange is slowed at lower temperatures. The exchange rate was estimated to be <10 s⁻¹ at 10°C (92).

The ability to observe such dynamic exchange phenomena is one of the strengths of NMR relative to X-ray crystallography and several examples of these phenomena are described later in the chapter.

3.2.4.1.2 Binding Site. A combination of chemical shift and NOE information can be used to locate and characterize binding sites. Chemical-shift differences between resonances arising from free and bound forms of DNA are indicative of the nature of the interaction. In all studies of the Hoechst complexes described above (92, 104-107) significant changes to the chemical shifts of thymine H1' protons and adenine H2 protons were observed, in contrast to the generally small perturbations observed for the base H8/H6 and CH3 resonances located in the major groove. Perturbations of this nature are consistent with binding to the minor groove. In some instances, significant perturbations were observed to major groove protons located well
within the binding site, reflecting changes in the conformation of the DNA duplex (e.g., base roll, propeller twisting) (92, 106).

Further evidence of minor groove binding is provided by the fact that resonances arising from protons on the floor of the groove, such as the adenine H2 and imino resonances, are shifted downfield, whereas resonances from protons on the minor groove walls, such as the H1' protons, are shifted upfield. This is a consequence of the ligand's being inserted edge-on into the minor groove. The deoxyribose protons that form the walls of the minor groove are positioned above the π-plane of the aromatic rings and consequently receive upfield perturbations to their chemical shifts. Protons positioned on the floor of the groove, however, generally lie in the plane of the aromatic rings and experience downfield perturbations to their chemical shifts, as illustrated in Fig. 12.23.

Figure 12.23. Schematic representation of ligand-induced ring-current effects on nucleotide protons that form the walls (deoxyribose H1') and floor (adenine H2) of the minor groove. (+), shielding effects; (-), deshielding effects. (Reprinted with permission from Ref. 105. Copyright 1993, Blackwell Publishing Science.)

The magnitude of chemical shift changes is a strong indicator of the location of the binding site. In the case of the 2:1 complex with d(CTTTTCGAAAAG)2 (92), the largest chemical shift changes occur over the 5'-TTTT-3' and 5'-AAAA-3' regions of the duplex. In the case of 1:1 complexes, where the DNA duplex contains an AT base-pair segment located at the center of the sequence, greater chemical shift perturbations are observed for resonances in that region (104-106), consistent with the binding site's being located there.

Assignment of the bound ligand and DNA resonances enables the identification of intermolecular NOEs, which are required for a precise determination of the binding site. The interaction of Hoechst 33258 with the oligonucleotides produced a large number of intermolecular NOEs (~25-30), placing considerable constraints on the structure of the complex and enabling the orientation of the ligand within the binding site to be determined. The NOE contacts observed for different complexes have a few features in common. The contacts generally involve DNA protons associated with the minor groove, such as ribose H1' and adenine H2, clearly locating Hoechst in the minor groove. Protons of all four spin systems of the ligand show NOEs to protons of the DNA, demonstrating that the interaction occurs along the entire length of the drug. Typically, protons along one edge of the ligand (e.g., NH and H4'/H4") exhibit close contacts to protons on the floor of the minor groove, showing that the bound drug is crescent shaped and isohelical with the DNA (92, 104-106).

Models of the interaction of Hoechst 33258 with the oligonucleotides studied were generated based on the intermolecular NOEs. The models of the 1:1 complexes indicated that the ligand interacted with the four AT base pairs located at the center of the sequence. Interestingly, there was no evidence for interactions with GC base pairs on the periphery of the binding sites. In the 2:1 complex reported by Searle and Embrey (92), the array of contacts observed located the ligand in the minor groove at the center of the 5'-TTTT-3' and 5'-AAAA-3' sites, as illustrated in Fig. 12.24.

Figure 12.24. Schematic representation of Hoechst 33258 bound to the minor groove of the 5'-TTTT sequence. (a) Highlights of some of the NOEs that determine the position and orientation of the Hoechst molecule within the minor groove. (b) Intermolecular hydrogen-bonding scheme. Molecular-modeling studies with an idealized B-DNA helical structure indicate that the benzimidazole H3' is capable of forming bifurcated interactions with A11 N3 and T3 O2, whereas the benzimidazole H3" hydrogen bonds in a similar manner, but with A10 N3 and T4 O2. In the proposed model of the complex, these distances fall within 3.5 Å and are thus within acceptable hydrogen-bonding limits. (Reprinted with permission from Ref. 92. Copyright 1990 Oxford University Press.)

As well as defining the location of the binding site, intermolecular NOEs can be used to determine the orientation of the ligand at that site. In the case of the 2:1 complex, the N-methylpiperazine moieties were found to point toward the center of the duplex, as indicated by NOEs between the protons from the piperazine ring and the 5'-terminus of the adenine tract (Fig. 12.24). Corresponding NOEs were also observed between the drug phenolic protons and the 5'-terminus of the thymine tract, as well as the 3'-terminus of the adenine tract of the complementary strand. This model did not indicate any interaction with the central GC base pairs (92).

The orientation of the ligand was similarly determined in the 1:1 complexes based on intermolecular NOEs between protons located at the extremities of the Hoechst molecule and protons of the binding site. For example, in the interaction with d(GTGGAATTCCAC)2, Fede et al. (106) reported NOEs between protons from the piperazine moiety and the H2 and H1' protons of the dinucleotide fragment d(A5T5).d(A6T6).

3.2.4.1.3 Dynamic Processes. The binding of the Hoechst molecule to the self-complementary oligonucleotide duplexes in a 1:1 ratio lifts the dyad symmetry of the duplexes so that two sets of DNA resonances are observed. This indicates that the drug is in slow exchange between the free and the bound forms. Close examination of the 2D NOE data, however, reveals the presence of chemical-exchange cross peaks between symmetry-related protons on opposite sides of the dyad axis of the DNA duplex. The mechanism by which this occurs has been described as dissociation of the Hoechst molecule from the duplex, followed by a 180° reorientation and rebinding (105, 106). The self-complementary nature of the sequences ensures that the same complex is formed for either ligand orientation but with the net effect of interchanging the two strands with respect to the orientation of the Hoechst molecule. The rate at which this process occurs was estimated using cross-peak intensities in the NOESY spectrum (106). When interacting with d(GGTAATTACC)2 and d(GTGGAATTCCAC)2, the lifetime of the complex in each state (1/k) was reported to be approximately 0.8 and 0.45 s, respectively (105, 106). These values indicate a small but significant difference in the affinity of Hoechst for TAATTA and GAATTC sites.

Intramolecular dynamic processes that are fast on the NMR timescale are also observable in the 1H-NMR spectrum of the bound Hoechst molecule. Resonance averaging is observed for the H2/H6 and H3/H5 protons of the phenol group, which is consistent with the environments on either side of the ring being averaged by rapid ring-flipping motions about the C4-C2' axis. This occurs despite the apparent tight fit between the phenyl ring and the walls of the minor groove, which, in a static model of the complex, must present a large barrier for rotation. It was estimated (105) that the rate for this process is as high as 1000 s⁻¹. This is much higher than the rate of interconversion between free and bound forms of the duplex; thus, dissociation of the drug from the complex cannot be the rate-limiting factor for phenol ring flipping. Dynamic fluctuations of the DNA conformation are more likely to provide the rate-limiting step.

3.2.4.1.4 Summary of Solution Studies. The data obtained from these NMR studies are consistent with the bound ligand fitting tightly within the minor groove of AT tetramers, with the aromatic rings of the ligand being roughly coplanar. The AT tract provides the key recognition features required for binding, including the narrowness of the minor groove. The importance of van der Waals interactions is evident, given the large number of NOE contacts between the ligand and the walls and floor of the groove. Hydrogen bonding also plays a significant role in stabilizing the interaction, as do electrostatic interactions between the positively charged piperazine ring and the minor groove. Electrostatic interactions are also likely to play a significant role in orienting the ligand within the binding site, as shown in the 2:1 complex, where the piperazine rings point toward the center of the duplex where the positive charge is best stabilized (92). The information derived from these studies, as well as from NMR studies of the interactions of other minor groove binders with DNA, is useful for the design of ligands with altered specificity or increased binding affinity, with the overall goal being the development of novel drugs.

3.2.4.2 Immunophilins: Studies of FK506 Analog Binding to FKBP. Some of the most detailed investigations of the interaction between ligands and their target proteins have been made for the immunophilin class of proteins. The major FK506 binding protein (FKBP) has a molecular mass of about 11.8 kDa, whereas cyclophilin (Cp) has a mass of about 17 kDa. These proteins are unrelated in amino acid sequence but both have peptidyl-prolyl cis-trans isomerase activities that are inhibited by immunosuppressants that block signal transduction pathways leading to T-lymphocyte activation. FK506 (10) and rapamycin (11) inhibit the cis-trans isomerase activity of FKBP, whereas cyclosporin A (structure shown in Fig. 12.3) inhibits that of Cp. NMR has contributed significantly to the understanding of binding interactions to both proteins.

[Structures not reproduced: (10) FK506, R = CH2CH=CH2; (11) rapamycin; (12) ascomycin, R = CH2CH3.]

Initial studies on FK506 focused on the structure of the free ligand to aid in the design of further analogs (108-110). However, it was established from studies of the cyclosporin A-cyclophilin complex that the conformation of a molecule bound to its target site may be very different from that in the free state (111-113). In addition, analog design is assisted by knowing the location of the binding region of the ligand. Studies were therefore undertaken to determine the bound state of the ligand as well as to identify those portions of the drug interacting with the binding protein.

The first investigations involved the analysis of 13C carbonyl chemical shifts of C8 and C9 and the 1H chemical shifts of the piperidine ring of FK506 bound to FKBP (114, 115). The upfield shifts of the piperidine ring protons, as well as NOEs observed between these protons and aromatic protons of FKBP, suggested that the bound site on FKBP resided in an aromatic-rich domain, and allowed a putative binding site on FKBP to be proposed. It was also evident that the pipecolinyl functionality of FK506 and analogs was involved in the binding face of the ligand.

Figure 12.25. Three-dimensional structure of ascomycin bound to FKBP. Protons on the ligand that showed NOEs to the protein are denoted by a black shading of the carbons to which they are attached. Although no NOEs were observed from the protons at position 3 to the protein, the upfield shift of their resonances, -1.09 and 0.25 ppm, suggests that they are in close proximity to an aromatic region of FKBP. (Reprinted with permission from Ref. 116; © 1991, American Chemical Society.)

In another study (116), a uniformly 13C-labeled ascomycin, (12), was prepared, allowing the bound conformation of ascomycin to be determined in the presence of FKBP. The enhanced 13C signals were used to edit the 1H NOESY spectra used for the structural analysis. Not only were the assignments of side-chain methyls made possible by the 13C enrichment, but ligand resonances could be distinguished readily from those of the protein. The conformation of the ligand was determined from NOEs observed in a 3D HMQC-NOESY spectrum. The resulting ascomycin structure (Fig. 12.25) differed considerably from that of the uncomplexed FK506 obtained by X-ray crystallography, but was similar to that of rapamycin. In particular, the bound ascomycin displayed a trans orientation of the 7,8-amide bond, whereas this bond is cis in free FK506 and trans in rapamycin. The backbone structure of the macrocyclic ring differed from that of uncomplexed FK506, but
showed a similarity in the piperidine ring region to that of rapamycin. This study also showed that both the piperidine ring and the pyranoyl moiety of ascomycin are involved in the binding interface in the complex with FKBP. Ligand protons that show NOEs to the protein are in bold in Fig. 12.25. X-ray studies since have confirmed these results for both the FK506-FKBP and rapamycin-FKBP complexes, showing the trans orientation of the ligand amide bond in the bound conformation, and verifying the involvement of the piperidine and pyranoyl regions of the ligand in the binding interface (117).

The binding site of the FKBP complex has also been investigated through use of NMR spectroscopy. Michnick et al. (118) and Moore et al. (119) solved the structure of uncomplexed FKBP by use of 1H-NMR methods. Although spectral overlap did not allow every structural constraint present to be identified unambiguously, convergent structures defining the global fold of the 107-residue FKBP protein were obtained. Previous biochemical data allowed the extensive aromatic cluster within the core of the structure to be identified as the ligand-binding pocket. The loop regions of the protein between residues 37-43 and 83-90, situated at the open end of the binding pocket, were also of interest. The loops were the least well defined regions of FKBP and were thought to be flexible, and perhaps involved in the binding interaction. Examination of 1H and 15N chemical-shift changes on addition of ligand supported this notion and suggested that significant structural changes in these loop regions occurred upon ligand binding (118).

In a later study, a high resolution structure of the complete ascomycin-FKBP complex was calculated by heteronuclear 3D and 4D NMR by Meadows et al. (120). Uniformly labeled [15N]FKBP and [13C,15N]FKBP were prepared and incubated with unlabeled ascomycin to form the complexes. Three-dimensional NOESY-HSQC spectra, resolved according to 15N shifts, were used to obtain the NH-NH NOEs within FKBP. CH-NH NOEs were derived from a 4D [13C,1H,15N,1H]-NOESY spectrum of the doubly labeled material in H2O and CH-CH NOEs from the same experiment repeated in D2O. Hydrogen bond constraints were obtained by the identification of slowly exchanging amide protons from a series of HSQC spectra acquired over several days. Torsional angle constraints were obtained from coupling constants measured in a 2D HMQC-J spectrum of [U-15N]FKBP/ascomycin. In all, 1958 distance constraints were applied to the structure calculation, with the extra resolution afforded by isotopic labeling, as compared with the 590 and 1047 restraints used in earlier homonuclear studies (118, 119). Restraints defining the structure of bound ascomycin were obtained from the previously reported data of Petros et al. (116) and, along with the intermolecular NOE-derived distance constraints also reported in their study, the complete ascomycin/FKBP solution structure was calculated.

The extra detail afforded by the multidimensional NMR approach allowed the ligand-protein contact area to be located unambiguously and even specific intermolecular hydrogen bonds identified. The structure of the complexed FKBP was essentially similar to that of the uncomplexed structure, except that the "ill-defined" loop regions between residues 36-45 and 78-92 were found to adopt well-defined conformations in the complexed proteins, as anticipated by previous studies. Although this difference may partially be a result of the differences in resolution achieved in the complexed and uncomplexed FKBP NMR studies, generally it was thought that binding involved some rearrangement of the 36-45 and 78-92 loops. This provides a good example of the dynamic nature of protein binding as revealed by NMR spectroscopy.

The dynamic aspects of the ligand-FKBP complex formation were pursued by Cheng et al. (121) through analysis of 15N-NMR relaxation data. In particular, the increased backbone mobility for several residues within the 36-45 and 78-95 loops compared with that of the rest of the protein was noted. From analysis of the 15N relaxation rates of FKBP complexed with FK506, it was found that flexibility was restricted along the entire polypeptide chain (122). This confirmed the proposition that the binding interaction of FKBP with ligand involves stabilization and structuring of the protein loops adjacent to the binding site.

In summary, it was possible not only to define the free and bound conformations of the ligand but also to identify the two binding interfaces involved in the interaction and demonstrate a reduction in protein mobility in a defined region of the protein upon binding. This level of analysis was possible because of the tight binding of the FKBP-ligand complex, its small size, and the availability of labeled species. The information proved to be complementary to X-ray crystallographic studies and will help to clarify the role of FKBP complex formation in immunoregulation.

3.2.4.3 Matrix Metalloproteinases. Matrix metalloproteinases (MMPs), including stromelysin, collagenase, and gelatinase, are involved in tissue remodeling associated with embryonic development, growth, and wound healing. Unregulated or overexpressed MMPs have been implicated in several pathological conditions, including arthritis and cancer, and inhibitors of stromelysin and other MMPs have attracted much interest because of their potential for the treatment of these diseases.

Several NMR structural studies of stromelysin (123-127) and collagenase (128, 129) complexes have been reported. The secondary structure and global fold have been found to be quite similar for the catalytic domains of both enzymes and their various complexes with ligands. The active site in each enzyme is a cleft spanning the width of the enzyme, with a catalytic zinc atom coordinated by three histidine residues located in the center. Different dynamic properties of active-site residues in stromelysin/ligand complexes (3) and of collagenase with and without bound inhibitor (128, 129) have been reported. It has been proposed that structural and dynamic differences can be exploited in structure-based drug design to achieve broad inhibitor activity against several MMPs or to obtain more selective inhibition (3).

Of recent interest have been structural data on a novel class of MMP-binding inhibitors, represented by PNU-107859 (13) and PNU-142372 (14), which contain a thiadiazole moiety that coordinates the catalytic zinc atom through its exocyclic sulfur atom (130).

Isotope editing/filtering studies played an important role in defining interactions between the ligands and stromelysin. For example, for the stromelysin/PNU-107859 complex a 3D 12C-filtered, 13C-edited NOESY spectrum recorded on the [12C,14N]PNU-107859/[13C,15N]-stromelysin complex was used to assign protein/ligand NOEs. Of the 11 observed NOEs between the ligand and protein aliphatic protons, nine involved the aromatic ring of (13) and one involved the terminal methyl group. NOEs were observed between (13) and protons of Tyr155, His166, Try16', and Ala16'. All four of these residues are located in the S1-S3 binding sites on one side of the active site. Comparison of 2D 1H-15N HSQC spectra showed that differences between the 1H and 15N chemical shifts for the stromelysin/13 and stromelysin/14 complexes are concentrated in the active site, indicating that no gross conformational differences in protein structure exist. The aromatic rings of (13) and (14) bind in the same region of the protein.
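The HSQC comparison described above, in which the 1H and 15N shifts of two complexes are compared residue by residue, is often reduced to a single number per residue. The following is a minimal Python sketch of one common way to do this, assuming a weighted Euclidean combination of the two shift differences. The 0.2 scaling of the 15N difference, the 0.1 ppm cutoff, the residue labels, and all shift values are illustrative assumptions and are not taken from the studies cited here.

```python
import math

# Assumed weighting of the 15N shift difference relative to 1H. Values of
# roughly 0.1-0.2 are in common use; 0.2 is an arbitrary choice here.
N_SCALE = 0.2


def combined_shift_difference(peak_a, peak_b, n_scale=N_SCALE):
    """Return the weighted 1H/15N shift difference (ppm) between two peaks.

    Each peak is a (delta_1H, delta_15N) tuple in ppm.
    """
    d_h = peak_a[0] - peak_b[0]
    d_n = (peak_a[1] - peak_b[1]) * n_scale
    return math.hypot(d_h, d_n)


def perturbed_residues(peaks_a, peaks_b, cutoff=0.1, n_scale=N_SCALE):
    """List residues whose combined shift difference exceeds the cutoff.

    peaks_a and peaks_b map residue labels to (delta_1H, delta_15N) tuples.
    Only residues assigned in both spectra are compared.
    """
    shared = sorted(set(peaks_a) & set(peaks_b))
    return [
        res for res in shared
        if combined_shift_difference(peaks_a[res], peaks_b[res], n_scale) > cutoff
    ]


# Hypothetical peak lists for two complexes (made-up labels and numbers).
complex_13 = {"res_A": (8.20, 120.0), "res_B": (7.95, 115.3), "res_C": (8.60, 123.1)}
complex_14 = {"res_A": (8.21, 120.1), "res_B": (8.25, 116.8), "res_C": (8.60, 123.1)}

print(perturbed_residues(complex_13, complex_14))
```

Residues flagged in this way would then be mapped onto the protein structure to see whether they cluster, as the text reports for the active site of stromelysin.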
556 NMR and Drug Discover
[Figure 12.26 spectrum not reproduced. Horizontal axis: 19F chemical shift (ppm), -140 to -160; the broad signals are labeled "Bound".]

Figure 12.26. Region of the 1D 19F spectrum of the stromelysin/PNU-142372 complex. Signals from free (sharp) and bound (broad) PNU-142372 are observed. (Adapted from Ref. 3 and reprinted with permission from Elsevier Science.)
A region of the 1D 19F spectrum of the stromelysin/14 complex is shown in Figure 12.26 (3). Two separate resonances were observed for the two ortho fluorine atoms of the bound ligand in contrast to the single resonance observed for both ortho protons of stromelysin-bound (13), indicating that the ring flip rate (rotation about the Cβ-Cγ bond) is reduced for stromelysin-bound (14) compared to stromelysin-bound (13). A ring flip rate of approximately 100 s⁻¹ was estimated from the difference in linewidths for the bound ortho and para fluorine atom resonances of (14), more than two orders of magnitude slower than the ring flip rate for (13).

The 19F spectrum in Figure 12.26 illustrates several general principles that are useful in NMR studies of ligand-macromolecule complexes. First, note that the use of a rare probe nucleus such as 19F produces spectra of elegant simplicity. Because there is no naturally occurring 19F in the macromolecule, it generates no interfering signals. Second, the offset in chemical shift between bound and free signals reflects the different environment of the bound and free states. Third, signals from the bound ligand are broader than those from the free ligand because of the higher molecular weight of the complex but are still clearly visible for a complex of this size.

NMR studies have also been reported for ligands bound to collagenase. Interest so far has focused on hydroxamate-containing ligands, where it has been shown that binding causes a decrease in mobility of some but not all active-site residues (128, 129). Interestingly, some active-site residues adjacent to residues that interact directly with inhibitor were found to have high mobility both in the presence and the absence of inhibitor (129). This contrasts with what is observed for stromelysin complexed to hydroxamate ligands and a more complete understanding of the dynamics of the respective interactions may provide critical information for drug design (3).
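The ring-flip rate quoted above for bound (14) was obtained from a linewidth difference. A minimal Python sketch of that kind of estimate follows, assuming the standard slow-exchange relation in which exchange adds k/π (in Hz) to the linewidth of each exchanging site, and treating the para fluorine, which lies on the flip axis, as the unbroadened reference. The linewidths and the peak separation used here are hypothetical values chosen only to land near 100 s⁻¹; they are not the measured data from Ref. 3.

```python
import math


def flip_rate_from_linewidths(lw_exchanging_hz, lw_reference_hz):
    """Estimate an exchange rate (s^-1) from excess line broadening.

    Assumes slow exchange, where each site is broadened by k/pi Hz on top
    of its natural linewidth, so k = pi * (excess linewidth).
    """
    excess = lw_exchanging_hz - lw_reference_hz
    if excess < 0:
        raise ValueError("exchanging signal is narrower than the reference")
    return math.pi * excess


def coalescence_rate(separation_hz):
    """Rate (s^-1) at which two equally populated sites coalesce.

    Rates well below this value give two resolved signals (slow exchange);
    rates well above it give one averaged signal (fast exchange).
    """
    return math.pi * separation_hz / math.sqrt(2)


# Hypothetical linewidths (Hz) for the bound ortho and para 19F signals.
k_flip = flip_rate_from_linewidths(lw_exchanging_hz=82.0, lw_reference_hz=50.0)
print(f"estimated flip rate: {k_flip:.0f} s^-1")
print(f"mean time between flips: {1000.0 / k_flip:.1f} ms")

# Hypothetical separation (Hz) between the two bound ortho 19F signals.
print("slow exchange:", k_flip < coalescence_rate(2000.0))
```

The same comparison against the coalescence rate explains why the ortho protons of bound (13), which flip much faster, appear as a single averaged resonance.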
Hydroxamate-containing ligands have also featured in other NMR studies, this time using transferred NOEs to determine their bioactive conformations (131). TrNOE data were used to determine the conformation of the inhibitors when bound to stromelysin. The NOE-derived structures of the bound inhibitors were used as templates to screen a database of 260,000 compounds. Eighteen of the 23 compounds identified for which stromelysin binding data were available had affinities less than 200 nM, demonstrating the value of deriving a conformationally restricted template for structure-based drug design (131). This study also demonstrates the close synergy that exists between structure-based design and screening approaches, either in silico or experimental.

3.2.4.4 Dihydrofolate Reductase. Dihydrofolate reductase (DHFR) is an important intracellular enzyme that is the target of several clinically used drugs, including methotrexate (15), an anticancer compound, and trimethoprim (16), an antibacterial. These act by inhibiting the enzyme in malignant cells and parasites, respectively. The small size of DHFR (18-20 kDa) makes it amenable to structural studies and there have been numerous complexes determined using both X-ray and NMR methods. The focus here will be on a recent illustrative example of the structure of a new complex of DHFR with trimetrexate (17). Trimetrexate was initially investigated as an antimalarial agent but has subsequently been found to have antineoplastic activity against breast, neck, and head cancers. It has also been used as an antibacterial for the treatment of Pneumocystis carinii pneumonia in AIDS patients. As seen from the following structures, trimetrexate combines some of the features of trimethoprim and methotrexate:
Figure 12.27. Stereoview of a superposition over the backbone atoms (N, Cα, and C) of residues 1-162 of the final 22 structures of the DHFR-trimetrexate complex. (a) View of the protein backbone and the trimetrexate heavy atoms. (b) View of trimetrexate in the binding site of enzyme. (c) Conformation of trimetrexate in the binding site of enzyme. The orientation of trimetrexate is identical for (a)-(c) and only its heavy atoms are shown. (Reprinted with permission from Ref. 132. Copyright 1999 The Protein Society.)
The three-dimensional structure of the complex of DHFR with trimetrexate was determined using
about 2000 distance restraints, 300 angle restraints, and 100 hydrogen-bonding restraints (132).
Simulated annealing calculations produced a family of 22 structures consistent with the
constraints. Several intermolecular protein-ligand NOEs were obtained by using a novel approach
that monitored temperature effects of NOE signals resulting from dynamic processes in the bound
ligand. At low temperature (5°C) the trimethoxy ring of bound trimetrexate flips sufficiently
slowly to give narrow signals in slow exchange, which give good NOE cross peaks. At higher
temperature these broaden and their NOE cross peaks disappear, thus allowing the signals in the
lower temperature spectrum to be identified as NOEs involving ligand protons. Figure 12.27 shows
the structure of the complex, including the orientation of the ligand in the binding site.

The binding site for trimetrexate is well defined and was compared with the binding sites in
related complexes formed with methotrexate and trimethoprim. No major conformational differences
were detected between the different complexes. The 2,4-diaminopyrimidine-containing moieties in
the three drugs bind essentially in the same binding pocket and the remaining parts of their
molecules adapt their conformations such that they can make effective van der Waals interactions
with essentially the same set of hydrophobic amino acids. The side-chain orientations and local
conformations are not greatly changed in the different complexes.

Figure 12.28. Correlated motions of a carboxylate group from methotrexate and Arg57 of DHFR
detected by NMR. (a) Structure of an arginine-carboxylate complex formed with symmetrical end-on
interactions and (b) structure of methotrexate showing its interaction with the guanidine group
of Arg57 of DHFR. (Reprinted with permission from Ref. 133.)

The ring flipping of the trimethoxy aromatic ring mentioned above was detected by
variable-temperature studies of the spectral line shape. The presence of such dynamic processes
involving the ligand appears to be not uncommon in macromolecule-ligand complexes and the ability
of NMR methods to detect such phenomena represents one distinct advantage of NMR over X-ray
methods of structure determination. Relaxation measurements were also used to probe dynamics of
the protein and no large amplitude motions were found, apart from that at the C-terminus (132).
The power of NMR methods for studying dynamics of complexes is further illustrated by an earlier
study of the complex of DHFR with methotrexate (133). In this case a correlated dynamic rotation
of a carboxylate group on the ligand and of the protein was detected, as illustrated in
Fig. 12.28.

3.2.4.5 HIV Protease. Because of its essential role in the HIV life cycle, HIV protease is a
major target for structure-based design of anti-AIDS drugs. There are now more than 100
structures of HIV protease and protease inhibitor complexes in the HIV-protease structure
database (134-136) and the availability of this wealth of high resolution structural information
has been the driving force behind numerous structure-based design programs (134, 135, 137). Most
of the high resolution structural information on HIV protease has been obtained from X-ray
crystallography data (136). Although there are relatively few examples of HIV protease/inhibitor
complexes that have been determined by use of NMR spectroscopy, the NMR data, taken together with
the structural data from X-ray experiments, have contributed to an understanding of
protease-inhibitor recognition and dynamics. Indeed, studies of HIV protease/inhibitor complexes
are a powerful example of the way in which complementary information obtained from X-ray
crystallography and NMR spectroscopy can be used to facilitate structure-based drug design.

Figure 12.29. (a) View of the superimposed heavy atoms (N, Cα, C) of the ensemble of structures
of the HIV-1 protease/DMP-323 complex. (b) Ribbon diagram of the minimized average structure of
the complex. (Reprinted with permission from Ref. 138. Copyright 1996 The Protein Society.)

HIV protease/inhibitor complexes have a molecular weight of approximately 22 kDa. Although NMR
spectroscopy is well suited to determination of the structure of molecules in this size range,
efforts to determine the solution structure of the complex were hampered by the fact that the
protease undergoes rapid autocatalysis in solution. It required the development of potent
inhibitors before NMR studies of the complex became feasible. The first solution structure
(Fig. 12.29) of HIV protease bound to the cyclic urea inhibitor DMP-323 (18) was reported in
1996 (138).
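The variable-temperature argument used above for the trimethoxy ring of bound trimetrexate rests on how the flip rate compares with the frequency separation of the exchanging sites. The following is a minimal sketch of that comparison, assuming two equally populated, uncoupled sites, for which the coalescence rate is π·Δν/√2. The frequency separation, the rates, and the factor-of-ten `margin` are hypothetical illustration values and are not taken from the DHFR study.

```python
import math


def coalescence_rate(delta_nu_hz):
    """Exchange rate (s^-1) at which two equally populated, uncoupled sites coalesce."""
    return math.pi * delta_nu_hz / math.sqrt(2.0)


def exchange_regime(k_ex, delta_nu_hz, margin=10.0):
    """Classify two-site exchange as slow, intermediate, or fast.

    `margin` is an arbitrary illustrative safety factor, not a physical constant.
    """
    k_c = coalescence_rate(delta_nu_hz)
    if k_ex < k_c / margin:
        return "slow"          # separate, narrow signals for each site
    if k_ex > k_c * margin:
        return "fast"          # one averaged, narrow signal
    return "intermediate"      # broadened signals


# Hypothetical numbers chosen only to illustrate the temperature trend.
delta_nu = 100.0  # Hz separation between the two ring-proton environments
for label, k in [("low temperature", 5.0), ("higher temperature", 150.0)]:
    print(label, "->", exchange_regime(k, delta_nu))
```

With these assumed values the slower flip rate falls in the slow-exchange regime, and the faster one falls in the intermediate regime, where line broadening would be expected.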
The protease exists as a homodimer. Each 99-residue monomer contains 10 β-strands and the dimer
is stabilized by a four-stranded antiparallel β-sheet formed by the N- and C-terminal strands of
each monomer. The active site of the enzyme is formed at the interface, where each monomer
contributes a catalytic triad (Asp25-Thr26-Gly27) that is responsible for cleavage of the
protease substrates. The "flap region" is located above the reactive site and is formed by a
hairpin from each monomer of two antiparallel β-strands joined by a β-turn. There is little
difference between the solution and crystal structures of protease-inhibitor complexes, except in
those regions where the polypeptide chain is disordered. However, experiments in solution have
allowed access to parameters that are not directly accessible from crystal data. These
parameters, such as the amplitude and frequency of backbone dynamics, the protonation states of
the catalytic aspartate residues, and the rate of monomer interchange, are essential in
understanding the interaction of HIV protease with potent inhibitors.

The cyclic urea inhibitor DMP-323 was designed by analysis of crystal structures of HIV
protease/inhibitor complexes. A feature common to many of the complexes of HIV protease is a
buried water molecule that bridges the inhibitor and Ile50 in the flaps. Interactions with this
water molecule are thought to induce the fit of the flaps over the inhibitor (139). In contrast,
mammalian aspartic-protease/inhibitor complexes are unable to accommodate an equivalent water
molecule (135). This observation led to the design of a series of cyclic urea-based inhibitors
that are capable of displacing the buried water molecule (139). As well as improving the
specificity of inhibitors to the viral protease, displacement of the water molecule was expected
to increase the entropic contribution to inhibitor binding and thus enhance the affinity of
complex formation. The cyclic urea inhibitors are highly potent and specific inhibitors of HIV
protease (139) and for DMP-323 it has been shown in both the crystal structure (139) and in
solution (140) that the urea moiety does indeed replace the buried water molecule.

Although DMP-323 replaces one buried water molecule, several others are observed in the crystal
structure of the complex. A more recent NMR study investigated the role of these water molecules
to determine whether any had a structural role in the formation of the HIV protease/DMP-323
complex (141). In favorable cases, NMR can be used to estimate the residence times of hydration
water molecules (142), thus providing information about the timescale of the interaction of
buried water with the bulk solvent. This analysis led to the identification of a symmetry-related
pair of water molecules that may have a structural role in formation of the complex. Such
information may prove useful in the design of future cyclic urea inhibitors. An interesting
finding in this study was the fact that each of the hydroxyl protons of DMP-323 is in rapid
exchange with solvent. This is a surprising result, given that two of these hydroxyl protons are
completely buried and form a network of hydrogen bonds with the catalytic Asp25/Asp125 side
chains (143). Furthermore, the dissociation rate of DMP-323 is less than 1 s⁻¹ under the
conditions of the experiment, which is too small to average the chemical shifts of the hydroxyl
protons and the bulk water. The observation is ascribed to local fluctuations in the complex that
allow solvent molecules to penetrate into the binding site. This conclusion is supported by the
observation that the catalytic protons of the Asp25/Asp125 side chains in the protease/DMP-323
complex undergo H-D exchange with solvent, even though they are buried and hydrogen bonded to the
inhibitor (143). These studies highlight that even well-ordered structures such as the
protease/DMP-323 complex may be flexible on the millisecond to microsecond timescale.

Interestingly, in the DMP-323 complex, both of the catalytic Asp25/Asp125 side chains are
protonated over the pH range 2-7 (143). The protonated Asp25/Asp125 residues form a network of
hydrogen bonds with the hydroxyl groups of DMP-323. In contrast it has been shown that in the
complex with the asymmetric inhibitor KNI-272, the side chain of Asp25 is protonated, whereas
that of Asp125 is not. A suggested explanation for this is that both oxygens of the Asp125 side
chain are deprotonated to accept two hydrogen bonds, one from a bound water molecule and one from
the inhibitor. In contrast the side chain of Asp25 is
protonated so that it can donate a hydrogen bond to the inhibitor (144). Consequently, the
protonation state of the enzyme is influenced strongly by interaction with specific inhibitors
and this knowledge is essential for a detailed understanding of the protease/drug interactions.

NMR has also been used to study the relationship between flexibility and enzymatic function for
HIV protease. For the protease/DMP-323 complex, 15N spin-relaxation studies determined that
residues that are flexible correlate well with residues that are disordered in the NMR structure
of the complex (145). For example, residues in poorly defined loops were found to undergo
large-amplitude internal motions on the nanosecond-picosecond timescale. In contrast, two regions
of the molecule were found to exhibit motions on the millisecond-microsecond timescale. The first
of these is at the N-terminus of the protein around Thr4-Leu5. This is adjacent to the major site
of autolysis of the protease and it has been suggested that the rate of cleavage may regulate HIV
protease activity in vivo (146). Consequently, the observed flexibility may be important for
regulation of protein function. The second region found to be undergoing millisecond-microsecond
motion was the tips of the flaps around Ile50-Gly51. In crystal structures, this region of the
protease is well ordered and not involved in crystal contacts, although its conformation varies
from structure to structure. This motion is interpreted as a dynamic conformational exchange
process, which is fast relative to the chemical-shift timescale. Thus when the protease is bound
to a symmetric inhibitor in solution, this conformational exchange results in the chemical shifts
of the flap residues in the two monomers being identical (138, 145). In contrast, when the
protease is bound to an asymmetric inhibitor, such as KNI-272, crystal structures show that each
monomer interacts with the inhibitor in a different way (144). This is reflected in the fact that
the chemical shifts of the monomers are different when asymmetric inhibitors are bound
(141, 147). Analysis of spectra from such an asymmetric complex has revealed that the inhibitor
is capable of "flipping" its orientation with respect to the two monomers without dissociating
from the complex (148). These data again highlight the importance of defining both the structural
and dynamic aspects of binding to understand the requirements for potent interactions between HIV
protease and its inhibitors.

The development of inhibitors of HIV protease represents a major success for structure-based drug
design. When HIV was first identified in the early 1980s there were no known drugs effective for
treatment of infection. A combination of X-ray crystallography, NMR spectroscopy, computer
modeling, and chemical synthesis has resulted in the development of several effective HIV
protease inhibitors. However, in common with other retroviruses, HIV has a high transcription
error rate that results in a rapid mutational rate. One of the results of this is the production
of a divergent population of viruses in which the sequence of the HIV protease produced may
differ substantially (149, 150). As a consequence, drug-resistant strains of the virus emerge.
Clearly, knowledge of the structural principles that govern inhibition of the protease and the
mechanism by which the virus develops resistance will continue to be important in the development
of effective new drugs.

4 NMR SCREENING

In the past, NMR was predominantly used in the design stage of drug discovery rather than the
screening stages. Recently, new methods that make use of NMR to screen ligands for binding to a
protein target have been developed and are proving to be a powerful tool in the discovery of new
drug leads. This section gives an overview of the various experimental methods, summarized in
Table 12.8, which can be used to screen mixtures of ligands for binding to a drug target. There
will also be a brief discussion on the practical considerations that need to be made when
designing an NMR screening program.

4.1 Methods

4.1.1 Chemical-Shift Perturbation. Chemical shift is a function of the chemical (and hence
magnetic) environment that individual nuclei experience. Perturbations of chemical
Table 12.8 Summary of the Methods Available for NMR Screening and Their Respective Characteristics

Screening                    Signals            Protein     Labeling              Binding Information   KD Limit         KD          Suitable  Mixture Deconvolution
Methodology                  Observed           Size Limit                        Obtained                               Determined  for HTS   Required?
Chemical shift perturbation  Protein            <30 kDa     15N protein           Location              10^-?-10^-? M
  (e.g., SAR by NMR)
STD                          Ligand             None        None                  Orientation           10^-?-10^-? M
Diffusion-based              Ligand             None        2H protein for        None                  10^-?-10^-? M
  (e.g., affinity NMR)                                      isotope editing
Relaxation-based             Ligand             None        None                  None                  10^-3-10^-7 M
trNOE                        Ligand             None        None                  Bound conformation    10^-3-10^-7 M
NOE pumping                  Ligand or          None        None                  Bound conformation    10^-3-10^-7 M
                             protein (reverse)
Spin labeling                Ligand or          None        Spin label for        Orientation,          10^-?-10^-? M
                             protein                        either ligand or      simultaneous
                                                            protein               binding

a For reverse NOE pumping.
b For primary screening if the protein is spin-labeled or for second-site screening if the first-site ligand is spin-labeled.
Figure 12.30. Summary of the SAR by NMR drug discovery methodology. A protein target is
screened against a library consisting of small organic molecules by use of the 1H/15N HSQC
experiment. When two ligands that bind in close proximity are identified, they are linked to form
a composite ligand with an increased affinity for the target.
shifts can be used to detect binding of a ligand to a protein target. When a ligand binds to a
protein the local chemical environment is changed, and this is reflected by a change in the
chemical shifts of nuclei in close proximity to the ligand-binding site. The most common
experiment used in this screening methodology is the 1H/15N HSQC that generates a discrete signal
for each amide group within the protein. A reference 1H/15N HSQC spectrum, which is acquired in
the absence of potential ligands, is compared to a spectrum recorded in the presence of ligands
and any changes in the amide chemical shifts are indicative of a ligand binding to a location
close to the corresponding amide groups. The major advantage of this technique is that, if the
NMR assignment of the amide resonances is known, then the site of binding for each ligand can be
determined. This is a valuable piece of information in the development of more potent
second-generation drug leads. Binding affinities can also be determined by measuring the change
in chemical shift as a function of ligand concentration.

One technique that utilizes this screening method for drug design is "SAR-by-NMR," developed by
Fesik and coworkers (1, 4, 151-155). SAR-by-NMR is a fragment-based drug design approach in which
a potent drug candidate is derived by chemically linking two or more small low affinity ligands
for a target. In theory, the binding energy of the linked compounds will be the sum of the
binding energies of the two individual compounds plus contributions to binding energy attributed
to linkage. Therefore, it is possible to generate a drug lead with a nanomolar dissociation
constant (KD) from two milli- to micromolar fragments.
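The two uses of chemical-shift perturbation described above, locating a binding site and estimating an affinity, can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the procedure of any cited study: the 15N weighting factor `n_scale`, the 0.05 ppm threshold, the residue labels, and all peak positions and concentrations are hypothetical, and the titration model assumes 1:1 binding in fast exchange so that the observed shift change is proportional to the fraction of protein bound.

```python
import math


def combined_shift(d_h_ppm, d_n_ppm, n_scale=0.2):
    """Combined 1H/15N amide shift change (ppm); n_scale is an assumed weighting."""
    return math.sqrt(d_h_ppm ** 2 + (n_scale * d_n_ppm) ** 2)


def flag_perturbed(reference, with_ligand, threshold_ppm=0.05):
    """Return residues whose amide peak moved by more than threshold_ppm.

    Both inputs map a residue label to its (1H ppm, 15N ppm) peak position.
    """
    hits = {}
    for residue, (h_ref, n_ref) in reference.items():
        h_obs, n_obs = with_ligand[residue]
        shift = combined_shift(h_obs - h_ref, n_obs - n_ref)
        if shift > threshold_ppm:
            hits[residue] = shift
    return hits


def fraction_bound(p_total, l_total, kd):
    """Fraction of protein bound for 1:1 binding; all concentrations in mol/L."""
    s = p_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / (2.0 * p_total)


def fit_kd(p_total, ligand_concs, shifts, kd_grid):
    """Grid search over KD; the maximal shift follows from linear least squares."""
    best = None
    for kd in kd_grid:
        f = [fraction_bound(p_total, l, kd) for l in ligand_concs]
        d_max = sum(fi * di for fi, di in zip(f, shifts)) / sum(fi * fi for fi in f)
        sse = sum((di - d_max * fi) ** 2 for fi, di in zip(f, shifts))
        if best is None or sse < best[0]:
            best = (sse, kd, d_max)
    return best[1], best[2]


# Hypothetical peak lists: reference spectrum and spectrum with a ligand mixture.
reference = {"G28": (8.30, 110.2), "V55": (7.95, 121.4), "I56": (8.61, 124.9)}
with_ligand = {"G28": (8.30, 110.2), "V55": (8.04, 121.9), "I56": (8.66, 125.1)}
hits = flag_perturbed(reference, with_ligand)

# Synthetic, noise-free titration generated from an assumed KD of 0.1 mM.
protein = 3e-4
ligand = [1e-4, 3e-4, 1e-3, 3e-3, 1e-2]
true_kd, true_dmax = 1e-4, 0.12
shifts = [true_dmax * fraction_bound(protein, l, true_kd) for l in ligand]
grid = [10 ** (-6 + 4 * i / 400) for i in range(401)]
kd_fit, dmax_fit = fit_kd(protein, ligand, shifts, grid)
print(sorted(hits), kd_fit, dmax_fit)
```

A grid search is used here only to keep the sketch free of external dependencies; with real, noisy data a nonlinear least-squares routine and an error estimate would normally be preferred.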
The first step in this process (Fig. 12.30) involves screening a library of ligands (typically
with a MW < 400) in mixtures of up to 10 for binding to a protein target by comparing the 1H/15N
HSQC spectrum of a 15N-enriched protein in both the presence and the absence of ligands. Any
ligand-induced changes in the chemical shift of the nitrogen and amide proton signals indicate
binding of one or more ligands in the mixture to the protein target. The mixture containing the
binding ligand(s) is deconvoluted and each individual compound screened to identify the
individual ligand(s) responsible for the observed chemical-shift perturbations. Once a binding
ligand is identified, analogs can be screened to optimize binding.

A second ligand, which binds at a proximal site, is then identified either from the original
screen or by repeating the library screening with the first ligand bound to the protein. This
ligand is then optimized and the structure of the ternary complex determined by use of either NMR
or X-ray crystallography. The ternary complex structure provides information on the conformation
and orientation of the bound ligands, which facilitates the synthesis of hybrid molecules where
the two ligands are joined by a suitable linking moiety.

There are several examples that illustrate the potential of SAR by NMR. As noted earlier, FK506
binding protein (FKBP) inhibits calcineurin and blocks T-cell activation when complexed to the
immunosuppressant FK506. This protein was used as a target for SAR by NMR screening and
subsequently two ligands, (19) and (20), were identified with KD values of 2 µM and 0.1 mM,
respectively. A model of the ternary complex between the protein and both ligands was produced,
which indicated that the methyl ester of (19) was close to the benzoyl hydroxyl group in (20).
These two groups were linked with alkyl chains of various lengths, with the most active compound
(21) having a three-carbon linker and a KD value of 19 nM (151).
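The additivity argument behind SAR by NMR can be checked against the FKBP numbers with a short calculation. The sketch below assumes a temperature of 298 K and a 1 M standard state (neither is stated in the text), takes the fragment affinities as 2 µM and 0.1 mM and the linked compound as 19 nM, and reports the difference between the observed and the ideally additive binding free energy as a "linkage term". The function and variable names are illustrative only.

```python
import math

R = 8.314   # gas constant, J mol^-1 K^-1
T = 298.0   # assumed temperature, K (not given in the text)


def binding_free_energy(kd_molar):
    """Standard binding free energy (J/mol) from KD, assuming a 1 M standard state."""
    return R * T * math.log(kd_molar)


def kd_from_free_energy(delta_g):
    """Inverse of binding_free_energy: KD in mol/L from a free energy in J/mol."""
    return math.exp(delta_g / (R * T))


# FKBP values discussed above: fragments (19) and (20), linked compound (21).
kd_19, kd_20, kd_21 = 2e-6, 1e-4, 19e-9

dg_sum = binding_free_energy(kd_19) + binding_free_energy(kd_20)
kd_additive = kd_from_free_energy(dg_sum)       # same as kd_19 * kd_20 / (1 M)
linkage_term = binding_free_energy(kd_21) - dg_sum  # > 0: linked compound binds
                                                    # more weakly than ideal additivity

print(f"ideal additive KD = {kd_additive:.1e} M")
print(f"observed KD       = {kd_21:.1e} M")
print(f"linkage term      = {linkage_term / 1000:.1f} kJ/mol")
```

Under these assumptions the ideal additive KD is the product of the two fragment KD values, about 0.2 nM, so the observed 19 nM compound is roughly two orders of magnitude weaker than perfect additivity would give. That shortfall is the kind of contribution the text attributes to linkage.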
Inhibitors of the matrix metalloproteinase (MMP) stromelysin have also been designed through the
use of the SAR-by-NMR screening methodology. As mentioned in section 3.2.5.3, MMPs are involved
in matrix degradation and tissue remodeling, with overexpression of these enzymes being
associated with arthritis and tumor metastasis. Acetoxyhydroxamate (22) was used as one ligand
because it was known previously that MMP inhibitors contain a hydroxamate moiety. The KD value of
(22) was determined to be 17 mM. To identify a second ligand the protein was screened against a
ligand library in the presence of saturating amounts of (22). The library was biased for
hydrophobic compounds, given that stromelysin demonstrates a substrate preference for hydrophobic
amino acids and structural studies had identified a hydrophobic binding pocket supporting this
observation. From the library screen a series of biphenyl compounds were identified and analogs
of these compounds were synthesized. A biphenyl derivative (23) was produced with a KD value of
0.02 mM. The NMR structure of a ternary complex, consisting of stromelysin, (22), and the biaryl
derivative (24) (chosen for its superior aqueous solubility), was determined and indicated that
the methyl group of (22) was in close proximity to the pyrimidine ring of (24). With this
information (22) and (23) were subsequently linked by different-length linkers and the most
active compound produced, (25), had a KD value of 15 nM (154).

A variation of SAR-by-NMR is to optimize binding or improve the pharmacological properties of
known drug leads generated by other methods (e.g., natural products isolation or combinatorial
chemistry). A compound can be fragmented into individual subunits and then alternative fragments
identified through use of 1H/15N HSQC screening. These fragments can then be incorporated into
the molecular structure in the hope of improving the binding and/or pharmacological properties of
the parent compound (Fig. 12.31). The alternative fragment must bind in the same location as the
corresponding section of the original molecule, making the 1H/15N HSQC screening method ideal as
it provides information on the binding site of ligands.

In a demonstration of this fragmentation method, an antagonist of the interaction between
leukocyte function-associated antigen 1 (LFA-1) and intercellular adhesion molecule 1 (ICAM-1)
was used as a starting molecule. This interaction plays a role in the inflammatory response and
specific T-cell immune responses, and inhibitors have applications in the treatment of
inflammation and organ transplant rejection. The p-arylthio cinnamide antagonist (26) had an IC50
value of 44 nM; however, it was envisaged that the molecule's activity and physical properties
could be improved by replacing the isopropyl phenyl group with a more hydrophilic moiety.
Screening of a 2500-compound library provided several hits, and analogs of (26) were made that
incorporated these ligands in place of the isopropyl phenyl group. Compounds (27) and (28) had
both improved aqueous solubility and pharmacokinetic profiles, with similar or improved activity
(IC50 values of 20 and 40 nM, respectively) when compared to that of the parent compound
(26) (156).

[Figure 12.31 flowchart: Fragment lead molecule; Identify alternative fragment; Incorporate
alternative fragment into original lead.]
Figure 12.31. The fragment optimization approach developed from SAR by NMR. A known ligand of a
protein is broken into fragments and small molecules based on the fragments are screened for
binding. Any molecules that are found to bind can then be incorporated into the original lead
compound with the hope of improving its binding and/or physicochemical properties.

Many compounds bind to human serum albumin (HSA), which significantly reduces their in vivo
activity and hence their potential as a drug lead. The fragmentation method has recently been
used to find analogs of diflunisal (29) that have a reduced affinity toward HSA (157). Diflunisal
is a nonsteroidal anti-inflammatory that is more potent, longer acting, and better tolerated in
vivo than aspirin. However, 99% of diflunisal is bound to albumin in plasma and, as a result,
high doses are required for it to be effective. Structural studies of the diflunisal/HSA-III
complex indicated that, by introducing polar functionality to the difluorophenyl moiety, binding
affinity to HSA may be reduced without affecting activity. A series of organic compounds
analogous to the difluorophenyl fragment were screened using the 1H/15N HSQC chemical-shift
perturbation method and several alternative fragments were identified. These were incorporated
into the diflunisal structure, resulting in a number of analogs (e.g., 30 and 31) with reduced
affinity for HSA but still maintaining some activity. It was predicted that the next
generation of compounds would incorporate the fluorine atoms present in (29) into leads such as
(30) and (31) because the presence of fluorine increased activity without increasing the affinity
for HSA.

Overall, SAR-by-NMR is already showing great promise as a major new tool in drug design, but it
has a few limitations. First, the screening of large ligand libraries and any subsequent
deconvolution steps can be expensive, in that a large amount of 15N-labeled protein is needed.
However, as already noted, recent advances in cryoprobe technology have reduced the amount of
protein required. Smaller, more druglike libraries such as that described in the SHAPES ideology
(7) could also be used to reduce the amount of protein required for screening. The second
limitation is that the NMR assignments for the protein must be known or determined. This limits
the size of the protein target to 30 kDa or less, although this value will presumably increase as
TROSY-based NMR technologies for structure determination advance. The method also requires the
comparison of 1H/15N HSQC spectra of the protein in both the absence and the presence of ligands.
Changes in solvent conditions such as pH, polarity, salt concentration, and viscosity may cause
shifts in amide resonances, leading to false positives (158).

Recently, a protocol was described that screens ligands using the 1H/15N HSQC experiment but
includes a mass spectrometry prescreening step. Ligand mixtures are added to a protein target and
then subjected to size-exclusion chromatography, which separates ligand/protein complexes from
free ligands. The identity of the bound ligands can then be determined using MS data and, once
identified, screened by 1H/15N HSQC to determine the location and specificity of binding (159).
This MS/NMR methodology reduces the amount of 15N-labeled protein required because only a
fraction of the library is screened by NMR and there is no deconvolution step.

4.1.2 Magnetization Transfer Experiments. Proteins contain a large network of dipole-dipole
interactions, resulting in the efficient transfer of magnetization throughout the molecule. The
saturation transfer difference (STD) experiment (Fig. 12.32) uses this phenomenon to detect the
binding of ligands to a protein. It relies on the fact that saturation of a single protein
resonance results in saturation of all protein resonances and any ligands that bind to the
protein, provided they are not affected directly by the selective saturation pulse (160). The STD
experiment is able to detect the binding of ligands with a KD between 10^-? and 10^-? M.

An STD experiment consists of irradiating an isolated protein resonance (either at low or high
field) with a series of pulses that saturate the entire protein and any binding ligands. This
results in a spectrum containing reduced signal intensities from both the protein and the ligands
that bind to it. A second spectrum of the protein and ligand library is then recorded with the
saturation pulses off-resonance. Subtraction of these two spectra results in the STD spectrum
that shows only those ligands that bind to the protein (residual protein resonances are removed
using a relaxation filter). This subtraction occurs internally through phase cycling after every
scan, to reduce artifacts attributed to temperature or magnetic field variations (160, 161).

[Figure 12.32 labels: Increasing saturation; Selective saturation pulse; Binding molecule;
Non-binding molecule.]
Figure 12.32. A schematic representation of the saturation transfer effect. The protein
resonances are saturated (indicated by shading) by a selective pulse and spin diffusion.
Resonances of nonbinding ligands (triangles) are not affected by this pulse but ligands that are
interacting with the protein (ellipses) will also become saturated. These interacting ligands are
transferred to solution through chemical exchange where they are detected. (Reprinted with
permission from Ref. 16. Copyright 1999, Wiley-VCH.)

An STD can be added to most forms of NMR experiments including COSY, TOCSY, NOESY, and inversely
detected 13C or 15N spectra. A high resolution magic angle spinning (HR-MAS) STD experiment has
been developed to study the binding of ligands to a protein immobilized on a solid support.
HR-MAS STD NMR provides a way of obtaining ligand-binding information for proteins that are
difficult to work with in solution because of either poor solubility or conformational changes
(161). In addition to screening ligands for binding to a protein (160, 162, 163), the binding
epitope of a molecule can also be determined by examining the intensities of ligand resonances
(164-166). The protons having the strongest signals will correspond to those that are part of the
ligand's binding epitope. For example, it was shown that when methyl β-D-galactoside (32) bound
to Ricinus communis agglutinin I the H2, H3, and H4 were saturated to the highest degree (values
on structure indicate relative signal strengths) and hence were in close proximity to the protein
protons. This analysis was subsequently extended to the decasaccharide NA2 (33) and demonstrated
that the Gal-6' and GlcNAc-5' residues bind edge-on to the protein, with the binding contribution
of the terminal galactose residue being the greater (165).

STD NMR studies have also been performed on membrane-bound receptors by embedding the protein in
the phospholipid bilayer of a liposome (166).

There are a number of advantages in using the STD experiment for the detection of ligand binding.
The saturation transfer effect is an efficient process, which results in high sensitivity, and
hence only small quantities of protein are required (nanomolar concentrations of a protein with
MW > 10 kDa) (160, 165). In addition, protein size is noncritical; in fact, as the protein
becomes larger, the saturation transfer effect becomes more efficient. The acquisition time for
each experiment is also quite short and, because the experiment is ligand observed, no
deconvolution of mixtures
is required, making this a good technique for high throughput screening of large ligand
libraries. Unlike the chemical-shift perturbation techniques, STD experiments provide no
information on the site of ligand binding.

A second variation of the saturation transfer experiment has been devised by Dalvit and coworkers
that uses the transfer of magnetization from the water (167). Water is intimately associated with
proteins, being bound either within or on the surface of the macromolecular structure. Saturation
of the water resonance will lead to protein saturation through a variety of mechanisms, including
saturation of the CαH resonances, saturation of exchanging protein resonances, and NOE
interactions between water and the protein. If a compound is bound to the protein it will also
become saturated, and this effect can be used as an indication of ligand binding (167).

4.1.3 Molecular Diffusion. Molecules can be distinguished based on their diffusion coefficients,
which are related to molecular size. Large macromolecules, such as proteins, diffuse more slowly
than small molecules and it is this size difference that can be exploited to screen for ligand
binding. If a small molecule binds to a protein target its diffusion coefficient is altered to a
value more like that of the protein. Therefore, by utilizing a diffusion filter, resonances
generated by small molecules that do not bind to the protein can be removed from the spectrum.

Diffusion editing is achieved with the use of a pair of gradient pulses. If field inhomogeneity
is ignored, then all spins experience an identical magnetic field despite having different
positions throughout the sample. The application of a field gradient has the effect of making
field strength dependent on position. Under the influence of the gradient pulse, the phase of
individual spins becomes dependent on their position within the sample and hence the spins are
spatially "encoded." If diffusion does not occur, this spatial encoding is fully reversible by a
second gradient of inverse polarity and no loss of NMR signal will occur. However, the second
gradient pulse will be unable to "decode" the spins that have undergone diffusion and the
resulting NMR signal will be reduced. Acquiring spectra of a sample with and without the
diffusion filter and then subtracting them allows the ligands binding to the protein to be
identified. This filtering method can be used for both 1D and 2D experiments and can be "tuned"
by altering the strength and duration of the gradients.

Because the ligand signals are being observed in this screening method, no deconvolution of the
ligand mixture is required, given that any signals can be assigned directly to
individual compounds within the mixture. of stromelysin from a mixture containing non-
However, signals from the protein are always binding compounds (177).
present, which can pose a problem in inter-
preting spectra. An isotope-edited version of 4.1.4 Relaxation. Like diffusion, the trans-
the diffusion experiment has been designed to verse relaxation time (T,) of molecules is also
avoid this problem, although labeled protein is dependent on molecular size. Large molecules,
required (168). Generally, there is no require- such as proteins, have a short T2 and hence
ment for labeling of the protein target or for exhibit broad NMR signals, whereas small
the protein resonances to be assigned and molecules have a longer T2 and hence nar-
thus, in theory, there is no size limit on the rower line widths. Therefore, if a small mole-
proteins that can be screened by use of this cule ligand binds to a protein, its T2 value will
method, although no information is obtained decrease and a line-broadening effect of bound
on the location of ligand binding. However, if ligand signals can be observed. Alternatively,
the protein is large, then the transverse relax- a relaxation filter can be used to remove sig-
ation time may be too short to observe the nals from molecules with a short T2 value.
bound ligands in the diffusion-edited spec- Subtraction from a reference spectrum will re-
trum (169). Only one sample, containing pro- sult in a spectrum containing only those li-
tein and ligands, is used to obtain both refer- gands that bind to the protein.
ence and screening data and therefore The ability to identify binding ligands us-
differences between the sample and reference ing relaxation filters has been demonstrated
spectra caused by addition of the ligands (pH, using FKBP. A mixture of nine compounds
salt concentration, etc.) are avoided. consisting of one known ligand of FKBP,
Diffusion-filtered NMR screening requires 2-phenylimidazole (34), and eight nonbinding
that there is a significant difference in ob- compounds (e.g., 35-37) were screened and
served translational diffusion between the only signals from (34) were observed (177).
free and bound states. The ligands are in fast
exchange on the diffusion timescale and as a
consequence the observed diffusion coefficient
for binding ligands is an average between the
free and bound diffusion values. Free ligands
diffuse at a much faster rate than those in the
bound state and thus only a small amount of
free ligand has a considerable effect on the
observed average diffusion coefficient. This ef-
fect may be significant enough to reduce the
difference between binding and nonbinding li-
gands, making it more difficult to interpret
results (169). It has also been demonstrated
that chemical exchange and NOE can affect
the interpretation of diffusion experiments
and that these factors need to be taken into
consideration (170, 171).
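The averaging effect described above can be illustrated with a short calculation. This is a minimal sketch that assumes a simple two-state, population-weighted average for a ligand in fast exchange; the function name and all numerical values are hypothetical and are not taken from the cited studies.

```python
def observed_diffusion(d_free, d_bound, fraction_bound):
    """Population-weighted diffusion coefficient for a ligand in fast exchange.

    Assumes a two-state model: the observed value is the average of the
    free and bound diffusion coefficients, weighted by their populations.
    """
    if not 0.0 <= fraction_bound <= 1.0:
        raise ValueError("fraction_bound must lie between 0 and 1")
    return fraction_bound * d_bound + (1.0 - fraction_bound) * d_free


# Hypothetical diffusion coefficients in m^2/s (illustrative only)
D_FREE = 5.0e-10   # small-molecule ligand free in solution
D_BOUND = 1.0e-10  # ligand diffusing with the protein

for fb in (0.0, 0.5, 0.9, 1.0):
    d_obs = observed_diffusion(D_FREE, D_BOUND, fb)
    print(f"fraction bound {fb:.1f}: D_obs = {d_obs:.2e} m^2/s")
```

With these assumed numbers, a ligand that is 90% bound still shows an averaged coefficient about 40% above the bound-state value, which is the sense in which a small free population can blur the distinction between binding and nonbinding ligands.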
Shapiro and coworkers developed a meth-
odology based on diffusion filtering, named
"affinity NMR," that they have used to screen
for binding (172-175). Diffusion-edited NMR
experiments were able to identify two known
binding tetrapeptide ligands of vancomycin
from a mixture of 10 peptides (176). Hajduk et 4.1.5 NOE. NOE experiments can also be
al. demonstrated the application of diffusion- used to identify ligands that bind to protein
editing experiments by differentiating ligands targets (178-180). Small molecules have a fast
tumbling rate and, as a consequence, generally exhibit small positive NOEs. In contrast,
large molecules such as proteins generate strong negative NOEs because of their slow
tumbling. When a small molecule binds to a protein, its tumbling rate is slowed to that
of the protein and it exhibits strong negative NOEs. On dissociation, these are transiently
retained and are known as transfer NOEs (TrNOEs) (Fig. 12.33). TrNOEs and those
arising directly from the free ligand can be distinguished by the rate of signal build up.
Transfer NOEs accumulate significantly faster and therefore can be selected for by use
of shorter mixing times in the NOE experiment (179).
In practice, a 2D NOESY spectrum of the mixture of potential ligands in the absence of
protein is recorded and all molecules exhibit small positive NOEs. The experiment is then
repeated in the presence of protein and molecules that bind display negative TrNOEs.
Subtraction of the two spectra provides signals arising from only those compounds that
bind. These TrNOEs can be interpreted to provide information regarding the bound
conformation of the active ligands. However, when analyzing the conformational data care
must be taken to ensure that the ligands are in fast exchange and that the observed TrNOEs
are not affected by contributions from spin-diffusion (179). Relative binding affinities
between ligands can also be determined by comparison of TrNOE signal strength but, again,
the fast-exchange regime and spin-diffusion effects need to be taken into account
(178, 179). If all ligands are in fast exchange, the stronger binding ligands occupy more
binding sites and thus give larger TrNOE intensities. Because of the need for an averaging
effect, brought about by fast chemical exchange, TrNOE experiments are limited to those
ligands with a KD value from 10^-3 to 10^-7 M. The spectral properties of excess ligand in
solution are evoked by small fractions of bound molecules, greatly enhancing sensitivity
(178).
Transfer NOE experiments have been used to identify a bioactive disaccharide from a
library of 15 mono- and disaccharides that bound to Aleuria aurantia agglutinin (179).
Another study has described the identification of a sialyl Lewis mimetic (38) that binds to
E-selectin from a library of 10 compounds (178). As well as being used to detect binding,
TrNOEs may also be used to determine bound ligand conformations, as described earlier in
this chapter.

Figure 12.33. A schematic representation of the TrNOE experiment used to detect ligand
binding (180). The free ligand (white ellipse) exhibits only small positive NOEs, although
binding to the large protein target results in the generation of large negative TrNOEs. The
appearance of these large negative TrNOE signals can be used to identify ligands within a
mixture that are binding to the protein and also provide some information on the bound
conformation of the ligand.

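The dependence of TrNOE detection on binding strength discussed above comes down to how much ligand is bound at equilibrium. The following is a minimal sketch, assuming a single 1:1 binding site and the standard quadratic solution of the binding equilibrium; the function name and the concentrations are hypothetical values chosen for illustration.

```python
import math


def bound_ligand_fraction(protein_total, ligand_total, kd):
    """Fraction of ligand bound at equilibrium for 1:1 binding.

    All concentrations are in mol/L. Solves
    [PL]^2 - (P + L + Kd)[PL] + P*L = 0 for the physically meaningful root.
    """
    s = protein_total + ligand_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * ligand_total)) / 2.0
    return complex_conc / ligand_total


# Hypothetical sample: 50 uM protein with a 20-fold excess of ligand
PROTEIN = 50e-6
LIGAND = 1e-3

for kd in (1e-3, 1e-4, 1e-5, 1e-6, 1e-7):
    fraction = bound_ligand_fraction(PROTEIN, LIGAND, kd)
    print(f"Kd = {kd:.0e} M: fraction of ligand bound = {fraction:.3f}")
```

Under these assumptions only a few percent of the ligand is bound at any time, even for the tightest binder in the range, which is consistent with the statement that small bound fractions govern the spectrum of the excess ligand.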
4 NMR Screening
A second technique that uses NOEs to detect binding is NOE pumping. This method was
designed to alleviate some of the problems associated with the diffusion-edited screening
methods (169). Signals from ligand molecules are removed using a diffusion filter and then
transfer of signal from the protein to bound ligands by NOE occurs. The inverse of this is
possible (known as reverse NOE pumping), which uses a relaxation filter to attenuate the
protein resonances, after which the signal is transferred to the protein by NOE. Ligands
may lose signal either by relaxation (for a free ligand) or through relaxation and NOE
transfer (for a bound ligand). Therefore by subtracting spectra (which is done internally to
reduce subtraction artifacts) from experiments with and without NOE pumping to the
protein, the binding ligands can be detected (181).
The ability of NOE and reverse NOE pumping to identify ligands has been demonstrated
through the use of human serum albumin (HSA) and several known binding and
nonbinding compounds (169, 181).

4.1.6 Spin Labels. Spin-spin relaxation rates are proportional to the product of the squares
of the gyromagnetic ratios of the involved spins. The gyromagnetic ratio of an unpaired
electron is significantly larger than that of a proton and therefore any spins influenced by
this electron will have substantially shortened relaxation times. The resonances of protons
that are within 15-20 Å from the unpaired electron will experience this effect and be
significantly broadened. The introduction of a short spin-lock period will significantly reduce
the intensity or quench these signals.
The spin-label method can be used as either a primary screening method or to identify a
second ligand-binding site. The primary screening method requires residues around the
binding pocket of the target to be spin labeled. Residues suitable for this labeling include
lysine, cysteine, histidine, glutamate, aspartate, tyrosine, and methionine. Any ligands that
bind to the protein in close proximity to the spin-labeled residues will be able to be
identified. To screen for second-site ligand binding, the known first-site binding ligand is
spin labeled. A reduced signal will be observed for any ligands that bind simultaneously and
in close proximity to the first ligand-binding site. In addition, the degree of reduction in
signal intensity gives an indication of the orientation of the second ligand in relation to the
first, given that the effect of the spin label is inversely proportional to the distance
separating the electron and proton. This information is valuable in the design of linkers to
join the two ligands.
There are several advantages to using the spin-label screening method. Currently, it is the
only method that can detect ligands that bind to the protein simultaneously, unlike other
methods that can produce false positives if the first ligand-binding site is not fully
saturated. The concentration of protein required for screening is relatively small (~10 µM)
because of the substantial enhancement of the relaxation rate by the spin label. The protein
can also be unlabeled and partially purified and there is no molecular weight limit. The
spin labels also quench protein signals, making interpretation of spectra easier. The
experiment is easy to set up and analyze, making it amenable to automation. It is also
insensitive to small changes in solvent conditions that can generate false positives in other
methods. The information obtained on the orientation of ligands is also valuable and makes
it an alternative to the chemical-shift perturbation methods when the proteins are large
and NMR assignments have not been made.
A disadvantage of the method is the requirement for spin-labeled proteins and ligands. In
addition, any ligands with slow dissociation rates will show no averaging of relaxation rates
and therefore tightly binding compounds (KD< M ) will produce false negatives. Protein
spin labeling must occur adjacent to but not within the binding site to minimize alteration
of its binding properties.
The antiapoptotic protein Bcl-xL is responsible for the reduced susceptibility of cancer
cells to undergo apoptosis and is therefore a target for the development of new anticancer
agents. The structure of a previously identified ligand for Bcl-xL (39) was modified to
incorporate a TEMPO spin label (40). By use of spin-labeled (40), an eight-compound library
was screened for simultaneous binding to Bcl-xL. From this library an aromatic ketoxime
(41) was identified as binding simultaneously with and in the vicinity of (40). Analysis of
relaxation enhancements revealed that the protons around the indole ring were closest to
the spin label.

4.2 Practical Considerations

4.2.1 Screening Approach. The first step in using NMR screening is to select a suitable
screening method for the target protein being used. Table 12.8 lists the characteristics of
each NMR screening method. The choice of experiment will be determined by the
characteristics of the protein target and the information that is desired from the screen. For
example, the SAR-by-NMR method is suitable only for small, easily expressed proteins
because the NMR assignments for the target need to be known so the location of binding can
be determined and a large amount of 15N-enriched protein is required. If a simple "yes/no"
answer on ligand binding is wanted, then the shorter, less resource intensive
ligand-observed experiments (e.g., STD, diffusion-edited, or TrNOE) may suffice.
It is also important to determine the correct NMR solvent conditions for the screening
procedure. These should facilitate good solubility, with little precipitation or aggregation,
and acquisition of good quality data; maintain protein structure and activity; and provide a
sufficient buffering effect to allow for ligands to be added. Two methods that permit a range
of solvent conditions to be screened without the need for a large amount of protein are the
microdialysis button test (182) and the microdrop-screening method (183). A review on this
subject has recently been published (184), to which the reader is referred for a more
in-depth discussion on the subject of solvent conditions for NMR.

4.2.2 Library Design. Effective design and management of the ligand library to be used
for screening is essential if successful results are to be obtained. The major considerations
in library design are only briefly described here. There are a number of reviews that
provide a more in-depth discussion of library design (15, 185).
4.2.2.1 Ligand Properties. Diversity of ligands is an important factor to consider in the
design of a library for NMR screening and there are a number of factors to take into
consideration. Although it would seem logical to maximize diversity, this may not always be
the most efficient approach. If the system being studied exhibits neighborhood behavior,
then maximizing diversity is a good option. Neighborhoods are regions of multidimensional
molecular space defined by a set of molecular descriptors. By the choice of a molecule that
is in the center of a neighborhood, it is possible, in theory, to represent all molecules within
that molecular space. By spreading out the molecules that are selected for the library so that
each neighborhood does not overlap, diversity is maximized.
However, if neighborhoods are only small then compound libraries must be very large so
that the neighborhoods overlap and hence all molecular space is covered. In addition, some
systems do not exhibit neighborhood behavior and relatively small changes to the structure
of a compound may lead to large changes in its binding affinity for the target. Maximizing
diversity may also be inefficient because many molecules do not possess physicochemical
characteristics that are suitable as the basis for a drug. In practice, the more that is known
about the drug target, the less diverse and more focused the library can be. However, if
the library is too focused then some outlying "new" ligand type for the target being
screened may be missed.
One strategy for library design is to select compounds that have druglike characteristics.
A simple set of rules, determined by Lipinski and coworkers, for determining whether a
compound is druglike is known as "the rule of 5." According to this set of criteria, the
majority of orally available drugs have five or fewer hydrogen bond donors, 10 or fewer
hydrogen bond acceptors, a log P of less than 5, and a molecular weight less than 500 (186).
Additional factors that can be taken into consideration include the number of heavy atoms,
rotatable bonds, and ring systems (187-189).
Another study has revealed that there are a number of frameworks and side-chains that
commonly occur in many drugs. Drug molecules, from the comprehensive medicinal
database, were broken down into systems consisting of frameworks (Fig. 12.34) and
side-chains. Analysis of these two structural features revealed that approximately 50% of
all known drugs could be represented by only 32 different frameworks. When atom type and
bond order were included in the analysis, 41 frameworks were found to describe 24% of all
drugs (190). A similar analysis of side-chain frequency indicated that approximately 70%
of all side chains present in the compound database analyzed were from the top 20
occurring side chains (191).

Figure 12.34. Examples of molecular frameworks from the SHAPES library.

The presence of these common frameworks and side-chains has been exploited in the
SHAPES methodology (7) for NMR screening. This strategy employs a small focused library
based on these common frameworks and side chains to screen against protein targets
through the use of relaxation and NOE experiments. The advantages of this approach are
that the library is small and hence only relatively small amounts of protein are required
and any hits from the library will possess druglike characteristics. However, a
disadvantage of the method is that it is unlikely to yield new drug types, given that the
library is based on known drug frameworks.
Diversity of molecular type is not the only factor that must be taken into account when
designing a library to be used in an NMR screening program. Because the screening occurs
in an aqueous solution, the organic compounds chosen for the library must demonstrate
reasonable solubility in the aqueous conditions used. In general, compounds are dissolved
in DMSO and then added to the protein solution at the appropriate concentration.
Currently, there are no good methods for determining the solubility of a wide range of
compounds before screening commences. A simple method is to dilute the DMSO solution
in buffer and observe whether any precipitation or aggregation occurs. However, this
method will not be suitable for compounds
that precipitate or aggregate over several especially when one uses large numbers of
hours, and solutions that appear clear may compounds per mixture. It has been demon-
still contain high MW aggregates, which will strated that in random mixtures of 10 com-
cause false positives in experiments such as pounds in DMSO, the probability of a reaction
the diffusion, relaxation, and TrNOE methods occurring between two of the mixture's com-
(15, 185). ponents is 26%. This value can be reduced by
It is also preferable to choose ligands that careful selection of mixture components (e.g.,
are synthetically accessible and/or possess separating acids from bases) to approximately
suitable moieties to build upon or link to other 9% (192).
fragments. This is especially important in the
SAR-by-NMR screening methodology because 4.2.3 Hardware and Automation. Automa-
this relies on the ability to link individual frag- tion is a requirement if libraries containing a
ments to form a more potent drug lead. If the large number of compounds are to be
ligands to be linked are not synthetically ac- screened. Technology has been developed that
cessible or do not possess suitable linking allows the automation of almost all steps of
functional groups then this process is severely the NMR screening process from sample prep-
hindered. aration through to data analysis (193).
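The rule-of-5 criteria described under Ligand Properties can be expressed as a simple library filter. The following is a minimal sketch: the `Compound` record, its field names, and the example entries are hypothetical, and in practice the property values would have to be supplied by a cheminformatics toolkit or a compound database.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    """Hypothetical record holding precomputed properties of a library member."""
    name: str
    h_bond_donors: int
    h_bond_acceptors: int
    log_p: float
    molecular_weight: float


def passes_rule_of_five(c: Compound) -> bool:
    """Apply the four criteria as stated in the text (186)."""
    return (
        c.h_bond_donors <= 5
        and c.h_bond_acceptors <= 10
        and c.log_p < 5
        and c.molecular_weight < 500
    )


# Invented example entries, for illustration only
library = [
    Compound("fragment_a", 1, 3, 1.8, 210.0),
    Compound("fragment_b", 7, 12, 6.2, 640.0),
]

druglike = [c.name for c in library if passes_rule_of_five(c)]
print(druglike)
```

A filter of this kind only screens out compounds with unfavorable bulk properties; it says nothing about solubility under the NMR conditions or about synthetic accessibility, which are treated as separate considerations in this section.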
4.2.2.2 Mixture Design. The optimal num- The general setup for NMR screening con-
ber of compounds per mixture is dependent on sists of a robot for just-in-time preparation of
the screening method. For ligand-observed ex- each sample, which is then transferred to the
periments the limiting factor for the number magnet either through a flow system or as dis-
of compounds in a mixture is spectral overlap. crete samples on a rail system. There are sev-
Ligands need to be chosen so that spectral eral disadvantages in using a flow system, in-
overlap is minimized, making interpretation cluding the possibility of contamination of
of the resulting data far simpler. In theory, samples by previously screened compounds,
protein-observed experiments could have a
the capillary line can be blocked if the protein
large number of compounds per mixture that
or ligands precipitate or form aggregates, re-
would both minimize screening time and the
requirement for large amounts of protein. covery of the sample is more laborious because
However, because the experiments are protein it has been diluted, and cryoprobe technology
observed then deconvolution of the mixtures (discussed later in this section) is not yet avail-
and rescreening of each individual compound able in the flow system. Many of these prob-
are required to identify any hits. Therefore, lems can be overcome by using the discrete
the number of compounds per mixture is de- samples with the rail system.
pendent on the hit rate in the screening pro- Data acquisition is easily automated and
cedure, given that the greater the hit rate, the there are several software packages that will
more deconvolution steps required and conse- automate data processing for 2D spectra. The
quently more protein and spectrometer time processing of 1D spectra automatically is re-
are needed. The number of experiments reported to be less reliable because of the large
quired is at a minimum when the number of solvent signal and usually requires manual ad-
compounds is equal to 1/(hit rate)^(1/2); thus, justment of phasing (193). One of the most
with a hit rate of 10% the optimal number of laborious tasks in NMR screening is the anal-
compounds per mixture is three (185). In ad- ysis and comparison of the resulting spectra.
dition to these factors, if the hit rate is high For 1D ligand-observed experiments differ-
then it is likely that several compounds within ence methods (e.g., STD) provide the most re-
a mixture containing a large number of com- liable method for interpretation of results, in
pounds may compete for the same binding that the presence of signals in the spectra will
pocket, which may lead to false negatives. correspond to the ligands that are binding.
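The relationship between hit rate and mixture size for protein-observed screening can be checked numerically. This is a minimal sketch under an assumed simple model: every mixture is screened once, and each hit triggers individual rescreening of all members of its mixture, with hits taken to fall in different mixtures. The model and function names are illustrative assumptions; they reproduce the optimum of 1/(hit rate)^(1/2) quoted in the text (185).

```python
import math


def experiments_per_compound(mixture_size, hit_rate):
    """Assumed cost model: one experiment per mixture plus deconvolution.

    1/n accounts for screening the mixtures; hit_rate * n accounts for
    rescreening the n members of each mixture that contains a hit.
    """
    return 1.0 / mixture_size + hit_rate * mixture_size


def optimal_mixture_size(hit_rate):
    """Continuous optimum of the cost model, 1/sqrt(hit rate)."""
    return 1.0 / math.sqrt(hit_rate)


def best_integer_mixture_size(hit_rate, max_size=100):
    """Whole-number mixture size with the lowest cost under the model."""
    return min(range(1, max_size + 1),
               key=lambda n: experiments_per_compound(n, hit_rate))


for h in (0.10, 0.01):
    print(f"hit rate {h:.0%}: optimum ~ {optimal_mixture_size(h):.2f}, "
          f"best whole number = {best_integer_mixture_size(h)}")
```

For a 10% hit rate the continuous optimum is about 3.2, so three compounds per mixture is the best whole-number choice under this model, in agreement with the value given above.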
In mixtures of organic compounds the pos- In 2D protein-observed experiments (e.g., 1H/
sibility of interactions between compounds, 15N HSQC) a more statistically rigorous anal-
such as reactions or ion pairing, is also possi- ysis of changes in chemical shift is required
ble and should be taken into consideration, and a discussion of this is beyond the scope of
References
this chapter. A more in-depth account of data analysis is provided by Ross and Senn (193).
Currently, approximately 50-100 samples can be screened per day and if mixtures contain
10 compounds each this provides a substantial throughput. This throughput rate will
increase as technology improves, as has been demonstrated by the use of cryoprobes.
Cryogenic NMR probes, where the preamplifier and radio frequency coils are cooled to low
temperatures, can significantly increase the signal-to-noise ratio of an NMR spectrum. By
use of these probes, NMR data can be obtained in much shorter times and with lower
protein concentrations, which increases throughput and reduces the total amount of protein
needed to screen a library. Hajduk and coworkers (21) demonstrated the substantial
improvements made through the use of a CryoProbe instead of a conventional probe in
1H/15N chemical-shift perturbation screening. Stromelysin (50 µM) was screened against
mixtures of 100 compounds (50 µM each), facilitating the screening of more than 10,000
compounds in one day. The use of lower concentrations of both protein and ligands
increases the stringency levels for the binding strength of ligands. At a protein/ligand
concentration of 0.5 mM, ligands with dissociation constants in the millimolar range can be
detected, whereas at a protein/ligand concentration of 50 µM this dissociation constant
limit is reduced to approximately 0.15 mM. Although using higher protein/ligand
concentrations can be advantageous when screening libraries containing small low affinity
ligands, a higher stringency is required when screening large libraries, to reduce the
number of hits obtained to a manageable number (21).

5 CONCLUSIONS

In this chapter we have given an overview of the two major approaches used in NMR and
drug discovery, structure-based design and NMR-based screening. Both areas are
flourishing and, together with more traditional uses of NMR, they demonstrate the
versatility of NMR as a tool in medicinal chemistry. The power of NMR has been
dramatically enhanced over the last decade by developments in both instruments and
methodology. On the instrumental side, increases in magnetic field strengths and the
development of cryoprobes have greatly increased sensitivity. Linkages of NMR to LC and
MS have increased versatility. On the methods front there have been a range of new
approaches discovered that will enhance the study of larger molecular complexes. Advances
in protein expression and labeling have played a major role in stimulating the development
of new NMR pulse sequences to extract information from such complexes.

6 ACKNOWLEDGMENTS

Work in our laboratory on NMR in drug design and development is supported by the
Australian Research Council. D.J.C. is an ARC Professorial fellow. We thank Norelle Daly
for assistance with some of the figures and Robyn Craik, Shaiyena Williams, and David
Ireland for help in preparation of the manuscript.

REFERENCES

1. P. J. Hajduk, R. P. Meadows, and S. W. Fesik, Science, 278, 497 (1997).
2. S. W. Fesik, J. Biomol. NMR, 3, 261 (1993).
3. B. Stockman, Prog. Nucl. Magn. Reson. Spectrosc., 33, 109 (1998).
4. P. J. Hajduk, R. P. Meadows, and S. W. Fesik, Q. Rev. Biophys., 32, 211 (1999).
5. G. C. Roberts, Curr. Opin. Biotechnol., 10, 42 (1999).
6. J. M. Moore, Curr. Opin. Biotechnol., 10, 54 (1999).
7. J. Fejzo, C. A. Lepre, J. W. Peng, G. W. Bemis, Ajay, M. A. Murcko, and J. M. Moore,
Chem. Biol., 6, 755 (1999).
8. P. A. Keifer, Curr. Opin. Biotechnol., 10, 34 (1999).
9. G. C. Roberts, Drug Discovery Today, 5, 230 (2000).
10. D. C. Fry and S. D. Emerson, Drug Des. Discov., 17, 13 (2000).
11. D. J. Craik and M. J. Scanlon in G. A. Webb, Ed., Annual Reports on NMR
Spectroscopy, Vol. 42, Academic Press, San Diego, 2000, pp. 115-173.
12. T. Diercks, M. Coles, and H. Kessler, Curr. Opin. Chem. Biol., 5, 285 (2001).
13. R. P. Hicks, Curr. Med. Chem., 8, 627 (2001).
14. M. Shapiro, Farmaco, 56, 141 (2001).
15. J. W. Peng, C. A. Lepre, J. Fejzo, N. Abdul- 34. V. Saudek, J. Hoflack, and J. T. Pelton, FEBS
Manan, and J. M. Moore, Methods Enzymol., Lett., 257, 145 (1989).
338,202 (2001). 35. S. Endo, H. Inooka, Y. Ishibashi, C. Kitada, E.
16. D. J. Craik, Ed., NMR in Drug Design, CRC Mizuta, and M. Fujino, FEBS Lett., 257, 149
Press, Boca Raton, FL, 1996, pp. 1-476. (1989).
17. U. Holzgrabe, I. Wawer, and B. Diehl, NMR 36. R. G. Mills, S. I. O'Donoghue, R. Smith, and
Spectroscopy in Drug Development and Analy- G. F. King, Biochemistry, 31, 5640 (1992).
sis, Wiley-VCH, Weinheim, Germany, 1999, 37. S. Munro, D. Craik, C. McConville, J. Hall, M.
pp. 1-299. Searle, W. Bicknell, D. Scanlon, and C. Chan-
18. A. E. Derome, Modern NMR Techniques for dler, FEBS Lett., 278,9 (1991).
Chemistry Research, Pergamon, New York, 38. M. D. Reily and J. B. Dunbar Jr., Biochem.
1987. Biophys. Res. Commun., 178,570 (1991).
19. H. Gunther, NMR Spectroscopy-An Introduc- 39. H. Tamaoki, Y. Kobayashi, S. Nishimura, T.
tion, John Wiley & Sons, Chichester, UK, 1980, Ohkubo, Y. Kyogoku, K. Nakajima, S. Kuma-
pp. 1-436. gaye, T. Kimura, and S. Sakakibara, Protein
20. K. Pervushin, R. Riek, G. Wider, and K. Wuth- Eng., 4,509 (1991).
rich, Proc. Natl. Acad. Sci. USA, 94, 12366 40. A. Aumelas, L. Chiche, S. Kubo, N. Chino, H.
(1997). Tamaoki, and Y. Kobayashi, Biochemistry, 34,
21. P. Hajduk, T. Gerfin, J. Boehlen, M. Haberli, 4546 (1995).
D. Marek, and S. Fesik, J. Med. Chem., 42, 41. A. Aumelas, L. Chiche, E. Mahe, D. Le-
2315 (1999). Nguyen, P. Sizun, P. Berthault, and B. Perly,
22. P. A. Keifer, Prog. Drug Res., 55, 137 (2000). Int. J. Pept. Protein Res., 37, 315 (1991).
23. K. Wuthrich, NMR of Proteins and Nucleic Ac- 42. D. C. Dalgarno, L. Slater, S. Chackalamannil,
ids, John Wiley & Sons, New York, 1986, pp. and M. M. Senior, Int. J. Pept. Protein Res., 40,
1-292. 515 (1992).
24. C. E. Heading, Drugs, 4,339 (2001). 43. Y. Boulanger, E. Biron, A. Khiat, and A.
Fournier, J. Pept. Res., 53,214 (1999).
25. D. S. Wishart, B. D. Sykes, and F. M. Richards,
J. Mol. Biol., 222,311 (1991). 44. K. Arvidsson, T. Nemoto, Y. Mitsui, S. Ohashi,
and H. Nakanishi, Eur. J. Biochem., 257, 380
26. D. S. Wishart, C. G. Bigam, A. Holm, R. S. (1998).
Hodges, and B. D. Sykes, J. Biomol. NMR, 5,
45. C. M. Hewage, L. Jiang, J. A. Parkinson, R.
67 (1995).
Ramage, and I. H. Sadler, J. Pept. Sci., 3,415
27. K. J. Nielsen, L. Thomas, R. J. Lewis, P. F. (1997).
Alewood, and D. J. Craik, J. Mol. Biol., 263,
46. B. A. Wallace, R. W. Janes, D. A. Bassolino, and
297 (1996).
S. R. Krystek Jr., Protein Sci., 4, 75 (1995).
28. L. K. MacLachlan, D. A. Middleton, A. J. Ed- 47. M. Coles, V. Sowemimo, D. Scanlon, S. L.
wards, and D. G. Reid in D. G. Reid, Ed., Pro- Munro, and D. J. Craik, J. Med. Chem., 36,
tein NMR Techniques, Vol. 60, Humana Press,
2658 (1993).
Totowa, NJ, 1997, pp. 337-362.
48. D. J. Detlefsen, S. E. Hill, S. H. Day, and M. S.
29. D. S. Wishart, B. D. Sykes, and F. M. Richards, Lee, Curr. Med. Chem., 6, 353 (1999).
Biochemistry, 31, 1647 (1992).
49. W. C. Patt, J. J. Edmunds, J. T. Repine, K. A.
30. F. Dasgupta, A. K. Mukherjee, and N. Gan- Berryman, B. R. Reisdorph, C. Lee, M. S.
gadhar, Curr. Med. Chem., 9, 549 (2002). Plummer, A. Shahripour, S. J. Haleen, J. A.
31. D. J. Craik, K. J. Nielsen, and K. A. Higgins in Keiser, M. A. Flynn, K. M. Welch, E. E. Reyn-
G. A. Webb, Ed., Annual Reports on NMR olds, R. Rubin, B. Tobias, H. Hallak, and A. M.
Spectroscopy, Vol. 32, Academic Press, San Di- Doherty, J. Med. Chem., 40, 1063 (1997).
ego, 1995, pp. 143-213. 50. W. C. Patt, X. M. Cheng, J. T. Repine, C. Lee,
32. S. R. Krystek Jr., D. A. Bassolino, J. Novotny, B. R. Reisdorph, M. A. Massa, A. M. Doherty,
C. Chen, T. M. Marschner, and N. H. K. M. Welch, J. W. Bryant, M. A. Flynn, D. M.
Andersen, FEBS Lett., 281, 212 (1991). Walker, R. L. Schroeder, S. J. Haleen, and J. A.
33. N. H. Andersen, C. P. Chen, T. M. Marschner, Keiser, J. Med. Chem., 42, 2162 (1999).
S. R. Krystek Jr., and D. A. Bassolino, Bio- 51. S. L. Munro, P. R. Andrews, D. J. Craik, and
chemistry, 31, 1280 (1992). D. J. Gale, J. Pharm. Sci., 75, 133 (1986).
52. B. M. Duggan and D. J. Craik, J. Med. Chem., 72. G. Wider and K. Wuthrich, Curr. Opin. Struct.
39,4007(1996). Biol., 9,594(1999).
53. B. M. Duggan and D. J. Craik, J. Med. Chem., 73. G. M. Clore and A. M. Gronenborn, Curr. Opin.
40,2259(1997). Chem. Biol., 2,564 (1998).
54. M. G. Casarotto and D. J. Craik, J. Pharm. 74. N. Tjandra and A. Bax, Science, 278, 1111
Sci., 90,713(2001). (1997).
55. R. Abseher, L. Horstink, C. W. Hilbers, and M. 75. G. F. King and J. P. Mackay in D. J. Craik, Ed.,
Nilges, Proteins: Struct., Funct., Genet., 31, NMR in Drug Design, CRC Press, Boca Raton,
370(1998). FL, 1996,pp. 101-200.
56. J. Gehrmann, P. F. Alewood, and D. J. Craik, J. 76. N. Tjandra, A. M. Garrett, A. M. Gronenborn,
Mol. Biol., 278, 401 (1998). A. Bax, and G. M. Clore, Nat. Struct. Biol., 4,
57. J. Balbach, S. Seip, H. Kessler, M. Scharf, N. 443 (1997).
Kashani-Poor, and J. W. Engels, Proteins, 33, 77. A. M. Edwards, C. H. Arrowsmith, D. Christen-
285 (1998). dat, A. Dharamsi, J. D. Friesen, J. F. Green-
58. D. J. Craik, B. M. Duggan, and S. L. A. Munro blatt, and M. Vedadi, Nat. Struct. Biol., 7
in M. I. Choudary, Ed., Biological Inhibitors, (Suppl.), 970 (2000).
Vol. 2, Harwood Academic, Amsterdam, 1996, 78. D. Christendat, A. Yee, A. Dharamsi, Y.
pp. 255-302. Kluger, M. Gerstein, C. H. Arrowsmith, and
59. R. L. Wagner, J. W. Apriletti, M. E. McGrath, A. M. Edwards, Prog. Biophys. Mol. Biol., 73,
B. L. West, J. D. Baxter, and R. J. Fletterick, 339 (2000).
Nature, 378, 690 (1995). 79. L. E. Kay, Nat. Struct. Biol., 5, 513 (1998).
60. D. J. Gale, D. J. Craik, and R. T. C. Brownlee, 80. F. M. Marassi and S. J. Opella, Curr. Opin.
Magn. Reson. Chem., 26,275(1988). Struct. Biol., 8,640(1998).
61. W. Bourguet, M. Ruff, P. Chambon, H. Grone- 81. A. Watts, Curr. Opin. Biotechnol., 10, 48
meyer, and D. Moras, Nature, 375,377(1995). (1999).
62. J. P. Renaud, N. Rochel, M. Ruff, V. Vivat, P. 82. L.-Y. Lian and G. C. K. Roberts in G. C. K.
Chambon, H. Gronemeyer, and D. Moras, Na-
Roberts, Ed., NMR of Macromolecules: A Prac-
ture, 378,681(1995). tical Approach, Oxford University Press, Ox-
63. J. Feeney and B. Birdsall in G. C. K. Roberts, ford, UK, 1993.
Ed., NMR of Macromolecules: A Practical Ap-
83. L. Dugad and J. T. Gerig, Biochemistry, 27,.
proach, Oxford University Press, Oxford, UK,
4310(1988).
1993,pp. 181-215.
84. E. I. Hyde, B. Birdsall, G. C . Roberts, J.
64. J. Feeney in I. Bertini, H. Molinari, and N.
Feeney, and A. S. V. Burgen, Biochemistry, 19,
Niccolai, Eds., NMR and Biomolecular Struc-
3746(1980).
ture, VCH, New York, 1991,pp. 189-205.
85. J. Feeney, J. G. Batchelor, J. P. Albrand, and
65. 5. Feeney, Biochem. Pharmacol., 40, 141
G. C. K. Roberts, J. Magn. Reson., 33, 519
(1990).
(1979).
66. K.J. Nielsen, D. Adarns, L. Thomas, T. Bond,
86. S. Pavlopoulos, M. Rose, G. Wickham, and D. J.
P. F. Alewood, D. J. Craik, and R. J. Lewis, J.
Craik, Anticancer Drug Des., 10,623(1995).
Mol. Biol., 289,1405 (1999).
87. G. J. Pelton and D. E. Wemmer, Proc. Natl.
67. A. P. Campbell and B. D. Sykes, Annu. Rev.
Acad. Sci. USA, 86,5723(1990).
Biophys. Biomol. Struct., 22,99 (1993).
88. W. Leupin, W. J. Chazin, S. Hyberts, W. A.
68. B. D. Sykes, Curr. Opin. Biotechnol., 4, 392
Denny, and K. Wuthrich, Biochemistry, 25,
(1993). 5902(1986).
69. D. J. Craik and K. A. Higgins in G. A. Webb,
89. S. M. Chen, W. Leupin, M. Rance, and W. J.
Ed., Annual Reports on NMR Spectroscopy,
Chazin, Biochemistry, 31,4406(1992).
Vol. 22,Academic Press, London, 1990,pp. 61-
138. 90. R. E. Klevit, D. E. Wemmer, and B. R. Reid,
70. G. Bertho, J. Gharbi-Benarous, M. Delaforge, Biochemistry, 25,3296(1986).
and J. P. Girault, Bioorg. Med. Chem., 6, 209 91. D.J. Pate1 and L. Shapiro,J. Biol. Chem., 261,
(1998). 1230 (1986).
71. R. E. Hubbard, Curr. Opin. Biotechnol., 8,696 92. M. S. Searle and K. J. Embrey, Nucleic Acids
(1997). Res., 18,3753(1990).
93. S. W. Fesik, J. R. Luly, J. W. Erickson, and C. 114. M. K. Rosen, R. F. Standaert, A. Galat, M. Na-
Abad-Zapatero, Biochemistry, 27,8297 (1988). katsuka, and S. L. Schreiber, Science, 248,863
94. C. Zwahlen, P. Legault, S. J. F. Vincent, J. (1990).
Greenblatt, R. Konrat, and L. E. Kay, J. Am. 115. T. J. Wandless, S. W. Michnick, M. K. Rosen,
Chem. SOC., 119,6711 (1997). M. Karplus, and S. L. Schreiber, J. Am. Chem.
95. M. J. Gradwell and J . Feeney, J.Biomol. NMR, Soc., 113,2339 (1991).
7,48 (1996). 116. A. M. Petros, R. T. Gampe Jr., G. Gemmecker,
96. K. D. Harshman and P. B. Dervan, Nucleic Ac- P. Neri, T. F. Holzman, R. Edalji, J. Hoch-
ids Res., 13, 4825 (1985). lowski, M. Jackson, J. McAlpine, J. R. Luly, et
97. J. G. Pelton and D. E. Wemmer, Biochemistry, al., J. Med. Chem., 34,2925 (1991).
27,8088 (1988). 117. G. D. van Duyne, R. F. Standaert, M. Karplus,
98. J. G. Pelton and D. E. Wemmer, Proc. Natl. S. L. Schreiber, and J. Clardy, Science, 252,
Acad. Sci. USA, 86,5723 (1989). 839 (1991).
99. M. Coll, J. Aymami, G. A. van der Marel, J. H. 118. S. W. Michnick, M. K. Rosen, T. J. Wandless,
van Boom, A. Rich, and A. H. Wang, Biochem- M. Karplus, and S. L. Schreiber, Science, 252,
istry, 28,310 (1989). 836 (1991).
100. P. E. Pjura, K. Grzeskowiak, and R. E. Dicker- 119. J. M. Moore, D. A. Peattie, M. J. Fitzgibbon,
son, J.Mol. Biol., 197,257 (1987). and J. A. Thomson, Nature, 351,248 (1991).
101. M. K. Teng, N. Usman, C. A. Frederick, and 120. R. P. Meadows, D. G. Nettesheim, R. X. Xu,
A. H. Wang, Nucleic Acids Res., 16, 2671 E. T. Olejniczak, A. M. Petros, T. F. Holzman,
(1988). J. Severin, E. Gubbins, H. Smith, and S. W.
102. M. A. Carrondo, M. Coll, J. Ayrnami, A. H. Fesik, Biochemistry, 32, 754 (1993).
Wang, G. A. van der Marel, J. H. van Boom, 121. J. W. Cheng, C. A. Lepre, S. P. Chambers, J. R.
and A. Rich, Biochemistry, 28, 7849 (1989). Fulghum, J. A. Thomson, and J. M. Moore,
103. J. R. Quintana, A. A. Lipanov, and R. E. Dick- Biochemistry, 32,9000 (1993).
erson, Biochemistry, 30, 10294 (1991). 122. J. W. Cheng, C. A. Lepre, and J. M. Moore,
104. J. A. Parkinson, J. Barber, K. T. Douglas, J. Biochemistry, 33,4093 (1994).
Rosamond, and D. Sharpies, Biochemistry, 29, 123. P. R. Gooley, B. A. Johnson, A. I. Marcy, G. C.
10181 (1990). Cuca, S. P. Salowe, W. K. Hagmann, C. K. Es-
105. K. J. Embrey, M. S. Searle, and D. J. Craik, ser, and J. P. Springer, Biochemistry, 32,
Eur. J. Biochem., 211,437 (1993). 13098 (1993).
106. A. Fede, A. Labhardt, W. Bannwarth, and W. 124. P. R. Gooley, J. F. O'Connell, A. I. Marcy, G: C.
Leupin, Biochemistry, 30, 11377 (1991). Cuca, S. P. Salowe, B. L. Bush, J. ~ . h e r m e s ,
107. A. Fede, M. Billeter, W. Leupin, and K. Wuth- C. K. Esser, W. K. Hagmann, J. P. Springer,
rich, Structure, 1, 177 (1993). and B. A. Johnson, Nut. Struct. Biol., 1, 111
108. T. Taga, H. Tanaka, T. Goto, and S. Tada, Acta (1994).
Crystallogr., C43, 751 (1987). 125. P. R. Gooley, J. F. O'Connell, A. I. Marcy, G. C.
109. B. E. Bierer, P. K. Somers, T. J. Wandless, S. J. Cuca, M. G. Axel, C. G. Caldwell, W. K. Hag-
Burakoff, and S. L. Schreiber, Science, 250, mann, and J. W. Becker, J.Biomol. NMR, 7,8
556 (1990). (1996).
110. P. Karuso, H. Kessler, and D. F. Mierke, J.Am. 126. S. R. Van Doren, A. V. Kurochkin, Q.-Z. Ye,
Chem. Soc., 112,9434 (1990). L. L. Johnson, D. J. Hupe, and E. R. P. Zuider-
111. S. W. Fesik, R. T. Gampe Jr., T. F. Holzman, weg, Biochemistry, 32,13109 (1993).
D. A. Egan, R. Edalji, J. R. Luly, R. Simmer, R. 127. S. R.Van Doren, A. V. Kurochkin, W. Hu, Q.-Z.
Helfrich, V. Klahore, and D. H. Rich, Science, Ye, L. L. Johnson, D. J. Hupe, and E. R. Zuider-
250, 1406 (1990). weg, Protein Sci., 4,2487 (1995).
112. S. W. Fesik, R. T. Gampe Jr., H. L. Eaton, G. 128. M. A. McCoy, M. J. Dellwo, D. M. Schneider,
Gemmecker, E. T. Olejniczak, P. Neri, T. F. T. M. Banks, J. Falvo, K. J. Vavra, A. M. Ma-
Holzman, D. A. Egan, R. Edalji, R. Simmer, R. thiowetz, M. W. Qoronfleh, R. Ciccarelli, E. R.
Helfrich, J. Hochlowski, and M. Jackson, Bio- Cook, T. A. Pulvino, R. C. Wahl, and H. Wang,
chemistry, 31,6574 (1991). J.Biomol. NMR, 9, 11 (1997).
113. C. Weber, G. Wider, K. von Freyberg, R. Tru- 129. F. J. Moy, M. R. Pisano, P. K. Chanda, C. Ur-
ber, W. Braun, H. Widner, and K. Wuetrich, bano, L. M. Killar, M. L. Sung, and R. Powers,
Biochemistry, 30, 6564 (1991). J. Biomol. NMR, 10, 9 (1997).
References
130. E. J. Jacobsen, M. A. Mitchell, S. K. Hendges, 144. E. T. Baldwin, T. N. Bhat, S. Gulnik, B. Liu,

K. L. Belonga, L. L. Skaletzky, L. S. Stelzer, I. A. Topol, Y. Kiso, T. Mimoto, H. Mitsuya,
T. J. Lindberg, E. L. Fritzen, H. J. Schostarez, and J. W. Erickson, Structure, 3,581 (1995).
T. J. O'Sullivan, L. L. Maggiora, C. W. Stuchly, 145. L. K. Nicholson, T. Yamazaki, D. A. Torchia, S.
A. L. Laborde, M. F. Kubicek, R. A. Poorman, Grzesiek, A. Bax, S. J. Stahl, J. D. Kaufman,
J. M. Beck, H. R. Miller, G. L. Petzold, P. S. P. T. Wingfield, P. Y. Lam, P. K. Jadhav, et al.,
Scott, S. E. Truesdell, T. L. Wallace, J. W. Nat. Struct. Biol., 2,274 (1995).
Wiks, C. Fisher, L. V. Goodman, P. S. Kaytes, 146. J. R. Rose, R. Salto, and C. S. Craik, J. Biol.
et al., J. Med. Chem., 42, 1525 (1999). Chem., 268,11939 (1993).
131. N. C. Gonnella, R. Bohacek, X. Zhang, I. Ko- 147. D. I. Freedberg, Y. X. Wang, S. J. Stahl, J. D.
lossvary, C. G. Paris, R. Melton, C. Winter, S. I. Kaufman, P. Wingfield, Y. Kiso, and D. A. Tor-
Hu, and V. Ganu, Proc. Natl. Acad. Sci. USA, chia, J. Am. Chem. Soc., 120, 7916 (1998).
92,462 (1995). 148. E. Katoh, T. Yamazaki, Y. Kiso, P. Wingfield,
132. V. I. Polshakov, B. Birdsall, T. A. Frenkiel, S. J. Stahl, J. D. Kaufman, and D. A. Torchia,
A. R. Gargaro, and J. Feeney, Protein Sci., 8, J. Am. Chem. Soc., 121,2607 (1999).
467 (1999). 149. D. L. Winslow, S. Stack, R. King, H. Scarnati,
133. P. M. Nieto, B. Birdsall, W. D. Morgan, T. A. A. Bincsik, and M. J. Otto, AIDS Res. Hum.
Frenkiel, A. R. Gargaro, and J. Feeney, FEBS Retroviruses, 11, 107 (1995).
Lett., 405, 16 (1997). 150. J. H. Condra, W. A. Schleif, 0. M. Blahy, L. J.
134. A. Wlodawer, M. Miller, M. Jaskolski, B. K. Gabryelski, D. J. Graham, J. C. Quintero, A.
Sathyanarayana, E. Baldwin, I. T. Weber, Rhodes, H. L. Robbins, E. Roth, M. Shiva-
L. M. Selk, L. Clawson, J. Schneider, and S. B. prakash, et al., Nature, 374,569 (1995).
Kent, Science, 245,616 (1989). 151. S. B. Shuker, P. J. Hajduk, R. P. Meadows, and
135. A. Wlodawer and J. W. Erickson, Annu. Rev. S. W. Fesik, Science, 274, 1531 (1996).
Biochem., 62,543 (1993). 152. E. Olejniczak, P. Hajduk, P. Marcotte, D.
Nettesheim, R. Meadows, R. Edalji, T. Holz-
136. A. Wlodawer and J. Vondrasek, Annu. Rev.
man, and S. Fesik, J. Am. Chem. Soc., 119,
Biophys. Biomol. Struct., 27, 249 (1998).
5828 (1997).
137. D. J. Kempf and H. L. Sham, Cum. Pharm.
153. P. J. Hajduk, J. Dinges, J. M. Schkeryantz, D.
Des., 2,225 (1996). Janowick, M. Kaminski, M. Tufano, D. J. Au-
138. T. Yamazaki, A. P. Hinck, Y. X. Wang, L. K. geri, A. Petros, V. Nienaber, P. Zhong, R. Ham-,
Nicholson, D. A. Torchia, P. Wingfield, S. J. mond, M. Coen, B. Beutel, L. Katz, and S. W.
Stahl, J . D. Kaufman, C. H. Chang, P. J. Do- Fesik, J. Med. Chem., 42,3852 (1999).
maille, and P. Y. Lam, Protein Sci., 5, 495 154. P. Hajduk, G. Sheppard, D. Nettesheim, E.
(1996). Olejniczak, S. Shuker, R. Meadows, D. Stein-
139. P. Y. Lam, P. K. Jadhav, C. J. Eyermann, C. N. man, G. Carrera, P. Marcotte, J. Severin, K.
Hodge, Y. Ru, L. T. Bacheler, J. L. Meek, M. J. Walter, H. Smith, E. Gubbins, R. Simmer, T.
Otto, M. M. R a p e r , Y. N. Wong, et al., Science, Holzman, D. Morgan, S. Davidsen, J. Sum-
263,380 (1994). mers, and S. Fesik, J. Am. Chem. Soc., 119,
140. S. Grzesiek, A. Bax, L. K. Nicholson, T. 5818 (1997).
Yamazaki, P. T. Wingfield, S. J. Stahl, C. J. 155. P. Hajduk, M. Zhou, and S. Fesik, Bioorg. Med.
Eyermann, D. A. Torchia, C. N. Hodge, P. Y. Chem. Lett., 9,2403 (1999).
Lam, P. K. Jadhav, and C. H. Chang, J. Am. 156. G. Liu, J. R. Huth, E. T. Olejniczak, R. Men-
Chem. Soc., 116,1581 (1994). doza, P. DeVries, S. Leitza, E. B. Reilly, G. F.
141. Y. X. Wang, D. I. Freedberg, S. Grzesiek, D. A. Okasinski, S. W. Fesik, and T. W. von Geldern,
Torchia, P. T. Wingfield, J. D. Kaufman, S. J. J. Med. Chem., 44,1202 (2001).
Stahl, C. H. Chang, and C. N. Hodge, Biochem- 157. H. Mao, P. J. Hajduk, R. Craig, R. Bell, T.
istry, 35, 12694 (1996). Borre, and S. W. Fesik, J. Am. Chem. Soc., 123,
142. G. Otting, E. Liepinsh, and K. Wuthrich, Sci- 10429 (2001).
ence, 254,974 (1991). 158. A. Ross, G. Schlotterbeck, W. Klaus, and H.
143. T. Yamazaki, L. K. Nicholson, P. Wingfield, Senn, J. Biomol. NMR, 16, 139 (2000).
S. J. Stahl, J. D. Kaufman, P. J. Domaille, and 159. F. J. Moy, K. Haraki, D. Mobilio, G. Walker, R.
D. A. Torchia, J. Am. Chem. Soc., 116, 10791 Powers, K. Tabei, H. Tong, and M. M. Siegel,
(1994). Anal. Chem., 73, 571 (2001).
160. M. Mayer and B. Meyer, Angew. Chem. Znt. Ed. 177. P. J. Hajduk, E. T. Olejniczak, and S. W. Fesik,
Engl., 38, 1784 (1999). J. Am. Chem. Soc., 119, 12257 (1997).
161. J. Klein, R. Meinecke, M. Mayer, and B. Meyer, 178. D. Henrichson, B. Ernst, J. L. Magnani, W.
J. Am. Chem. Soc., 121, 5336 (1999). Wang, B. Meyer, and T. Peters, Angew. Chem.
162. W. Hellebrandt, T. Haselhorst, T. Kohli, E. Int. Ed. Engl., 38,98 (1999).
Baurnl, and T. Peters, J. Carbohydr. Chem., 179. B. Meyer, T. Weimar, and T. Peters, Eur.
19, 769 (2000). J. Biochem., 246, 705 (1997).
163. M. Vogtherr and T. Peters, J. Am. Chem. Soc., 180. M. Mayer and B. Meyer, J. Med. Chem., 43,
122, 6093 (2000). 2093 (2000).
164. H. Maaheimo, P. Kosma, L. Brade, H. Brade, 181. A. Chen and M. J. Shapiro, J. Am. Chem. Soc.,
and T. Peters, Biochemistry, 39,12778 (2000). 122,414 (2000).
165. M. Mayer and B. Meyer, J. Am. Chem. Soc., 182. S. Bagby, K. I. Tong, D. Liu, J. R. Alattia, and
123, 6108 (2001). M. Ikura, J. Biomol. NMR, 10, 279 (1997).
166. R. Meinecke and B. Meyer, J. Med. Chem., 44, 183. C. Lepre and J. Moore, J. Biomol. NMR, 12,
3059 (2001). 493 (1998).
167. C. Dalvit, P. Pevarello, M. Tato, M. Veronesi, 184. S. Bagby, K. I. Tong, and M. Ikura, Methods
A. Vulpetti, and M. Sundstrom, J. Biomol. Enzymol., 339,20 (2001).
NMR, l8,65 (2000). 185. C. A. Lepre, Drug Discovery Today, 6, 133
168. N. Gonnella, M. Lin, M. J. Shapiro, J. R. (2001).
Wareing, and X. Zhang, J. Magn. Reson., 131, 186. C. A. Lipinski, F. Lombardo, B. W. Dominy,
336 (1998). and P. J. Feeney, Adv. Drug Delivery Rev., 46,
169. A. Chen and M. J. Shapiro, J. Am. Chem. Soc., 3 (2001).
120,10258 (1998). 187. A. K. Ghose, V. N. Viswanadhan, and J. J.
170. A. Chen, C. S. Johnson Jr., M. Lin, and M. J. Wendoloski, J. Comb. Chem., 1, 55 (1999).
Shapiro, J. Am. Chem. Soc., 120,9094 (1998). 188. T. I. Oprea, J. Gottfries, V. Sherbukhin, P.
Svensson, and T. C. Kuhler, J. Mol. Graph.
171. A. Chen and M. J. Shapiro, J. Am. Chem. Soc.,
Model., 18,512 (2000).
121,5338 (1999).
189. J. Xu and J. Stevenson, J. Chem. Inf. Comput.
172. A. Chen and M. J. Shapiro, Anal. Chem., 71,
Sci., 40, 1177 (2000).
66911 (1999).
190. G. Bemis and M. Murcko, J. Med. Chem., 39,
173. M. Lin and M. J. Shapiro, J. Org. Chem., 61, 2887 (1996).
7617 (1996).
191. G. Bernis and M. Murcko, J. Med. Chem., 42,
174. M. Lin, M. J. Shapiro, and J. R. Wareing, 5095 (1999).
J. Am. Chem. Soc., 119,5249 (1997). 192. M. Hann, B. Hudson, X. Lewell, R. Lifely, L.
175. M. Lin, M. J. Shapiro, and J. R. Wareing, J. Miller, and N. Ramsden, J. Chem. Inf.Comput.
Org. Chem., 62,8930 (1997). Sci., 39,897 (1999).
176. K. Bleicher, M. Lin, M. J. Shapiro, and J. R. 193. A. Ross and H. Senn, Drug Discovery Today, 6,
Wareing, J. Org. Chem., 63,8486 (1998). 583 (2001).
CHAPTER THIRTEEN
Mass Spectrometry and Drug Discovery
RICHARD B. VAN BREEMEN
Department of Medicinal Chemistry and Pharmacognosy
University of Illinois at Chicago
Chicago, Illinois
Contents
1 Introduction
2 Current Trends and Recent Developments
2.1 LC-MS Purification of Combinatorial Libraries
2.2 Confirmation of Structure and Purity of Combinatorial Compounds
2.3 Encoding and Identification of Compounds in Combinatorial Libraries and Natural Product Extracts
2.4 Mass Spectrometry-Based Screening
2.4.1 Affinity Chromatography-Mass Spectrometry
2.4.2 Gel Permeation Chromatography-Mass Spectrometry
2.4.3 Affinity Capillary Electrophoresis-Mass Spectrometry
2.4.4 Frontal Affinity Chromatography-Mass Spectrometry
2.4.5 Bioaffinity Screening using Electrospray FTICR Mass Spectrometry
2.4.6 Pulsed Ultrafiltration-Mass Spectrometry
2.4.7 Solid Phase Mass Spectrometric Screening
3 Things to Come
4 Web Site Addresses and Recommended Reading for Further Information

1 INTRODUCTION

At the beginning of the 20th century, mass spectrometers were invented to help physicists and physical chemists prove the existence of isotopes of the elements. As radioactivity and nuclear physics were explored, specialized mass spectrometers were used to characterize the fission products of radioactive elements as they were created or discovered. In addition, mass spectrometers were used for the measurement of isotopic enrichment of radioactive elements, their inorganic derivatives, and even the isotopic purification of radioactive elements as inorganic compounds. As this era of mass spectrometry reached maturity in the 1940s, some physicists announced that there would no longer be any need for mass spectrometry because virtually all of the elements had been discovered and characterized. Of course, these prognosticators were wrong because the entire field of organic mass spectrometry was about to begin.

While mass spectrometers were being used for the purification of fissionable material for atomic weapons as part of the Manhattan Project of World War II, organic mass spectrometry was being invented for the analysis and quality control of aviation fuel. In 1945, the application of mass spectrometry to organic chemistry emerged as a productive new area of research and discovery. Commercial production of organic mass spectrometers began immediately, and petroleum companies became the first customers for these new analytical instruments. Early commercial mass spectrometers used electron impact (EI) ionization (see Equations 13.1 and 13.2) to generate ions from gas-phase molecules that were separated by acceleration through an electromagnetic field provided by either a fixed magnet or an electromagnet. After separation, the ions were detected using a simple impact detector such as a Faraday cup. This basic design is still in use today for the identification and quantitative analysis of volatile organic compounds.

$$\mathrm{M} + e^{-}\ (70\ \mathrm{eV}) \rightarrow \mathrm{M}^{+\bullet} + 2e^{-} \qquad \text{formation of positive molecular ions using EI ionization} \tag{13.1}$$

$$\mathrm{M} + e^{-} \rightarrow \mathrm{M}^{-\bullet} \qquad \text{electron capture EI ionization} \tag{13.2}$$

Toward the late 1950s, organic mass spectrometers began to be used for the analysis of a wider variety of organic molecules and eventually became a fundamental analytical tool for the characterization of synthetic organic compounds. Today, mass spectrometers are used routinely to confirm the molecular weights of organic compounds and to verify their structures based on fragmentation patterns. Fragmentation results from the cleavage of chemical bonds within an ion, resulting in the formation of a product ion of lower mass and one or more neutral products. Of course, only the fragment ions and not the neutral species are detected in a mass spectrometer because this instrument measures the mass-to-charge ratio (m/z) of ions in the gas phase. The energy for fragmentation is the result of excess energy imparted to the molecular ion, either during ionization or during a process known as collision-induced dissociation (CID), which will be discussed along with tandem mass spectrometry (MS-MS) below. Because the fragmentation pattern reflects the relative strengths of chemical bonds in a compound, mass spectra (plots of ion relative abundance versus m/z) provide structurally significant fragment ions for compound identification. Rules for the elucidation of chemical structures through the interpretation of mass spectra have been developed. (For a review of EI and ion fragmentation pathways, see McLafferty et al. 1997, Section 4.)

In many cases, EI imparts so much excess energy into a molecule that only fragment ions and no molecular ions are produced. Therefore, "softer" ionization techniques were developed to enhance molecular weight information. The first of these ionization methods was chemical ionization (CI).
Table 13.1 Types of Mass Spectrometers and Tandem Mass Spectrometers

Instrument             Resolving Power   m/z Range    Tandem MS
Magnetic sector        100,000           12,000       Low resolution
Quadrupole             < 4,000           4,000        None
Triple quadrupole      < 4,000           4,000        Low resolution
Time-of-flight (TOF)   15,000            > 200,000    None
FTICR                  > 200,000         < 10,000     MSn, high resolution
Ion trap               < 4,000           < 10,000     MSn, low resolution
QTOF                   12,000            4,000        High resolution
TOF-TOF                15,000            > 10,000     High resolution
Developed by researchers in the petroleum industry (1), CI became another standard ionization technique for organic mass spectrometry. During CI, high energy electrons (as in EI) are used to ionize a gas called a reagent gas at a constant pressure (usually ~1 Torr) in the mass spectrometer ionization source. The reagent gas in turn ionizes the sample molecules through ion-molecule reactions that usually involve the exchange of protons. Less frequently, sample molecule ionization might involve a charge exchange. Two of the most common ionization mechanisms in CI are summarized in Equations 13.3 and 13.4.

$$\mathrm{M} + \mathrm{RH}^{+} \rightarrow \mathrm{MH}^{+} + \mathrm{R} \qquad \text{CI through proton transfer, R = reagent gas} \tag{13.3}$$

$$\mathrm{M} + \mathrm{R}^{+\bullet} \rightarrow \mathrm{M}^{+\bullet} + \mathrm{R} \qquad \text{CI through charge exchange} \tag{13.4}$$

During the 1960s, high resolution double-focusing magnetic sector instruments became available and are now standard tools for the determination of elemental compositions using a type of analysis called exact mass measurement. In mass spectrometry, resolution is defined as M/ΔM, where M is the m/z value of a singly charged ion, and ΔM is the difference (measured in m/z) between M and the next highest ion. Alternatively, ΔM may be defined in terms of the width of the peak. High resolution is typically regarded as a value of at least 10,000. At this resolution, the molecular ions of most drug-like molecules (that is, compounds with molecular weights less than ~500) can be resolved from each other. After resolving a sample ion from others in a mass spectrum, an exact mass measurement may be carried out by accurately weighing the unknown ion and comparing its m/z value to that of a calibration standard. Since the 1960s, other types of mass spectrometers capable of high resolution exact mass measurements have become available as commercial products, including Fourier transform ion cyclotron resonance (FTICR) mass spectrometers, reflectron TOF instruments, and recently, quadrupole time-of-flight hybrid (QqTOF) mass spectrometers (see Table 13.1 for a listing of types of organic mass spectrometers and a comparison of their performance characteristics). By the early 2000s, FTICR and QqTOF instruments had become more popular than magnetic sector mass spectrometers for exact mass measurements, high resolution measurements, and drug discovery applications. As will be discussed below, exact mass measurements are essential to many types of mass spectrometry-based screening and drug discovery today.

Biomedical applications of mass spectrometry began during the 1960s at both academic institutions and pharmaceutical companies. These applications depended on the volatilization (usually by heating) of pharmaceutical compounds and biochemicals before their gas-phase ionization using EI or CI. To increase the thermal stability and volatility of these compounds, a variety of derivatization methods were developed to mask polar functional groups and reduce hydrogen bonding between molecules. These methods were particularly effective for use with gas chromatography-mass spectrometry (GC-MS), which was introduced during the 1960s as a practical and powerful tool for qualitative and quantitative analysis of compounds in mixtures.
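The definition of resolution as M/ΔM and the idea of an exact mass measurement can be made concrete with a short calculation. The following is a minimal sketch, not a description of any instrument's software: the function names are hypothetical, the atomic masses are standard monoisotopic reference values rather than numbers taken from this chapter, and the two elemental compositions are simply illustrative choices that share the nominal mass 300.

```python
# Monoisotopic masses in unified atomic mass units (u); standard reference
# values, not taken from the chapter.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
ELECTRON_MASS = 0.00054858


def molecular_ion_mz(formula):
    """m/z of the singly charged molecular ion M+. (neutral mass minus one electron)."""
    neutral = sum(MONOISOTOPIC_MASS[el] * count for el, count in formula.items())
    return neutral - ELECTRON_MASS


def required_resolution(mz_a, mz_b):
    """Resolution M/dM needed to separate two ions, taking M as the lower m/z."""
    return min(mz_a, mz_b) / abs(mz_a - mz_b)


def ppm_error(measured_mz, theoretical_mz):
    """Relative deviation of a measured m/z from a theoretical m/z, in ppm."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


# Two illustrative compositions (assumed for this example) with nominal mass 300.
ion_a = molecular_ion_mz({"C": 20, "H": 28, "O": 2})
ion_b = molecular_ion_mz({"C": 19, "H": 24, "O": 3})
resolution_needed = required_resolution(ion_a, ion_b)

print(f"ion A m/z = {ion_a:.4f}, ion B m/z = {ion_b:.4f}")
print(f"resolution needed to separate them = {resolution_needed:.0f}")
```

Under these assumptions the two ions differ by about 0.036 m/z units, so the resolution needed to separate them falls below the 10,000 threshold quoted above for high resolution. The `ppm_error` helper shows how a measured value might then be compared with a candidate composition in an exact mass measurement.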
Both EI and CI were immediately useful for GC-MS, because both of these ionization methods require that the analytes be in the gas phase. When capillary GC was incorporated into GC-MS, this technique reached maturity. GC-MS may be used to select, identify, and quantify organic compounds in complex mixtures at the femtomole level. The speed of GC-MS is determined by the chromatography step, which typically requires several minutes to 1 h per analysis. By the 1970s, some organic chemists were announcing that organic mass spectrometry had reached maturity and that no new applications were possible. Like the physicists and physical chemists who had pronounced the end of mass spectrometry a generation earlier, this group would soon be proved wrong.

Although GC-MS remains important for the analysis of many organic compounds, this technique is limited to volatile and thermally stable compounds that comprise only a small fraction of all organic compounds and even fewer biomedically important molecules. Therefore, thermally unstable compounds, including many pharmaceutical compounds such as nucleic acid analogs and biomolecules such as proteins, carbohydrates, and nucleic acids, cannot be analyzed in their native forms using GC-MS. (For more details regarding GC-MS and its applications, see Watson 1997, Section 4.) Although derivatization facilitates the GC-MS analysis of many of these compounds, alternative ionization techniques were needed for the analysis of the vast majority of polar and non-volatile compounds of interest to drug discovery.

During the 1970s and early 1980s, desorption ionization techniques such as field desorption (FD), desorption EI, desorption CI (DCI), and laser desorption were developed to extend the use of mass spectrometry toward the analysis of more polar and less volatile compounds (see Watson 1997, Section 4, for more information regarding desorption ionization techniques including DCI and FD). Although these techniques helped extend the mass range of mass spectrometry beyond a traditional limit of m/z 1000 and toward ions of m/z 5000, the first breakthrough in the analysis of polar, non-volatile compounds occurred in 1982 with the invention of fast atom bombardment (FAB) (2). FAB and its counterpart, liquid secondary ion mass spectrometry (LSIMS), facilitated the formation of abundant molecular ions, protonated molecules, and deprotonated molecules of non-volatile and thermally labile compounds such as peptides, chlorophylls, and complex lipids up to approximately m/z 12,000. FAB and LSIMS use energetic particle bombardment (fast atoms or ions from 3 to 30,000 V of energy) to ionize compounds dissolved in non-volatile matrices such as glycerol or 3-nitrobenzyl alcohol and desorb them from this condensed phase into the gas phase for mass spectrometric analysis (see Fig. 13.1). Protonated or deprotonated molecules are usually abundant and fragmentation is minimal.

Introduced in the late 1980s, matrix-assisted laser desorption ionization (MALDI) has helped overcome the mass limit barriers of laser desorption mass spectrometry so that singly charged ions may be obtained up to m/z 500,000 and sometimes higher (3). For most commercially available MALDI mass spectrometers, ions up to m/z 200,000 are readily obtained. As in FAB and LSIMS, MALDI samples are mixed with a matrix to form a solution that is loaded onto the sample stage for analysis. Unlike the other matrix-mediated techniques, the solvent is evaporated before MALDI analysis, leaving sample molecules trapped in crystals of solid phase matrix. The MALDI matrix is selected to absorb the pulse of laser light directed at the sample. Most MALDI mass spectrometers are equipped with a pulsed UV laser, although IR lasers are available as an option on some commercial instruments. Therefore, matrices are often substituted benzenes or benzoic acids with strong UV absorption properties. During MALDI, the energy of the short but intense UV laser pulse obliterates the matrix and in the process desorbs and ionizes the sample. Like FAB and LSIMS, MALDI typically produces abundant protonated or deprotonated molecules with little fragmentation.

By the time that GC-MS had become a standard technique in the late 1960s, LC-MS was still in the developmental stages. Producing gas-phase sample ions for analysis in a vacuum system while removing the high performance liquid chromatography (HPLC) mobile phase proved to be a challenging task.
Figure 13.1. Scheme for desorption ionization using FAB or LSIMS from a liquid matrix.
Early LC-MS techniques included a moving belt interface to desolvate and transport the HPLC eluate into a CI or EI ion source, or a direct inlet system in which the eluate was pumped at a low flow rate (1-3 μL/min) into a CI source. However, neither of these systems was robust enough or suitable for a broad enough range of samples to gain widespread acceptance.

Because FAB (or LSIMS) requires that the analyte be dissolved in a liquid matrix, this ionization technique was easily adapted for infusion of solution-phase samples into the FAB ionization source in an approach known as continuous-flow FAB. Then, continuous-flow FAB was connected to microbore HPLC columns for LC-MS applications (4). Because this method is limited to microbore HPLC applications at flow rates of <10 μL/min and requires considerable operator intervention, it is not ideal for the analysis of large sample sets. Instead, more robust techniques have been developed to fulfill this requirement. However, continuous-flow FAB is still in use in some laboratories.

Like that of continuous-flow FAB, the popularity of particle beam interfaces is diminishing, but systems are still available from commercial sources. During particle beam LC-MS, the HPLC eluate is sprayed into a heated chamber connected to a vacuum pump. As the droplets evaporate, aggregates of analyte (particles) form and pass through a momentum separator that removes the lower molecular weight solvent molecules. Finally, the particle beam enters the mass spectrometer ion source where the aggregates strike a heated plate from which the analyte molecules evaporate and are ionized using conventional EI or CI. Particle beam LC-MS is limited to the analysis of volatile and thermally stable compounds that are amenable to flash evaporation and EI or CI mass spectrometry. Therefore, this approach is not used for polar biochemicals such as carbohydrates, sugars, peptides, proteins, or nucleic acids.

Because thermospray became the first widely used LC-MS technique (during the late 1970s and early 1980s), this technique should be mentioned here. Thermospray facilitates the interfacing of standard analytical HPLC systems at flow rates up to 1 mL/min with mass spectrometers. Although the interface between the HPLC and mass spectrometer is inefficient and exhibits low sensitivity for most analytes, thermospray has been useful for the LC-MS analysis of many types of small molecules. During thermospray, the HPLC eluate is sprayed through a heated capillary into a heated desolvation chamber at reduced pressure. Gas phase ions remaining after desolvation of the droplets are extracted through a skimmer into the mass spectrometer for analysis.
Figure 13.2. Positive ion APCI mass spectrum of the red carotenoid lycopene in a solution of methanol and tert-butyl methyl ether (1:1; v/v). In this analysis, lycopene formed a protonated molecule instead of a molecular ion.
The sensitivity of thermospray is poor because there is no mechanism or driving force to enhance the number of sample ions entering the gas phase from the spray during desolvation. Also, thermally labile compounds tend to decompose in the heated source. These problems were solved when thermospray was replaced by electrospray during the late 1980s.

During the 1990s, electrospray and atmospheric pressure chemical ionization (APCI) became the standard interfaces for LC-MS. Today, APCI and electrospray ionization are the most widely used ionization sources and HPLC interfaces for drug discovery using mass spectrometry. Unlike thermospray, particle beam, or continuous-flow FAB, electrospray and APCI interfaces operate at atmospheric pressure and do not depend on vacuum pumps to remove solvent vapor. As a result, they are compatible with a wide range of HPLC flow rates. Also, no matrix is required. Both APCI and electrospray are compatible with a wide range of HPLC columns and solvent systems. As with all LC-MS systems, the solvent system should contain only volatile solvents, buffers, or ion pair agents to reduce fouling of the mass spectrometer ion source. In general, APCI and electrospray form abundant molecular ion species. When fragment ions are formed, they are usually more abundant in APCI than in electrospray mass spectra.

The APCI interface uses a heated nebulizer to form a fine spray of the HPLC eluate, which is much finer than that of the particle beam system but similar to that formed during thermospray. A cross-flow of heated nitrogen gas is used to facilitate the evaporation of solvent from the droplets. The resulting gas-phase sample molecules are ionized by collisions with solvent ions, which are formed by a corona discharge in the atmospheric pressure chamber. Molecular ions, $\mathrm{M}^{+\bullet}$ or $\mathrm{M}^{-\bullet}$, and/or protonated or deprotonated molecules can be formed. The relative abundance of each type of ion depends on the sample itself, the HPLC solvent, and the ion source parameters. Next, ions are drawn into the mass spectrometer analyzer for measurement through a narrow opening or skimmer that helps the vacuum pumps to maintain very low pressure inside the analyzer, while the APCI source remains at atmospheric pressure. For example, the positive ion APCI mass spectrum of lycopene is shown in Fig. 13.2. The carotenoid lycopene is the red pigment of ripe tomatoes and is under clinical investigation for the prevention of prostate cancer (5).

During electrospray, the HPLC eluate is sprayed through a capillary electrode at high potential (usually 2000-7000 V) to form a fine mist of charged droplets at atmospheric pressure. As the charged droplets migrate towards the opening of the mass spectrometer because of electrostatic attraction, they encounter a cross-flow of heated nitrogen that increases solvent evaporation and prevents most of the solvent molecules from entering the mass spectrometer. Molecular ions, protonated or deprotonated molecules, and cationized species such as $[\mathrm{M}+\mathrm{Na}]^{+}$ and $[\mathrm{M}+\mathrm{K}]^{+}$ can be formed.
1 Introduction
formed. (For additional information on electrospray ionization, see Cole 1997, Section 4). In addition to singly charged ions, electrospray is unique as an ionization technique in that multiply charged species are common and often constitute the majority of the sample ion abundance. The relative abundance of each of these species depends on the chemistry of the analyte, the pH, the presence of proton donating or accepting species, and the levels of trace amounts of sodium or potassium salts in the mobile phase. In contrast, APCI, MALDI, EI, CI, and FAB/LSIMS usually produce singly charged species. A consequence of forming multiply charged ions is that they are detected at lower m/z values (i.e., z > 1) than the corresponding singly charged species. This has the benefit of allowing mass spectrometers with modest m/z ranges to detect and measure ions of molecules with very high masses. For example, electrospray has been used to measure ions with molecular weights of hundreds of thousands or even millions of Daltons on mass spectrometers with m/z ranges of only a few thousand. (For a review of LC-MS techniques, see Niessen 1999, Section 4.)

An example of the C18 reversed phase HPLC-negative ion electrospray mass spectrometric (LC-MS) analysis of an extract of the botanical, Trifolium pratense L. (red clover), is shown in Fig. 13.3. Extracts of red clover are used as dietary supplements by menopausal and postmenopausal women and are under investigation as alternatives to estrogen replacement therapy (6). The two-dimensional map illustrates the amount of information that may be acquired using hyphenated techniques such as LC-MS. In the time dimension, chromatograms are obtained, and a sample computer-reconstructed mass chromatogram is shown for the signal at m/z 269. An intense chromatographic peak was detected eluting at 12.4 min. In the m/z dimension, the negative ion electrospray mass spectrum recorded at 12.4 min shows a base peak at m/z 269. Based on comparison with authentic standards (data not shown), the ion of m/z 269 was found to correspond to the deprotonated molecule of genistein, which is an estrogenic isoflavone (6). Because almost no fragmentation of the genistein ion was observed, additional characterization would require CID and MS-MS as discussed in the next section.

When analyzing complex mixtures such as the botanical extract shown in Fig. 13.3, the use of chromatographic separation before mass spectrometric ionization and analysis is essential to distinguish between isomeric compounds. Even simple mixtures of synthetic compounds might contain isomers that would require LC-MS for adequate characterization. Another problem overcome by using a chromatography step before mass spectrometric analysis is ion suppression. No matter what ionization technique is used, the presence of multiple compounds in the ion source might enhance the ionization of one compound while suppressing the ionization of another. Usually, only some of the compounds in a complex mixture can be detected by mass spectrometry without chromatographic separation. The presence of salts and buffers in a sample can also suppress sample ionization. Therefore, LC-MS has become a powerful tool for analyzing natural products, synthetic organic compounds, and pharmaceutical agents and their metabolites.

In general, APCI facilitates the ionization of non-polar and low molecular weight species, and electrospray is more useful for the ionization of polar and high molecular weight compounds. In this sense, APCI and electrospray are often complementary ionization techniques. However, during the analysis of large or diverse combinatorial libraries, both polar and non-polar compounds are usually present. As a result, no one set of ionization conditions using APCI or electrospray is adequate to detect all the compounds contained in the library of compounds. Therefore, a UV ionization technique called atmospheric pressure photoionization (APPI) has been developed for use with combinatorial libraries and LC-MS (7). Recently, APPI became a commercially available ionization alternative to APCI and electrospray. During APPI, a liquid solution or HPLC eluate is sprayed at atmospheric pressure, as in APCI. Instead of using a corona discharge as in APCI, ionization occurs during APPI because of irradiation of the analyte molecules by an intense UV light source. Obviously, the carrier solvent must not absorb UV light at the same wavelengths, or interfer-
[Figure 13.3 image not reproduced. Panels: computer-reconstructed mass chromatogram of m/z 269 (x-axis: retention time, min); mass spectrum at 12.4 min with the peak at m/z 269 labeled [M-H]-.]

Figure 13.3. Two-dimensional map showing the LC-MS analysis of an extract of red clover under investigation for the management of menopause. Reversed phase separation was carried out using a C18 HPLC column in the time dimension and negative ion electrospray mass spectrometry was used for compound detection and molecular weight determination in the second dimension.
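
As a rough illustration of how a two-dimensional LC-MS data set like the one in Figure 13.3 yields both views, the sketch below builds a tiny synthetic intensity table (rows are scans in time, columns are nominal m/z bins), extracts a reconstructed mass chromatogram at m/z 269, and reads the mass spectrum at the peak apex. The intensities, the table layout, and the function names are assumptions made for illustration only; they are not taken from the original analysis.

```python
# Synthetic LC-MS map: intensity[scan][mz_bin]. All numbers are invented.
times = [12.0, 12.2, 12.4, 12.6, 12.8]   # retention time of each scan, min
mz_axis = [253, 267, 269, 283]           # nominal m/z of each column
intensity = [
    [5, 2, 10, 1],
    [6, 3, 120, 2],
    [4, 2, 500, 3],
    [5, 40, 130, 2],
    [6, 3, 12, 1],
]


def mass_chromatogram(intensity, mz_axis, target_mz):
    """Time dimension: intensity of one m/z bin across all scans."""
    col = mz_axis.index(target_mz)
    return [row[col] for row in intensity]


def spectrum_at(intensity, times, t):
    """m/z dimension: the full spectrum recorded at retention time t."""
    return intensity[times.index(t)]


def base_peak(spectrum, mz_axis):
    """m/z value of the most intense ion in a spectrum."""
    return mz_axis[spectrum.index(max(spectrum))]


trace = mass_chromatogram(intensity, mz_axis, 269)
apex_time = times[trace.index(max(trace))]
apex_mz = base_peak(spectrum_at(intensity, times, apex_time), mz_axis)
print(apex_time, apex_mz)
```

With real data the same two slices would be taken from the instrument's scan list rather than from a hand-written table.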
ence would prevent sample ionization and detection. The use of APPI as an alternative to APCI and electrospray for drug discovery applications is under investigation.

Desorption ionization techniques like FAB, MALDI, and electrospray facilitate the molecular weight determination of a wide range of polar and non-polar, low and high molecular weight compounds including drugs and drug targets such as proteins. However, the "soft" ionization character of these techniques means that most of the ion current is concentrated in molecular ions, and few structurally significant fragment ions are formed. To enhance the amount of structural information in these mass spectra, CID may be used to produce more abundant fragment ions from molecular ion precursors formed and isolated during the first stage of mass spectrometry. Then, a second mass spectrometry analysis may be used to characterize the resulting product ions. This process is called tandem mass spectrometry or MS-MS and is illustrated in Fig. 13.4.

Another advantage of the use of tandem mass spectrometry is the ability to isolate a particular ion such as the molecular ion of the analyte of interest during the first mass spec-
[Figure 13.4 image not reproduced. Tandem mass spectrum with x-axis m/z (300-580) and y-axis scale 0-100; labeled peak: M- at m/z 536.]

Figure 13.4. Scheme illustrating the selectivity of MS-MS and the process by which CID facilitates fragmentation of preselected ions. Negative ion electrospray tandem mass spectrum of lycopene. CID was used to induce fragmentation of the molecular ion of m/z 536. As a result, the fragment ion of m/z 467 was formed by the loss of a terminal isoprene unit. This fragment ion may be used to distinguish lycopene from isomeric α-carotene and β-carotene, which lack terminal isoprene groups.
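
The arithmetic behind the caption above can be checked with a short sketch. It computes the neutral loss between the precursor at m/z 536 and the product ion at m/z 467 and compares it with the nominal mass of a C5H9 unit. The C5H9 composition of the lost isoprene unit and the C40H56 formula used for lycopene are assumptions supplied here for illustration (the text gives only the m/z values), and nominal integer masses are used rather than exact masses.

```python
# Nominal (integer) atomic masses; enough for a unit-resolution check.
NOMINAL = {"C": 12, "H": 1}


def nominal_mass(formula):
    """Sum nominal atomic masses for a formula given as {element: count}."""
    return sum(NOMINAL[element] * count for element, count in formula.items())


precursor_mz = 536   # molecular ion selected in the first MS stage
product_mz = 467     # fragment ion observed after CID
neutral_loss = precursor_mz - product_mz

lycopene = {"C": 40, "H": 56}       # assumed molecular formula
isoprene_unit = {"C": 5, "H": 9}    # assumed composition of the lost unit

loss_matches_isoprene = neutral_loss == nominal_mass(isoprene_unit)
print(neutral_loss, loss_matches_isoprene)
```

Both ions are treated as singly charged, so the difference in m/z equals the difference in mass.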
trometry stage. This precursor ion is essentially purified in the gas-phase and free of impurities such as solvent ions, matrix ions, or other analytes. Finally, the selected ion is fragmented using CID and analyzed using a second mass spectrometry stage. In this manner, the resulting tandem spectrum contains exclusively analyte ions without impurities that might interfere with the interpretation of the fragmentation patterns. In summary, CID may be used with LC-MS-MS or desorption ionization and MS-MS to obtain structural information such as amino acid sequences of peptides and sites of alkylation of nucleic acids, or to distinguish structural isomers such as β-carotene and lycopene. Beginning in 2001, TOF-TOF tandem mass spectrometers became available from instrument manufacturers. These instruments have the potential to deliver high resolution tandem mass spectra with high speed that should be compatible with the chip-based chromatography systems now under development.

Over the course of the last century, mass spectrometry has become an essential analytical tool for a wide variety of biomedical applications including drug discovery and development. By combining mass spectrometry with chromatography as in LC-MS or by adding another stage of mass spectrometry as in MS-MS, the selectivity of the technique increases considerably. As a result, mass spectrometry offers all of the analytical elements that are essential to modern drug discovery, namely speed, sensitivity, and selectivity.

2 CURRENT TRENDS AND RECENT DEVELOPMENTS

Since the early 1990s, pharmaceutical research has focused on combinatorial chemistry (8, 9) and high-throughput screening (10) in an effort to accelerate the pace of drug discovery. The goal has been to produce, in a short time, large numbers of synthetic organic compounds representing a great diversity of chemical structures through a process called combinatorial chemistry and then quickly screen them in vitro against pharmacologically significant targets such as enzymes or receptors. The "hits" identified through these high-throughput screens may then be optimized by quickly and efficiently synthesizing and then screening large numbers of analogs called targeted or directed libraries. As a result, lead compounds might emerge from such combinatorial chemistry drug discovery programs in a few weeks instead of several years. Furthermore, a single organic chemist using combinatorial synthetic methods might synthesize thousands of compounds or more in a single week instead of fewer than five in the same time using conventional techniques, and a single medicinal chemist might identify hundreds of lead compounds per month instead of just one or two in the same period of time.

Accompanying this new drug discovery paradigm, new scientific journals have been established such as Combinatorial Chemistry & High Throughput Screening, Journal of Combinatorial Chemistry, Journal of Biomolecular Screening, and Molecular Diversity (see list of journal websites in Section 4). The variety of topics published in these journals reflects the multidisciplinary nature of the current drug discovery process and ranges from organic chemistry, medicinal chemistry, molecular modeling, molecular biology, and pharmacology, to analytical chemistry. As described below, the most significant analytical component of drug discovery has become mass spectrometry. Only mass spectrometry has become an essential element at all stages of the drug discovery and development process.

Although a variety of spectroscopic and chromatographic techniques, including infrared spectroscopy, nuclear magnetic resonance spectroscopy, fluorescence spectroscopy, gas chromatography, HPLC, and mass spectrometry, are being used to support drug discovery in various capacities, some of them, such as gas chromatography and fluorescence spectroscopy, are not applicable to most new chemical entities, some are not specific enough for chemical identification (e.g., infrared spectroscopy), and other techniques suffer from low throughput (e.g., nuclear magnetic resonance spectroscopy). Unlike gas chromatography, HPLC is compatible with virtually all drug-like molecules without the need for chemical derivatization to increase thermal stability or volatility. In addition, mass spectrometry provides a universal means to characterize and distinguish drugs based on both molecular weight and structural features while at the same time providing high throughput. With the development of routine LC-MS interfaces and ionization techniques such as electrospray and APCI, mass spectrometry has also become an ideal HPLC detector for the analysis of combinatorial libraries (11), and LC-MS, MS-MS, and LC-MS-MS have become fundamental tools in the analysis of combinatorial libraries and subsequent drug development studies (12-14).

The application of combinatorial chemistry and high-throughput screening to drug discovery has altered the traditional serial process of lead identification and optimization that previously required years of human effort. Consequently, neither the synthesis of new chemical entities nor their screening is limiting the pace of drug discovery. Instead, a new bottleneck is the verification of the structure and purity of each compound in a combinatorial library or of each lead compound obtained from an uncharacterized library using high-throughput screening. Because the number of lead compounds entering the drug development process has increased, in part because compounds are entering development at earlier stages than in the past, the traditional drug development investigations concerning absorption, distribution, metabolism, and excretion (ADME) and even toxicology evaluations of new drug entities have become additional bottlenecks. As a solution to the drug development bottlenecks, high-throughput assays to assess the metabolism, bioavailability, and toxicity of lead compounds are being developed and applied earlier than ever during the drug discovery process, so that only those compounds most likely to become successful drugs enter the more expensive and slower preclinical pharmacology and toxicology studies. In support of these new combinatorial chemistry synthetic programs and new high-throughput assays, mass spectrometry has emerged as the only analytical technique with sufficient throughput, sensitivity, selectivity, and robustness to address all of these bottlenecks.

2.1 LC-MS Purification of Combinatorial Libraries

Although combinatorial libraries were originally synthesized as mixtures, today most libraries are prepared in parallel as discrete compounds and then screened individually in microtiter plates of 96-well, 384-well, or 1536-well formats. To facilitate subsequent structure-activity analyses and to assure the validity of the screening results, many laboratories verify the structure and purity of each compound before high-throughput screening. Semi-preparative HPLC has become the most
[Figure 13.5 image not reproduced. Panels: (a) PrepLCMS analysis (50 mg injection; desired product marked); (b) RIC of desired peak from prepLCMS (peak at 6.49 min; threshold for fraction collection marked); (c) Analytical purity assessment (purity = 90.7%). x-axes: time, min.]

Figure 13.5. Mass-directed purification of a combinatorial library. Chromatographic separation was carried out using gradient elution of 10-90% acetonitrile in water for 7 min after an initial hold at 10% acetonitrile for 1 min. (a) Total ion chromatogram showing desired product and impurities. (b) Computer-reconstructed ion chromatogram (RIC) corresponding to the expected product. (c) Post-purification analysis of the isolated component with a purity >90%. (Reproduced from Ref. 15 by permission of Elsevier Science.)
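
The threshold-triggered collection shown in panel (b) of Figure 13.5 can be sketched in a few lines. The example below applies the same threshold rule to a UV trace and to a reconstructed ion chromatogram for the expected product, which shows why a mass-directed trigger tends to collect fewer fractions. The traces, the threshold value, and the function name are hypothetical; no instrument control software or vendor API is implied.

```python
def collection_windows(times, signal, threshold):
    """Return (start, stop) pairs for each run of signal >= threshold.

    A window opens at the first point at or above the threshold and closes
    at the first later point that falls below it.
    """
    windows = []
    start = None
    for t, s in zip(times, signal):
        if s >= threshold and start is None:
            start = t
        elif s < threshold and start is not None:
            windows.append((start, t))
            start = None
    if start is not None:          # signal still high at end of run
        windows.append((start, times[-1]))
    return windows


# Invented traces sampled once per minute.
times = [0, 1, 2, 3, 4, 5, 6, 7, 8]
uv_trace = [0, 8, 1, 9, 1, 2, 10, 9, 1]    # responds to every UV-absorbing peak
ric_trace = [0, 0, 0, 0, 0, 1, 9, 8, 0]    # responds only to the target ion

uv_windows = collection_windows(times, uv_trace, threshold=5)
ms_windows = collection_windows(times, ric_trace, threshold=5)
print(len(uv_windows), len(ms_windows))
```

In this toy run the UV trigger opens three collection windows while the mass-directed trigger opens one, at the retention time of the target ion.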
popular technique for the purification of combinatorial libraries on the milligram scale because of high throughput and the ease of automation. Typically during semi-preparative HPLC, fraction collection is initiated whenever a UV signal is observed above a predetermined threshold. This procedure usually results in the collection of several fractions per analysis and hence creates additional issues such as the need for large fraction collector beds and the need for secondary analysis using flow-injection mass spectrometry, LC-MS, or LC-MS-MS to identify the appropriate fractions. When purification of large numbers of combinatorial libraries is required, this approach can become prohibitively time consuming and expensive.

To enhance the efficiency of this purification procedure, the steps of HPLC purification and mass spectrometric analysis may be combined into automated mass-directed fractionation (15-17). Any size HPLC column may be used, and only a small fraction of the eluant (~µL/min) is diverted to the mass spectrometer equipped for APCI or electrospray ionization. Because all of the components, including autosampler, injector, HPLC, switching valve, mass spectrometer, and fraction collector, are controlled by computer, the procedure may be fully automated. For greatest efficiency, the system may be programmed to collect only those peaks displaying the desired molecular ions, or alternatively, all peaks displaying abundant ions within a specified mass range. An example of the MS-guided purification of a compound synthesized during the parallel synthesis of a combinatorial library of discrete compounds is shown in Fig. 13.5. Although the crude yield of the reaction product was only 30% (Fig. 13.5a), the desired product was detected based on its molecular ion (Fig. 13.5b). After MS-guided fractionation, re-analysis using LC-MS showed that the desired product was >90% pure (Fig. 13.5c).

The use of MS-guided purification of combinatorial libraries provides a means for reducing the number of HPLC fractions collected per sample and eliminates the need for post-purification analysis to further characterize and identify each compound as would be necessary when using UV-based fractionation. The ionization technique (i.e., electrospray, APCI, or APPI) and ionization mode (positive or negative) must be suitable for the combinatorial compound so that molecular ion species are formed. Also, a suitable mobile phase and HPLC column must be selected. As an alternative to HPLC, supercritical fluid chromatography-mass spectrometry (SFC-MS) has been used for the high-throughput analysis of combinatorial libraries (18, 19). The advantages of SFC-MS relative to conventional LC-MS for the purification of combinatorial libraries of compounds are the lower viscosities and higher diffusivities of condensed CO2 compared with HPLC mobile phases and the ease of solvent removal and disposal after analysis. However, SFC instrumentation remains more expensive and less widely available than conventional HPLC systems.

2.2 Confirmation of Structure and Purity of Combinatorial Compounds

The determination of molecular weights, elemental compositions, and structures of compounds used for high-throughput screening, whether discrete compounds or combinatorial library mixtures, is typically carried out using mass spectrometry, because traditional spectroscopic and gravimetric techniques are too slow to keep pace with combinatorial chemical synthesis. In addition, mass spectrometry may be used to assess the purity of compounds being used for high-throughput screening. The highest-throughput technique for confirming molecular weights and structures of drug candidates is flow injection analysis of sample solutions using electrospray, APCI, or APPI mass spectrometry. Typically, no sample preparation is necessary.

Although any organic mass spectrometer may be used to confirm the molecular weight of a compound, tandem mass spectrometers provide additional structural information through the use of CID to produce fragment ions. As discussed above (see also Table 13.1), tandem mass spectrometers include triple quadrupole instruments, QqTOF mass spectrometers, ion trap mass spectrometers, multiple sector magnetic sector instruments, FTICR instruments, and the new TOF-TOF mass spectrometers. In most applications, APCI or electrospray ionization is used.

In addition to molecular weight and fragmentation patterns, high precision and high resolution mass spectrometers such as QqTOF instruments, reflectron TOF mass spectrometers, double focusing magnetic sector mass spectrometers, and FTICR instruments are necessary for the measurement of exact masses of drugs and drug candidates for the determination of elemental compositions. The combination of high resolution and high precision is especially useful for determining the elemental compositions of compounds in combinatorial library mixtures without having to isolate each compound using chromatography or some other separation technique. Because FTICR instruments and the hybrid QqTOF mass spectrometers are capable of simultaneously measuring exact masses at high resolution of both molecular ions and fragment ions generated during MS-MS, these instruments are becoming extremely popular within drug discovery programs.

As an example of the exact mass measurement of a combinatorial library mixture, the FTICR negative ion electrospray mass spectra of a 36- and a 120-compound peptide library mixture are shown in Fig. 13.6. The resolution achieved in this experiment was 20,000-40,000. Although the exact masses of all components in a small combinatorial library can often be measured during a single infusion experiment, on-line HPLC separation or the analysis of discrete compounds is sometimes required to overcome ion suppression problems. However, LC-MS is a relatively slow process because of the slow chromatographic separation step. Because LC-MS is required in many instances for the analysis of mixtures and to eliminate interfering salts or buffers, two approaches have emerged to increase the throughput of this technique: parallel LC-MS and fast LC-MS. One approach to increasing
[Figure 13.6 image not reproduced. Panel (a): m/z axis 620-660, with peaks labeled by library component (Pro-X-Asp, Asn-X-Asp, Asp-X-Asp, Ala-X-TyrMe), measured mass, and error in ppm. Panel (b): peaks near m/z 637-639 labeled ThrMe-X-Asp and Asp-X-Asp with errors in ppm.]

Figure 13.6. (a) Partial negative ion electrospray mass spectrum of a 36-component library mixture. Both the measured mass and the difference between the measured and theoretical values (in ppm) are shown. (b) Negative ion electrospray spectrum of the 120-component library showing the resolution of three nominally isobaric peaks. (Reproduced from Ref. 24 by permission of Bentham Science Publishers).
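
The two quantities reported with Figure 13.6, mass error in ppm and the resolving power needed to separate near-isobaric peaks, follow from simple ratios. The sketch below uses numbers that are legible in the figure labels; pairing 621.2817 with 621.2807 as a measured and theoretical value, and treating 639.2559 and 639.2922 as two neighboring peaks, are assumptions made for illustration, since the figure itself is not reproduced here.

```python
def ppm_error(measured, theoretical):
    """Mass measurement error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def resolving_power_needed(m1, m2):
    """Approximate m / delta-m required to separate two neighboring peaks."""
    return min(m1, m2) / abs(m1 - m2)


# Assumed measured/theoretical pair taken from the panel (a) labels.
error = ppm_error(621.2817, 621.2807)

# Assumed pair of near-isobaric peaks taken from the panel (b) labels.
needed = resolving_power_needed(639.2559, 639.2922)

print(round(error, 1), round(needed))
```

Under these assumptions the error rounds to 1.6 ppm, and the required resolving power is somewhat below the 20,000-40,000 range quoted in the text.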
throughput of the rate-limiting chromatographic separation has been to simultaneously interface multiple HPLC columns to a single mass spectrometer. This approach is called parallel LC-MS. Commercial parallel electrospray interfaces and HPLC systems are now available that can accommodate up to eight HPLC columns simultaneously (20-22). Although the multiple sprays are introduced to the ion source simultaneously, these streams may be sampled in a time-dependent manner to minimize cross contamination between channels.

Another solution to increasing the throughput of LC-MS has been to minimize the time required for HPLC separation through an approach called fast HPLC. HPLC separations are accelerated by using shorter columns and higher mobile phase flow rates. Because coelution of some species is likely to occur during fast chromatographic separations, the selectivity of the mass spectrometer is essential for the characterization and/or quantitative analysis of the target compound. However, samples of compounds prepared using combinatorial chemistry are usually simple mixtures of reagents, by-products, and products that require only partial chromatographic purification to prevent ion suppression effects during mass spectrometric analysis.

In addition to molecular weight determination using conventional MS or high exact mass measurement and structural confirmation using MS-MS, fast LC-MS is also used to assess the purity and yield of combinatorial products (15, 23). Before high-throughput screening, many researchers analyze combinatorial libraries for both purity and structural identity using mass spectrometry to assure the validity of structure-activity relationships that might be derived from the screening data. Fast LC-MS and LC-MS-MS may be carried out to satisfy this requirement using gradients (usually a step gradient with a reversed phase HPLC column) with a total cycle time of 1-3 min (24) or using an isocratic system requiring less than 1 min per analysis. A variety of HPLC columns are used for fast LC-MS, including narrow bore (2-mm) and analytical bore (4.6-mm) columns with lengths typically from 0.5 to 5 cm. The mobile phase flow rate for these fast LC-MS analyses is usually from 1.5 to 5 mL/min.

2.3 Encoding and Identification of Compounds in Combinatorial Libraries and Natural Product Extracts

The use of mass spectrometric identification in combinatorial chemistry is not limited to the analysis of synthetic products as a means of quality control but also extends to the identification of active compounds or "hits" during high-throughput screening. Although the synthesis and screening of discrete compounds (25) enables them to be followed through the entire process by using partial encoding or bar-coding, it is sometimes advantageous to screen libraries prepared as mixtures (26) and use a technique such as mass spectrometry to rapidly identify the hit(s) in the mixture. One approach to the rapid deconvolution of combinatorial library mixtures is to prepare libraries containing compounds of unique molecular weight and then identify them using mass spectrometry. However, such libraries are necessarily small because the molecular weight of most drug-like molecules is between 150 and 400 Da. Because of the molecular weight degeneracy of larger combinatorial libraries, several encoding strategies have been devised to rapidly identify active compounds in these mixtures (27-29).

Because most combinatorial libraries contain compounds with degenerate molecular weights, various tagging strategies have been devised to uniquely identify library compounds bound to beads. Most of these tagging approaches are based on the synthesis of encoding molecules. For example, peptide (30) or oligonucleotide (31) labels have been synthesized on the beads in parallel to the target molecules and then sequenced for bead decoding. Alternatively, haloarene tags have been incorporated during synthesis and then identified with high sensitivity using electron-capture gas chromatography detection (32). In addition to the increased time and cost for the synthesis of a library containing tagging moieties, the tagging groups themselves might interfere with screening, giving false positive or negative results.

For peptide libraries, one solution to this problem uses matrix-assisted laser desorption ionization (MALDI) mass spectrometry to directly desorb and identify peptides from beads that were screened and found to be hits (33). This technique is called the termination synthesis approach. Because the peptide library compounds are analyzed directly, products with amino acid deletions or substitutions, side-reaction products, or incomplete deprotection are readily observed. Also, because there are no extra molecules used for chemical
tagging, this source of interference is avoided. However, this approach is specific to peptide libraries and is not necessarily applicable to other types of combinatorial libraries.

Another approach that eliminates possible interference from the chemical tags, "ratio encoding," has been developed for the mass spectrometric identification of bioactive leads using stable isotopes incorporated into the library compounds (29, 34). Within the ligand itself, the code might be a single-labeled atom that is conveniently inserted whenever a common reagent transfers at least one atom to the target compound or ligand. The code consists of an isotopic mixture having one of the many predetermined ratios of stable isotopes and can be incorporated in the linker or added through a reagent used during the synthesis. The mass spectrum of the compound shows a molecular ion with a unique isotope ratio that codes for a particular library compound. For example, Wagner et al. (29) used isotope ratio encoding during the synthesis of a 1000-compound peptoid library and were able to identify uniquely all the components based on their isotopic patterns and molecular weights. Because isotope ratio codes are contained within each combinatorial compound, a chemical tag is not required. The speed of MS-based decoding outperforms most other decoding technologies, which are time consuming and decode a restricted set of active compounds.

Although combinatorial synthesis provides rapid access to large numbers of compounds for screening during drug discovery and lead optimization, these libraries are usually based on a small number of common structures or scaffolds. There is a constant need for increasing the molecular diversity of combinatorial libraries and finding new scaffolds, and natural products have always been a rich source of chemical diversity for drug discovery. The traditional approach to screening natural products for drug leads uses bioassays to test organic solvent extracts for activity. If strong activity is detected, then activity-guided fractionation of the crude extract is used to isolate the active compound(s), which is identified using mass spectrometry (including tandem mass spectrometry and exact mass measurements), IR, UV/VIS spectrometry, and NMR.

based affinity screening methods have been developed to streamline the tedious process of activity-guided fractionation. These approaches are discussed in Section 2.4.

Whether lead compounds in natural product extracts are isolated using bioassay-guided fractionation or mass spectrometry-based screening, there is a high probability that the structure of the active compound(s) has already been reported in the natural product literature. In such cases, the tedious process of complete structure elucidation using a battery of spectrometric tools should be unnecessary. Instead, mass spectrometry alone may be used to quickly "dereplicate" or identify the known compounds based on molecular weight, fragmentation patterns, and elemental composition in combination with natural product database searching (35-39). Commercially available natural products databases include NAPRALERT (40), Scientific & Technical Information Network (STN) (41), and the Dictionary of Natural Products (42). Because some of these databases also contain UV/VIS absorbance data, it is also advantageous to use a photodiode array detector between the HPLC and mass spectrometer to obtain additional spectrometric data during LC-UV-MS dereplication (36, 37).

2.4 Mass Spectrometry-Based Screening

The earliest approaches to combinatorial synthesis used portioning and mixing (26) and enabled the synthesis of combinatorial libraries containing hundreds of thousands to millions of compounds. Today, this approach remains the most efficient method for preparing enormous libraries of compounds. However, until the mid-1990s, efficient screening techniques did not exist to rapidly identify the "hits" within large combinatorial mixtures. Therefore, chemists were motivated to develop ways to prepare large numbers of discrete compounds using massively parallel synthesis, which could be assayed quickly for pharmacological activity using high throughput screening one compound at a time. Recently, several mass spectrometry-based screening assays have been developed that are suitable for screening combinatorial library mixtures, and some are even useful for screening natural
Recently, a variety of mass spectrometry- product extracts which have always been a
Figure 13.7. Affinity chromatography combined with LC-MS-MS for screening combinatorial library mixtures. [Schematic steps: binding of the library to the receptor on the affinity column; washing of unbound library compounds to waste; isolation by eluting bound ligands using a pH change; trapping of ligands on a C18 column; elution of trapped ligands onto an HPLC column; LC-MS-MS identification.]
source of molecular diversity for drug discovery. All of the mass spectrometry-based screening methods use receptor binding of ligands as the basis for identification of lead compounds.

2.4.1 Affinity Chromatography-Mass Spectrometry. Since the introduction of affinity chromatography more than 30 years ago, this technique has become a standard biochemical tool for the isolation and identification of new binding partners to specific target molecules. Therefore, the coupling of affinity chromatography to mass spectrometry is a logical extension of this technique, and the application of affinity LC-MS to the screening of combinatorial libraries has been demonstrated by several groups (43, 44). During affinity LC-MS screening, a receptor molecule such as a binding protein or enzyme is immobilized on a solid support within a chromatography column. The library mixture is pumped through the affinity column in a suitable binding buffer so that any ligands in the mixture with affinity for the receptor would be able to bind. Then, unbound material is washed away. Finally, the specifically bound ligands are eluted using a destabilizing mobile phase and identified using mass spectrometry. This affinity-column LC-MS assay is summarized in Fig. 13.7.

In some applications (43), ligands are eluted from the affinity column and then trapped on a second column such as a reverse phase HPLC column. LC-MS or LC-MS-MS identification of the ligands (hits) is then carried out using the trapping column. In other systems, ligands are identified directly from the affinity column using mass spectrometry (44). For example, Kelly et al. (44) prepared an affinity column containing immobilized phosphatidylinositol-3-kinase and used it for direct LC-MS screening of a 361-component peptide library. Electrospray mass spectrometry and tandem mass spectrometry were used to identify the ligands released from the affinity column using pH gradient elution.

Advantages of affinity chromatography-mass spectrometry for screening during drug discovery include versatility and re-use of the column. Both combinatorial libraries and natural product extracts can be screened using this approach, and a wide range of binding buffers may be used. Mass spectrometry-compatible mobile phases are only required during the final LC-MS detection step. Furthermore, a single column may be used multiple times to screen different samples for ligands unless the destabilization solution irreversibly denatures, releases, or inhibits the receptor.

Despite these advantages, affinity chromatography has numerous drawbacks that have
Figure 13.8. GPC followed by LC-MS-MS for screening mixtures of combinatorial libraries. After incubation of a receptor with a library of compounds, the ligand-receptor complexes (L-R) are separated from the low molecular weight unbound library compounds using GPC. Next, the L-R complexes are denatured during reversed phase HPLC to release the ligands for MS-MS identification. [Schematic steps: binding; GPC isolation; reversed phase desalting/denaturation; identification.]
prompted the development of alternative mass spectrometer screening tools. For example, immobilization of the receptor might change its affinity characteristics causing false negative or false positive hits. This is particularly problematic for receptors that are solution-phase in their native state. Also, developing and then implementing an immobilization scheme is often a slow, tedious, and even expensive process, and this process is unique for each new receptor. Finally, false positive hits are often obtained when screening large molecularly diverse libraries, because there are usually compounds in such mixtures that have affinity for the stationary phase or linker molecule instead of the receptor.

2.4.2 Gel Permeation Chromatography-Mass Spectrometry. Another type of chromatography that has been combined with mass spectrometry as a screening system for drug discovery is gel permeation chromatography (GPC) (45, 46). Also called size-exclusion chromatography, GPC separates molecules according to size as they pass through a stationary phase containing particles with a defined pore size. During GPC-based screening, a library mixture is pre-incubated with a macromolecular receptor to allow any ligands in the library to bind, and then GPC is used to separate the large receptor-ligand complexes from the unbound low molecular weight compounds in the mixture. Finally, ligands are released from the receptor during reversed phase HPLC and identified either on-line or off-line using tandem mass spectrometry. This screening method is illustrated in Fig. 13.8.

During the pre-incubation and GPC steps, any binding buffer may be used, because the binding buffer will be removed during reverse phase LC-MS analysis. However, the GPC separation step must be carried out quickly, because ligands begin to dissociate from the receptor immediately and can become lost into the size exclusion gel. Despite this disadvantage, this approach allows both receptor and ligand to be screened in solution, which avoids some of the problems associated with the use of affinity columns for screening. The GPC LC-MS-MS screening method should also be suitable for screening natural product extracts as well as combinatorial library mixtures.

2.4.3 Affinity Capillary Electrophoresis-Mass Spectrometry. Affinity capillary electrophoresis was originally used for the determination of the binding constants of small molecules to proteins (47-49). This solution-based technique is rapid and requires only small amounts of ligands. Affinity constants are measured based on the mobility change of the ligand on interaction with the receptor present in the electrophoretic buffer (50). By combining affinity capillary electrophoresis with on-line mass spectrometric detection and
Figure 13.9. Affinity capillary electrophoresis-UV-mass spectrometry of a 100-tetrapeptide library screened for binding to vancomycin (104 µM in the electrophoresis buffer). (a) The elution of peptides was monitored with UV absorbance during capillary electrophoresis, and the elution time increased with increasing affinity for vancomycin. (b) Positive ion electrospray mass spectrum with CID of the Tris adduct of the protonated peptide detected at ~5 min in the electropherogram shown in a. [Horizontal axis of the electropherogram: migration time (min), 0 to 6.] (Reproduced from Ref. 52 by permission of the American Chemical Society.)
identification, affinity constants for multiple compounds can be measured in a single analysis (51). Recognizing that on-line mass spectrometric detection was helpful for the identification of each ligand, Chu et al. (52) extended this approach to include the screening of combinatorial libraries as a means of drug discovery. The data in Fig. 13.9 show the results of screening a 100-tetrapeptide library for affinity to vancomycin using affinity capillary electrophoresis-mass spectrometry. Without vancomycin in the electrophoresis buffer, all the peptides eluted within 3 min. When vancomycin was present, the peptides eluted in order of affinity, with the highest affinity compounds being detected between 4.5 and 5 min. Positive ion electrospray tandem mass spectrometry was used to identify the highest affinity ligands (see Fig. 13.9b).

Note that some peptide ligands such as Fmoc-DDFA were detected as adducts with Tris, which was used in the electrophoresis buffer. Although the identification of this peptide was not prevented by the formation of this adduct, some buffers used during electrophoresis might interfere with mass spectrometric ionization and detection. Also, the types of libraries that have been screened using this approach have contained modest numbers of synthetic analogs such as peptides. Libraries exceeding 400 members required preliminary purification using affinity chromatography to reduce the number of compounds (52). As a result, this approach is probably not ideal for screening libraries containing molecularly diverse compounds or for screening natural product extracts. However, affinity capillary electrophoresis-mass spectrometry is fast; each analysis requires less than 10 min. Also, it may be used to measure affinity constants for ligand-receptor interactions.
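
As an illustration of how an affinity constant might be extracted from mobility changes of the kind described in Section 2.4.3, the following is a minimal sketch and not the procedure of the cited studies. It assumes simple 1:1 binding with fast exchange, so that the mobility shift of the ligand follows shift = max_shift x [R] / (Kd + [R]), where [R] is the receptor concentration in the electrophoresis buffer. The Scatchard-type linearization, the function names, and all numerical values are assumptions introduced only for this example.

```python
# Hypothetical sketch: estimate Kd from ligand mobility shifts measured at
# several receptor concentrations, ASSUMING 1:1 binding with fast exchange.
# Model:        shift = max_shift * R / (Kd + R)
# Linear form:  shift / R = max_shift / Kd - shift / Kd   (slope = -1/Kd)


def simulate_shift(receptor_conc, kd, max_shift):
    """Mobility shift predicted by the assumed 1:1 binding model."""
    return max_shift * receptor_conc / (kd + receptor_conc)


def fit_kd_scatchard(receptor_concs, shifts):
    """Least-squares line through (shift, shift/R); returns (Kd, max_shift)."""
    xs = list(shifts)
    ys = [s / c for s, c in zip(shifts, receptor_concs)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope            # slope of the linear form is -1/Kd
    max_shift = intercept * kd   # intercept of the linear form is max_shift/Kd
    return kd, max_shift


# Synthetic, noise-free data (illustrative values only; concentrations in µM)
concs = [10.0, 25.0, 50.0, 100.0, 200.0]
true_kd, true_max = 40.0, 1.0
shifts = [simulate_shift(c, true_kd, true_max) for c in concs]
kd_fit, max_fit = fit_kd_scatchard(concs, shifts)
```

With real electropherograms the shifts would carry measurement noise, and a nonlinear fit of the binding isotherm would usually be preferred over the linearized form used here for brevity.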
2.4.4 Frontal Affinity Chromatography-Mass Spectrometry. Like affinity chromatography-mass spectrometric screening (see Section 2.4.1), frontal affinity chromatography uses an affinity column containing immobilized receptor molecules (53). The difference between the two screening methods is that the ligands are continuously infused into the column during frontal affinity chromatography and detected using mass spectrometry. Compounds with no affinity for the immobilized receptor elute immediately in the void volume, but the elution of the ligands is delayed. As compounds compete for binding sites on the affinity column, these sites become saturated until ligands begin to elute from the column at their infusion concentration. In this manner, frontal affinity chromatography may be used to measure affinity constants for ligands, and by using a mass spectrometer for on-line identification of ligands, this technique becomes a screening method (54, 55).

During frontal affinity chromatography-mass spectrometry, signals for all compounds eluting from the affinity column are recorded by the mass spectrometer, and the last compounds to elute at their infusion concentrations represent the highest affinity compounds or "hits." An example of the screening of six oligosaccharides with different binding affinities for an immobilized monoclonal carbohydrate-binding antibody is shown in Fig. 13.10. Compounds 1-3 eluted immediately (no affinity), whereas compounds 4-6 eluted in order of increasing affinity for the antibody. Dissociation constants were determined to be 185, 12.6, and 1.8 µM for compounds 4-6, respectively (54).

Figure 13.10. Frontal affinity chromatography-mass spectrometry screening of a 6-oligosaccharide mixture for affinity to an immobilized carbohydrate-binding monoclonal antibody. Top: positive ion electrospray total ion chromatogram. Middle: computer-reconstructed mass chromatograms for the molecular ion species of all six compounds. Compounds 1-3 (solid line) eluted in the void volume, indicating no binding to the antibody. Breakthrough signals for compounds 4, 5, and 6 appear at successively later times, indicating increasing affinity for the immobilized antibody. Bottom: positive ion electrospray mass spectra recorded at times I, II, and III as indicated in the middle trace. The protonated molecules of compounds 1-6 are labeled. [Horizontal axes of the chromatograms: t/min, 10 to 50.]

Because frontal affinity chromatography uses a conventional affinity column, this technique provides additional applications of this type of column to investigators already using affinity-mass spectrometry (see Section 2.4.1). However, the same limitations and disadvantages of using immobilized receptors still apply, such as non-specific binding to the stationary phase, the development time and cost of preparing the affinity columns, and the possibility that immobilizing the receptor might alter its binding characteristics and specificity. In addition, mass spectrometric detection creates some additional limitations. Because all library compounds must be monitored simultaneously, the compounds must be selected so that they have unique molecular weights. Also, one compound in the mixture should not suppress the ionization of another. Therefore, this approach is probably restricted to the screening of small combinatorial libraries that are similar in chemical structure and ionization efficiencies. Finally, the binding buffer used for affinity chromatography must be compatible with on-line APCI or electrospray mass spectrometry. This means that the mobile phase must be volatile and usually of low ionic strength (i.e., typically <40 mM for electrospray ionization).

2.4.5 Bioaffinity Screening using Electrospray FTICR Mass Spectrometry. Although FTICR mass spectrometry may be used to determine the exact masses of combinatorial library compounds and to confirm their structures using CID and high resolution tandem mass spectrometry (see definitions of CID and MS-MS in Section 1), electrospray FTICR mass spectrometry may also be used for the direct screening of combinatorial libraries without the need for any pre-purification or chromatography. In this application, a combinatorial library is pre-incubated with a receptor in solution and then analyzed directly using electrospray to identify receptor-ligand complexes in the gas phase (56-60). Once a receptor-ligand complex is ionized and trapped in the FTICR mass spectrometer, the mass difference between the complex and the receptor alone might be measured with sufficient resolution and accuracy to determine the mass(es) and perhaps elemental composition(s) of the ligand(s). If the ligand carries a charge, then CID may be used to dissociate the ligand for subsequent analysis using tandem mass spectrometry. This elegant and simple screening approach is summarized in Fig. 13.11.

Figure 13.11. Bioaffinity electrospray FTICR mass spectrometry. The isolation and mass spectrometric identification of receptor-specific ligands are carried out entirely in the mass spectrometer without chromatography or other separation steps.

An extension of this FTICR mass spectrometry-based screening technique has been to screen a combinatorial library for ligands to two receptors simultaneously (59, 60). In this example, the two receptors consisting of RNA constructs representing the prokaryotic (16S) rRNA and eukaryotic (18S) rRNA A-site were incubated simultaneously with an aminoglycoside library to identify potential ligands. By screening a target mixture against the same library, screening efficiency is enhanced and the number of analyses required is reduced.

The advantage of this screening method over other approaches is the elimination of purification steps before mass spectrometric identification. Also, the disadvantages associated with chromatographic separations are eliminated. However, the use of FTICR mass spectrometric screening restricts the binding buffer and receptors that may be used. Only low ionic strength and volatile buffers are compatible with this approach (such as 10 mM ammonium acetate). Also, the receptor and ligand must be highly purified to avoid impurities that might interfere with ionization and detection. Therefore, this technique is probably more suitable for the screening of combinatorial libraries than complex natural product mixtures. Finally, the receptor-ligand complex must ionize efficiently during electrospray under solvent and ion source conditions that do not cause dissociation of the complex.

2.4.6 Pulsed Ultrafiltration-Mass Spectrometry. A versatile approach to screening solution phase combinatorial libraries and natural product extracts is pulsed ultrafiltration-mass spectrometry (61, 62), which uses a standard LC-MS system with an ultrafiltration chamber substituted for the HPLC column. The principle of pulsed ultrafiltration screening of combinatorial libraries is shown in Fig. 13.12. During pulsed ultrafiltration, ligand-receptor complexes remain in solution in the ultrafiltration chamber while unbound library compounds and buffer are washed away. After unbound compounds are removed, the hits from the library are eluted from the chamber by destabilizing the ligand-receptor complex using an organic solvent, a pH change, or a
Figure 13.12. Combinatorial library screening using pulsed ultrafiltration mass spectrometry. During the loading step (left), ligands are bound to the receptor either on-line (top) using a flow-through approach or off-line (bottom two incubations). Unbound compounds and binding buffer, cofactors, etc. are washed out of the ultrafiltration chamber to waste during a separation step (middle). Bound ligands are dissociated from the receptor molecules and eluted from the chamber by introducing a destabilizing solution such as methanol, pH change, etc. Finally, released ligands are identified using mass spectrometry, tandem mass spectrometry, or LC-MS (right). (Reproduced from Ref. 64 by permission of John Wiley & Sons.)
Figure 13.13. Identification of EHNA as the highest affinity ligand for adenosine deaminase in a combinatorial library of 20 adenosine analogs using ultrafiltration electrospray mass spectrometry. (Reproduced from Ref. 61 by permission of the American Chemical Society.)
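
The numbers quoted for the adenosine deaminase example of Fig. 13.13 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes the molecular formula C14H23N5O for EHNA, which is derived here from the name erythro-9-(2-hydroxy-3-nonyl)adenine and is not stated in the text, and it takes the quoted concentrations as micromolar and the flow rate as microliters per minute. The function names are hypothetical.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727647

# ASSUMPTION: elemental composition of EHNA inferred from its chemical name.
EHNA = {"C": 14, "H": 23, "N": 5, "O": 1}


def monoisotopic_mass(formula):
    """Sum of monoisotopic atomic masses for an element-count dictionary."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def protonated_mz(formula):
    """m/z of the singly charged protonated molecule [M+H]+."""
    return monoisotopic_mass(formula) + PROTON


def aliquot_volume_ul(amount_pmol, conc_um):
    """Volume (µL) holding a given amount (pmol) at a concentration in µM."""
    return amount_pmol / conc_um  # pmol / (µmol/L) = µL


def wash_volume_ul(minutes, flow_ul_per_min):
    """Total wash volume (µL) delivered at a constant flow rate."""
    return minutes * flow_ul_per_min


ehna_mz = protonated_mz(EHNA)             # nominal value should be 278
injected_ul = aliquot_volume_ul(420, 2.1)  # 420 pmol of enzyme at 2.1 µM
washed_ul = wash_volume_ul(8, 50)          # 8 min at 50 µL/min
```

Under these assumptions the protonated molecule has a nominal m/z of 278, which agrees with the ion reported for EHNA, and the 420-pmol aliquot corresponds to about 200 µL of the incubation mixture washed with about 400 µL of water.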
combination of both. The released ligands are identified on-line using APCI or electrospray mass spectrometry (61) or collected and analyzed off-line using mass spectrometry, LC-MS, or LC-MS-MS (63).

An example of pulsed ultrafiltration mass spectrometry for the screening of a library of 20 adenosine analogs for ligands to adenosine deaminase is shown in Fig. 13.13. After a 15-min preincubation of the library compounds (17.5 µM each except for EHNA, which was present at 1.75 µM) with 2.1 µM adenosine deaminase in 50 mM phosphate buffer, an aliquot containing 420 pmol of the receptor was injected into the ultrafiltration chamber and washed for 8 min at 50 µL/min with water to remove the phosphate buffer and unbound or weakly binding library compounds. Methanol was introduced into the mobile phase to dissociate the enzyme-ligand complex and release bound ligands for identification by electrospray mass spectrometry. During methanol elution, only EHNA [erythro-9-(2-hydroxy-3-nonyl)adenine] was detected as the [M+H]+ ion of m/z 278 (Fig. 13.13). In control experiments using the library without enzyme, no library compounds were detected during methanol elution (Fig. 13.13, Control). Despite being present at a 10-fold lower concentration than the natural substrate adenosine analogs, EHNA was easily identified because it had the highest affinity among the library compounds (K,= 1.9 nM). This demonstrates the use of ultrafiltration electrospray mass spectrometry for identifying a high affinity ligand among a set of analogs that bind to a specific receptor. In a follow-up lead optimization study using pulsed ultrafiltration mass spectrometry, a synthetic combinatorial library of EHNA analogs was screened for binding to adenosine deaminase, and structure-activity relationships for EHNA binding were identified (65).

As an illustration of the versatility of pulsed ultrafiltration-mass spectrometry, binding assays for a variety of receptors have been reported including dihydrofolate reductase (63), cyclooxygenase-2 (62), serum albumin (66, 67), and estrogen receptors (68). Not only is pulsed ultrafiltration useful for identifying ligands to different receptors, but a wide range of combinatorial libraries and natural product extracts in any suitable binding buffer may be screened. In addition to combinatorial libraries, complex natural product extracts
have been screened (68), and neither plant nor fermentation broth matrices were found to interfere with screening (62). As another example of the flexibility of this screening system, a centrifuge tube equipped with an ultrafiltration membrane (69) has been used instead of an on-line ultrafiltration chamber. Other applications of pulsed ultrafiltration-mass spectrometry include screening drugs and drug candidates for metabolic stability (70), metabolic activation to reactive metabolites (71), and the measurement of affinity constants for ligand-receptor interactions (66, 67).

Metabolism and toxicity screening applications of pulsed ultrafiltration use hepatic microsomes in the ultrafiltration chamber. For metabolic screening, drugs and the cofactor nicotinamide adenine dinucleotide phosphate (NADPH) are flow-injected through the ultrafiltration chamber (oxygen is dissolved in the mobile phase), and the metabolites formed by microsomal cytochrome P450 and any unreacted compounds flow out of the chamber for mass spectrometric identification and/or quantitative analysis (70). On-line applications require the use of volatile buffers, but LC-MS and LC-MS-MS may be used off-line to analyze the ultrafiltrate no matter what buffer had been used. Screening drugs for metabolic activation using pulsed ultrafiltration-mass spectrometry is carried out in a similar manner, except that glutathione is coinjected along with NADPH and the drug substrate (71). MS-MS may be used on-line or LC-MS-MS may be used off-line to screen for glutathione adducts as an indication that the drug was metabolized to a reactive intermediate(s) that was trapped by reaction with glutathione. Finally, pulsed ultrafiltration may be used with UV or mass spectrometric detection to measure affinity constants of individual compounds (66).

To measure affinity constants and other physico-chemical properties of binding such as the number of binding sites, two pulsed ultrafiltration measurements are carried out. First, an aliquot or pulse of a ligand is injected through the chamber, and the elution profile is recorded. Then, the chamber is loaded with a receptor, and the ligand is reinjected. If binding occurs, the elution profile will be delayed in proportion to the affinity constant. The control injection is used to control for non-specific binding to the apparatus. Because the concentration of receptor and total amount of liquid are known, and because the concentration of free ligand is measured as it elutes from the chamber over a wide range of concentrations, the affinity constant and other binding parameters may be calculated.

In most of the applications of pulsed ultrafiltration to date, serial analyses were carried out with a throughput of approximately one or two assays per hour. Because the purpose of these assays was to screen complex mixtures or to obtain metabolism data for new drug entities, the throughput of these analyses was acceptable, but was not high throughput. The rate limiting step in these analyses was the ultrafiltration separation and not the mass spectrometric detection. Two solutions have been reported to increase the throughput of pulsed ultrafiltration mass spectrometry. In the first solution, van Breemen et al. (70) used a multiplex ultrafiltration system in which up to 60 ultrafiltration chambers could be arranged in parallel and interfaced to a single mass spectrometer. This scheme is shown in Fig. 13.14. In this system, a continuous flow of the buffer or mobile phase is maintained through the ultrafiltration chambers, but the mass spectrometer samples each ultrafiltrate solution at 1-min intervals. The sampling time would be selected to correspond to the time at which a maximum concentration of metabolites would be expected to elute from the chamber. This approach was demonstrated to increase the throughput of metabolic screening using ultrafiltration mass spectrometry by 60-fold. Although used originally for metabolic screening, this approach would be applicable to toxicity screening and drug discovery screening as well.

Figure 13.14. High-throughput pulsed ultrafiltration mass spectrometry system for screening drug candidates for metabolic transformation. Multiple ultrafiltration chambers are connected in parallel to a single mass spectrometer detector. After loading each chamber with liver microsomes, a different drug is injected into each chamber at intervals of 1 min (for 60 screens/h using 60 chambers). Constant flow of incubation buffer is maintained through all chambers, but only one chamber at a time is connected on-line to the mass spectrometer. Drug metabolite profiles are recorded using mass spectrometry for up to 1 min per chamber. (Reproduced from Ref. 70 by permission of the American Society for Pharmacology and Experimental Therapeutics.)

The second solution to increasing the throughput of pulsed ultrafiltration mass spectrometry has been to miniaturize the ultrafiltration chamber volume while maintaining the flow rate and chamber pressure. Because the ultrafiltration membrane cannot withstand high pressure without rupturing, the ultrafiltration process cannot be accelerated simply by increasing the flow rate through the chamber. The approach of Beverly et al. (72) was to fabricate a 35-µL ultrafiltration chamber that was approximately threefold lower in volume than the smallest reported by van Breemen et al. (61). As a result, ultrafiltration mass spectrometric analyses could be carried out at the rate of at least three per hour, which corresponded to a threefold enhancement of throughput. This study suggests that chip-based ultrafiltration mass spectrometry would have the potential to result in a truly high-throughput system.

The advantages of pulsed ultrafiltration-mass spectrometry include the variety of different applications that may be carried out, the convenience of on-line screening, solution-phase screening, the ability to screen either combinatorial libraries or natural product extracts, the diversity of receptors that may be screened, and the freedom to use either volatile or non-volatile binding buffers. For metabolic and toxicity screening, flow injection analyses have the additional advantages that product feedback inhibition is prevented so that the metabolic profile more closely approximates the in vivo system (70). Finally, the disadvantages of pulsed ultrafiltration screening for drug discovery include the washing step, during which dissociation and loss of weakly bound ligands might occur, and the slow speed of each experiment, which can take up to 1 h.

2.4.7 Solid Phase Mass Spectrometric Screening. Because drugs are usually in a soluble form to be transported to the active sites in cells and tissues, it is logical that most mass spectrometry-based screening methods use solution-phase analysis of these compounds, and it is no surprise that most successful mass spectrometry screening assays use electrospray ionization or APCI. However, solid phase ionization techniques such as matrix-assisted laser desorption ionization (MALDI) might be effective, provided that ligand-receptor interactions are allowed to take place in an environment similar to in vivo conditions and that a suitable separation step is carried out before the preparation of the MALDI sample.
To use MALDI mass spectrometry for screening, several research groups have developed immobilized receptors on MALDI targets or on solid supports that can be placed on a MALDI target for use in the affinity purification of potential drugs from test solutions. Following procedures originally developed for affinity chromatography, the preparation of affinity surfaces for MALDI mass spectrometry has been achieved quite easily. However, the use of these affinity MALDI chips for screening mixtures of small molecules during drug discovery has been unproductive. One of the problems has been the high background noise at low m/z values caused by the matrix used for MALDI. This problem may be mitigated by eliminating the matrix or using alternative sample stages such as porous silicon chips (73, 74). However, noise persists because of the affinity support and immobilized receptor molecules. Another problem that has yet to be overcome is the elimination of the high background noise caused by non-specific binding of test compounds to the affinity target. Although this problem is similar to the false positive results and non-specific binding that occurs during affinity chromatography-mass spectrometry (see Section 2.4.1), the signals for non-specific binding are magnified by the fact that the actual affinity surface is being irradiated and sampled by the MALDI laser beam. As a result, affinity-based screening coupled with MALDI mass spectrometry has not been a successful drug discovery approach.

However, progress is being made in the use of affinity probes for the capture of proteins and other macromolecules from biological solutions followed by MALDI mass spectrometric detection and identification (75-77). One affinity MALDI mass spectrometry method has been paired with the affinity probes used in surface plasmon resonance systems (78). These affinity-based MALDI mass spectrometry screening assays are promising approaches for testing blood or other biological fluids for the presence of specific proteins or other macromolecules. As a result, these have the potential to become clinical diagnostic tools or might even lead to the identification of new therapeutic targets. However, they are unlikely to become useful for screening combina-

3 THINGS TO COME

Mass spectrometry has become an essential analytical tool at every stage of the drug discovery and development process. In this chapter, the various applications of mass spectrometry to combinatorial chemistry and drug discovery have been highlighted. Although the speed of mass spectrometry matches the demands of combinatorial chemistry, the slow and serial nature of chromatography in the various LC-MS applications remains a bottleneck that limits their throughput. Because mass spectrometry is highly selective, only partial chromatographic separations are needed for most measurements. In fact, the primary function of the chromatography step is usually to separate species that might otherwise interfere with the ionization process. Recognizing this limited function of chromatography during LC-MS-based screening assays, manufacturers of chromatography columns are addressing this need by developing high-throughput columns for fast chromatography for LC-MS. Improvements in this direction should continue to reduce the time required for LC-MS from a few minutes to a few seconds. Meanwhile, chip-based technology is beginning to emerge for miniaturized capillary electrophoresis-mass spectrometry (CE-MS) (79). These chips are being developed to enable ultrafast and highly sensitive electrospray mass spectrometric analysis. Because of their microscopic size, CE-MS chips have the potential to hold large arrays of samples that would facilitate high-throughput analysis.

In terms of mass spectrometry instrumentation, the currently available instruments such as time-of-flight (TOF) analyzers and hybrid quadrupole-TOF analyzers are able to acquire complete mass spectra at rates compatible with fast CE separations. As CE or ultrafast chromatography replaces conventional, slow HPLC applications, TOF-based mass spectrometers will be needed to replace the less efficient scanning types of instruments such as quadrupoles and ion traps for most high-throughput applications. FTICR mass spectrometry remains unsurpassed in terms of resolution and mass accuracy for both
torial libraries or natural ~ r o d u cextracts
A
t for MS and MS-MS applications. However, the
the purpose drug discovery. throughput of FTICR mass spectrometric
analysis needs to be increased to remain useful for combinatorial chemistry applications. Advances in increasing the throughput of FTICR mass spectrometry are anticipated.

Hyphenated technologies such as LC-NMR-MS are being developed to support structure elucidation of combinatorial libraries (80). Although such technologies are still in a developmental stage, they have great potential for analyses of combinatorial libraries and for natural product drug discovery (81-83). The main impediments to applying LC-NMR-MS to combinatorial chemistry remain poor sensitivity of the NMR, the obligatory use of deuterated solvents for chromatography, and the low throughput of NMR analyses. However, efforts are in progress to improve the throughput of NMR analyses (84-86).

In conclusion, mass spectrometry provides rapid, reliable, sensitive, and selective analysis of combinatorial libraries for structure confirmation, purity analysis, and library deconvolution. In addition, mass spectrometric screening methods have been developed and are beginning to be applied to drug discovery. In the case of natural products, mass spectrometry facilitates the screening of natural product extracts and facilitates the dereplication and characterization of lead compounds.

At different times during the last 100 years, first physicists and physical chemists and then organic chemists pronounced that mass spectrometry had run out of new applications and had no future. Fortunately, they were wrong. Today, medicinal chemists recognize that the potential of mass spectrometry to contribute to all facets of drug discovery has only just begun to be explored. Furthermore, applications of mass spectrometry to drug development are even less developed and are waiting to be developed. Mass spectrometry has become a fundamental analytical tool for drug discovery, and this role should continue to grow in the future.

4 WEB SITE ADDRESSES AND RECOMMENDED READING FOR FURTHER INFORMATION

• http://www.asms.org Homepage of the American Society for Mass Spectrometry. This web site contains additional information about mass spectrometry and links to a variety of reference materials regarding biomedical mass spectrometry.
• http://www.bentham.org/cchts/ Combinatorial Chemistry & High Throughput Screening
• http://pubs.acs.org/journals/jcchff/ Journal of Combinatorial Chemistry
• http://www.5z.com/moldiv/ Molecular Diversity
• http://www.liebertpub.com/BSC/defaultl.asp Journal of Biomolecular Screening
• R. B. Cole, Ed., Electrospray Ionization Mass Spectrometry, John Wiley and Sons, New York, 1997.
• F. W. McLafferty and F. Turecek, Interpretation of Mass Spectra, 4th ed., University Science Books, Mill Valley, CA, 1993.
• W. M. Niessen, J. Chromatogr. A, 856, 179-189 (1999).
• J. T. Watson, Introduction to Mass Spectrometry, 3rd ed., Lippincott-Raven, Philadelphia, PA, 1997.

5 ACKNOWLEDGMENTS

I thank Young Geun Shin, Benjamin Johnson, and Jennifer Mosel for help in writing and preparing this chapter.

REFERENCES

1. F. Field, J. Am. Soc. Mass Spectrom., 1, 277-283 (1990).
2. M. Barber, R. S. Bordoli, G. J. Elliott, R. D. Sedgwick, and A. N. Tyler, Anal. Chem., 54, 645A-657A (1982).
3. F. Hillenkamp, M. Karas, R. C. Beavis, and B. T. Chait, Anal. Chem., 63, 1193A-1203A (1991).
4. Y. Ito, T. Takeuchi, D. Ishii, and M. Goto, J. Chromatogr., 346, 161-166 (1985).
5. L. Chen, M. Stacewicz-Sapuntzakis, C. Duncan, R. Sharifi, L. Ghosh, R. van Breemen, D. Ashton, and P. E. Bowen, J. Natl. Cancer Inst., 93, 1872-1879 (2001).
6. J. Liu, J. E. Burdette, H. Xu, C. Gu, R. B. van Breemen, K. P. L. Bhat, N. Booth, A. I. Constantinou, J. M. Pezzuto, H. H. S. Fong, N. R. Farnsworth, and J. L. Bolton, J. Agric. Food Chem., 49, 2472-2479 (2001).
7. J. A. Syage and M. D. Evans, Spectroscopy, 16, 14-21 (2001).
8. E. M. Gordon, M. A. Gallop, and D. V. Patel, Acc. Chem. Res., 29, 144-154 (1996).
9. L. A. Thompson and J. A. Ellman, Chem. Rev., 29, 132-143 (1996).
10. J. A. Loo, Eur. Mass Spectrom., 3, 93-104 (1997).
11. J. N. Kyranos and J. C. Hogan, Anal. Chem., 70, 389A-395A (1998).
12. Y. Dunayevskiy, P. Vouros, T. Carell, E. A. Wintner, and J. Rebek, Anal. Chem., 67, 2906-2915 (1995).
13. Y. Dunayevskiy, Y. V. Lyubarskaya, Y. H. Chu, P. Vouros, and B. L. Karger, J. Med. Chem., 41, 1201-1204 (1998).
14. P. A. Demirev and R. A. Zubarev, Anal. Chem., 69, 2893-2900 (1997).
15. L. Zeng, L. Burton, K. Yung, B. Shushan, and D. B. Kassel, J. Chromatogr. A, 794, 3-13 (1998).
16. L. Zeng and D. B. Kassel, Anal. Chem., 70, 4380-4388 (1998).
17. J. P. Kiplinger, R. O. Cole, S. Robinson, E. J. Roskamp, R. S. Ware, H. J. O'Connell, A. Brailsford, and J. Batt, Rapid Commun. Mass Spectrom., 12, 658-664 (1998).
18. M. C. Ventura, W. P. Farrell, C. M. Aurigemma, and M. J. Greig, Anal. Chem., 71, 2410-2416 (1999).
19. M. C. Ventura, W. P. Farrell, C. M. Aurigemma, and M. J. Greig, Anal. Chem., 71, 4223-4231 (1999).
20. V. de Biasi, N. Haskins, A. Organ, R. Bateman, K. Giles, and S. Jarvis, Rapid Commun. Mass Spectrom., 13, 1165-1168 (1999).
21. T. Wang, L. Zeng, J. Cohen, and D. B. Kassel, Comb. Chem. High Throughput Screen., 2, 327-334 (1999).
22. L. Yang, N. Wu, R. P. Clement, and P. J. Rudewicz, Proc. 48th ASMS Conf. Mass Spectrom. Allied Topics, 861-862 (2000).
23. H. N. Weller, M. G. Young, S. J. Michalczyk, G. H. Reitnauer, R. S. Cooley, P. C. Rahn, D. J. Loyd, D. Fiore, and S. J. Fischman, Mol. Diversity, 3, 61-70 (1997).
24. A. S. Fang, V. Vouros, C. C. Stacey, et al., Comb. Chem. High Throughput Screen., 1, 23-33 (1998).
25. D. G. Powers and D. L. Coffen, Drug Discovery Today, 4, 377-383 (1999).
26. A. Furka and W. D. Bennett, Comb. Chem. High Throughput Screen., 2, 105-122 (1999).
27. K. D. Janda, Proc. Natl. Acad. Sci. USA, 91, 10779-10785 (1994).
28. A. W. Czarnik, Proc. Natl. Acad. Sci. USA, 94, 12738-12739 (1997).
29. D. S. Wagner, C. J. Markworth, C. D. Wagner, F. J. Schoenen, C. E. Rewerts, B. K. Kay, and H. M. Geysen, Comb. Chem. High Throughput Screen., 1, 143-153 (1998).
30. J. M. Kerr, S. C. Banville, and R. N. Zuckermann, J. Am. Chem. Soc., 115, 2529-2531 (1993).
31. S. Brenner and R. A. Lerner, Proc. Natl. Acad. Sci. USA, 89, 5381-5383 (1992).
32. M. H. J. Ohlmeyer, R. N. Swanson, L. W. Dillard, J. C. Reader, G. Asouline, R. Kobayashi, M. Wigler, and W. C. Still, Proc. Natl. Acad. Sci. USA, 90, 10922-10926 (1993).
33. R. S. Youngquist, G. R. Fuentes, M. P. Lacey, and T. Keough, Rapid Commun. Mass Spectrom., 8, 77-81 (1994).
34. G. Karet, Drug Discovery Dev., 1, 32-38 (1999).
35. O. Potterat, K. Wagner, and H. Haag, J. Chromatogr. A, 872, 85-90 (2000).
36. G. A. Cordell and Y. G. Shin, Pure Appl. Chem., 71, 1089-1094 (1999).
37. Y. G. Shin, G. A. Cordell, Y. Dong, J. M. Pezzuto, AVNA Rao, M. Ramesh, B. R. Kumar, and M. Radhakishan, Phytochem. Anal., 10, 208-212 (1999).
38. C. L. Zani, T. M. A. Alves, R. Queiroz, M. A. L. Chaves, E. S. Fontes, Y. G. Shin, and G. A. Cordell, Phytochemistry, 53, 877-880 (2000).
39. H. L. Constant and C. W. W. Beecher, Nat. Prod. Lett., 6, 193-196 (1995).
40. D. G. Corley and R. C. Durley, J. Nat. Prod., 57, 1484-1490 (1994).
41. S. Stinson, Chem. Eng. News, 72, 18-18 (1994).
42. W. E. Running, J. Chem. Info. Comp. Sci., 33, 934-935 (1993).
43. M. L. Nedved, S. Habibi-Goudarzi, B. Ganem, and J. D. Henion, Anal. Chem., 68, 4228-4236 (1996).
44. M. A. Kelly, H. B. Liang, I. I. Sytwu, I. Vlattas, N. L. Lyons, B. R. Bowen, and L. P. Wennogle, Biochemistry, 35, 11747-11755 (1996).
45. S. Kaur, L. McGuire, D. Tang, G. Dollinger, and V. Huebner, J. Protein Chem., 16, 505-511 (1997).
46. M. M. Siegel, K. Tabei, G. A. Bebernitz, and E. Z. Baum, J. Mass Spectrom., 33, 264-273 (1998).
47. Y. H. Chu, L. Z. Avila, H. A. Biebuyck, and G. M. Whitesides, J. Med. Chem., 35, 2915-2917 (1992).
48. Y. H. Chu and G. M. Whitesides, J. Org. Chem., 57, 3524-3525 (1992).
49. K. L. Rundlett and D. W. Armstrong, J. Chromatogr., 721, 173-186 (1996).
50. L. Z. Avila, Y. H. Chu, E. C. Blossey, and G. M. Whitesides, J. Med. Chem., 36, 126-133 (1993).
51. B. L. Karger, J. Med. Chem., 41, 1201-1204 (1998).
52. Y. H. Chu, Y. M. Dunayevskiy, D. P. Kirby, P. Vouros, and B. L. Karger, J. Am. Chem. Soc., 118, 7827-7835 (1996).
53. K. Kasai and Y. Oda, J. Chromatogr., 376, 33-47 (1986).
54. D. C. Schriemer, D. R. Bundle, L. Li, and O. Hindsgaul, Angew. Chem. Int. Ed., 37, 3383-3387 (1998).
55. B. Zhang, M. M. Palcic, H. Mo, I. J. Goldstein, and O. Hindsgaul, Glycobiology, 11, 141-147 (2001).
56. X. H. Cheng, R. D. Chen, J. E. Bruce, B. L. Schwartz, G. A. Anderson, S. A. Hofstadler, D. C. Gale, R. D. Smith, J. M. Gao, G. B. Sigal, M. Mammen, and G. M. Whitesides, J. Am. Chem. Soc., 117, 8859-8860 (1995).
57. J. Gao, X. Cheng, R. Chen, G. B. Sigal, J. E. Bruce, B. L. Schwartz, S. A. Hofstadler, G. A. Anderson, R. D. Smith, and G. M. Whitesides, J. Med. Chem., 39, 1949-1955 (1996).
58. M. Wigger, J. P. Nawrocki, C. H. Watson, J. R. Eyler, and S. A. Benner, Rapid Commun. Mass Spectrom., 11, 1749-1752 (1997).
59. S. A. Hofstadler, K. A. Sannes-Lowery, S. T. Crooke, D. J. Ecker, H. Sasmor, S. Manalili, and R. H. Griffey, Anal. Chem., 71, 3436-3440 (1999).
60. K. A. Sannes-Lowery, J. J. Drader, R. H. Griffey, and S. A. Hofstadler, Trends Anal. Chem., 19, 481-491 (2000).
61. R. B. van Breemen, C. R. Huang, D. Nikolic, C. P. Woodbury, Y. Z. Zhao, and D. L. Venton, Anal. Chem., 69, 2159-2164 (1997).
62. D. Nikolic, S. Habibi-Goudarzi, D. G. Corley, S. Gafner, J. M. Pezzuto, and R. B. van Breemen, Anal. Chem., 72, 3853-3859 (2000).
63. D. Nikolic and R. B. van Breemen, Comb. Chem. High Throughput Screen., 1, 47-55 (1998).
64. Y. G. Shin and R. B. van Breemen, Biopharm. Drug Dispos., 22, 353-372 (2001).
65. Y. Z. Zhao, R. B. van Breemen, D. Nikolic, C. R. Huang, C. P. Woodbury, A. Schilling, and D. L. Venton, J. Med. Chem., 40, 4006-4012 (1997).
66. C. Gu, D. Nikolic, J. Lai, X. Xu, and R. B. van Breemen, Comb. Chem. High Throughput Screen., 2, 353-359 (1999).
67. R. B. van Breemen, C. P. Woodbury, and D. L. Venton in B. S. Larson and C. N. McEwen, Eds., Mass Spectrometry of Biological Materials, Marcel Dekker, New York, pp. 99-113 (1998).
68. J. Liu, J. E. Burdette, H. Xu, C. Gu, R. B. van Breemen, K. P. Bhat, N. Booth, A. I. Constantinou, J. M. Pezzuto, H. H. Fong, N. R. Farnsworth, and J. L. Bolton, J. Agric. Food Chem., 49, 2472-2479 (2001).
69. R. Wieboldt, J. Zweigenbaum, and J. Henion, Anal. Chem., 69, 1683-1691 (1997).
70. R. B. van Breemen, D. Nikolic, and J. L. Bolton, Drug Metabol. Dispos., 26, 85-90 (1998).
71. D. Nikolic, P. W. Fan, J. L. Bolton, and R. B. van Breemen, Comb. Chem. High Throughput Screen., 2, 165-175 (1999).
72. M. B. Beverly, P. West, and R. K. Julian, Comb. Chem. High Throughput Screen., 5, 65-73 (2002).
73. Z. Shen, J. J. Thomas, C. Averbuj, K. M. Broo, M. Engelhard, J. E. Crowell, M. G. Finn, and G. Siuzdak, Anal. Chem., 73, 612-619 (2001).
74. J. Wei, J. M. Buriak, and G. Siuzdak, Nature, 399, 243-246 (1999).
75. T. W. Hutchens and T. T. Yip, Rapid Commun. Mass Spectrom., 7, 576-580 (1993).
76. R. W. Nelson, Mass Spectrom. Rev., 16, 353-376 (1997).
77. R. W. Nelson, D. Nedelkov, and K. A. Tubbs, Anal. Chem., 72, 404A-411A (2000).
78. R. W. Nelson and J. R. Krone, J. Mol. Recognit., 12, 77-93 (1999).
79. T. Wachs and J. Henion, Anal. Chem., 73, 632-638 (2000).
80. R. M. Holt, M. J. Newman, F. S. Pullen, D. S. Richard, and A. G. Swanson, J. Mass Spectrom., 32, 64-70 (1997).
81. J. L. Wolfender, K. Ndjoko, and K. Hostettmann, Phytochem. Anal., 12, 2-22 (2001).
82. S. C. Bobzin, S. T. Yang, and T. P. Kasten, J. Chromatogr. B, 748, 259-267 (2000).
83. R. T. Williamson, E. L. Chapin, A. W. Carr, J. R. Gilbert, P. R. Graupner, P. Lewer, P. McKamey, J. R. Carney, and W. H. Gerwick, Org. Lett., 2, 289-292 (2000).
84. J. Chin, J. B. Fell, M. J. Shapiro, J. Tomesch, J. R. Wareing, and A. M. Bray, J. Org. Chem., 62, 538-539 (1997).
85. J. Chin, J. B. Fell, M. Jarosinski, M. J. Shapiro, and J. R. Wareing, J. Org. Chem., 63, 386-390 (1998).
86. P. A. Keifer, S. H. Smallcombe, E. H. Williams, K. E. Salomon, G. Mendez, J. L. Belletire, and C. D. Moore, J. Combin. Chem., 2, 151-171 (2000).
RICHARD HENDERSON
Medical Research Council Laboratory of Molecular Biology
TIMOTHY S. BAKER
Purdue University
Department of Biological Sciences
West Lafayette, Indiana
Contents
1 Macromolecular Structure Determination by Use of Electron Microscopy
2 Electron Scattering and Radiation Damage
3 Elastic and Inelastic Scattering
4 Radiation Damage
5 Required Properties of Illuminating Electron Beam
6 Three-Dimensional Electron Cryomicroscopy of Macromolecules
7 Overview of Conceptual Steps
8 Classification of Macromolecules
9 Specimen Preparation
10 Microscopy
11 Selection and Preprocessing of Digitized Images
12 Image Processing and 3D Reconstruction
  12.1 2D Crystals
  12.2 Helical Particles
  12.3 Icosahedral Particles
13 Visualization, Modeling, and Interpretation of Results
14 Trends
16 Abbreviations

Electron Cryomicroscopy of Biological Macromolecules
1 MACROMOLECULAR STRUCTURE DETERMINATION BY USE OF ELECTRON MICROSCOPY

The two principal methods of macromolecular structure determination that use scattering techniques are electron microscopy and X-ray crystallography. The most important difference between the two is that the scattering cross section is about 10^5 times greater for electrons than it is for X-rays, so significant scattering using electrons is obtained for specimens that are 1 to 10 nm thick, whereas scattering or absorption of a similar fraction of an illuminating X-ray beam requires crystals that are 100 to 500 μm thick. The second main difference is that electrons are much more easily focused than X-rays because they are charged particles that can be deflected by electric or magnetic fields. As a result, electron lenses are greatly superior to X-ray lenses and can be used to produce a magnified image of an object as easily as a diffraction pattern. This then allows the electron microscope to be switched back and forth instantly between imaging and diffraction modes so that the image of a single molecule at any magnification can be obtained as conveniently as the electron diffraction pattern of a thin crystal.

In the early years of electron microscopy of macromolecules, electron micrographs of molecules embedded in a thin film of heavy atom stains (1, 2) were used to produce pictures that were interpreted directly. Beginning with the work of Klug and Berger (3), a more rigorous approach to image analysis led first to the interpretation of the two-dimensional (2D) images as the projected density summed along the direction of view and then to the ability to reconstruct the three-dimensional (3D) object from which the images arose (4, 5), with subsequent more sophisticated treatment of image contrast transfer (6).

Later, macromolecules were examined by electron diffraction and imaging without the use of heavy atom stains by embedding the specimens in either a thin film of glucose (7) or in a thin film of rapidly frozen water (8-10), which required the specimen to be cooled while it was examined in the electron microscope. This use of unstained specimens thus led to the structure determination of the molecules themselves rather than the structure of a "negative stain" excluding volume, and has created the burgeoning field of 3D electron microscopy of macromolecules.

Many medium resolution structures of macromolecular assemblies (e.g., ribosomes), spherical and helical viruses, and larger protein molecules have now been determined by electron cryomicroscopy in ice. Four atomic resolution structures have been obtained by electron cryomicroscopy of thin 2D crystals embedded in glucose, trehalose, or tannic acid (11-14), where specimen cooling reduced the effect of radiation damage. One of these, the structure of bacteriorhodopsin (11), provided the first structure of a seven-helix membrane protein. The medium resolution density distributions can often be interpreted in terms of the chemistry of the structure if a high resolution model of one or more of the component pieces has already been obtained by X-ray, electron microscopy, or NMR methods. As a result, the use of electron microscopy is becoming a powerful technique for which, in some cases, no alternative approach is possible. Useful reviews [e.g., Dubochet et al. (9), Amos et al. (15), Walz and Grigorieff (16), and Baker et al. (17)] and a book [Frank (18)] have been written.

2 ELECTRON SCATTERING AND RADIATION DAMAGE

A schematic overview of scattering and imaging in the electron microscope is depicted in Fig. 14.1. The incident electron beam passes through the specimen and individual electrons are either unscattered or scattered by the atoms of the specimen. This scattering occurs either elastically, with no loss of energy and therefore no energy deposition in the specimen, or inelastically, with consequent energy loss by the scattered electron and accompanying energy deposition in the specimen, resulting in radiation damage. The electrons emerging from the specimen are then
collected by the imaging optics, shown here for simplicity as a single lens, but in practice consisting of a complex system of five or six lenses, with intermediate images being produced at successively higher magnification at different positions down the column. Finally, in the viewing area, either the electron diffraction pattern or the image can be seen directly by eye on the phosphor screen, or detected by a TV or CCD camera, or recorded on photographic film or image plate.

Figure 14.1. Schematic diagram showing the principle of image formation and diffraction in the transmission electron microscope. The incident beam I0 illuminates the specimen. Scattered and unscattered electrons are collected by the objective lens and focused back to form first an electron diffraction pattern and then an image. For a 2D or 3D crystal, the electron-diffraction pattern would show a lattice of spots, each of whose intensity is a small fraction of that of the incident beam. In practice, an in-focus image has no contrast, so images are recorded with the objective lens slightly defocused to take advantage of the out-of-focus phase-contrast mechanism.
[Figure labels: I0, incident beam; specimen (maximum dose ~5 e-/Å² for organic or biological specimens); diffraction pattern (strongest spots: protein ~10^-5 I0, paraffin ~10^-2 I0); underfocused image; in-focus image.]

3 ELASTIC AND INELASTIC SCATTERING

The coherent, elastically scattered electrons contain all the high resolution information for describing the structure of the specimen. The amplitudes and phases of the scattered electron beams are directly related to the amplitudes and phases of the Fourier components of the atomic distribution in the specimen. When the scattered beams are recombined with the unscattered beam in the image, they create an interference pattern (the image), which, for thin specimens, is related approximately linearly to the density variations in the specimen. The information about the structure of the specimen can then be retrieved by digitization and computer-based image processing, as described later. The elastic scattering cross sections for electrons are not as simply related to the atomic
composition as happens with X-rays. With X-ray diffraction, the scattering factors are simply proportional to the number of electrons in each atom, normally equal to the atomic number. Given that elastically scattered electrons are in effect diffracted by the electrical potential inside atoms, the scattering factor for electrons depends not only on the nuclear charge but also on the size of the surrounding electron cloud that screens the nuclear charge. As a result, electron scattering factors in the resolution range of interest in macromolecular structure determination (up to 1/3 Å^-1) are sensitive to the effective radius of the outer valency electrons and therefore depend sensitively on the chemistry of bonding. Although this is a fascinating field in itself, with interesting work already carried out by the gas phase electron diffraction community [e.g., Hargittai and Hargittai (19)], it is still an area where much work remains to be done. At present, it is probably adequate to think of the density obtained in macromolecular structure analysis by electron microscopy as roughly equivalent to the electron density obtained by X-ray diffraction but with the contribution from hydrogen atoms being somewhat greater relative to carbon, nitrogen, and oxygen.

Those electrons that are inelastically scattered lose energy to the specimen by a number of mechanisms. The energy loss spectrum for a typical biological specimen is dominated by the large cross section for plasmon scattering in the energy range 20-30 eV, with a continuum in the distribution that decreases up to higher energies. At discrete high energies, specific inner electrons in the K shell of carbon, nitrogen, or oxygen can be ejected with corresponding peaks in the energy loss spectrum appearing at 200-400 eV. Any of these inelastic interactions produces an uncertainty in the position of the scattered electron (by Heisenberg's uncertainty principle) and, as a result, the resolution of any information present in the energy loss electron signal extends only to low resolutions of around 15 Å (20). Consequently, the inelastically scattered electrons are generally considered to contribute little except noise to the images.

4 RADIATION DAMAGE

The most important consequence of inelastic scattering is the deposition of energy into the specimen. This is initially transferred to secondary electrons, which have an average energy (20 eV) that is 5 or 10 times greater than the valency bond energies. These secondary electrons interact with other components of the specimen and produce numerous reactive chemical species, including free radicals. In ice-embedded samples, these would be predominantly highly reactive hydroxyl free radicals that arise from the frozen water molecules. In turn, these react with the embedded macromolecules and create a great variety of radiation products such as modified side chains, cleaved polypeptide backbones, and a host of molecular fragments. From radiation chemistry studies, it is known that thiol or disulfide groups react more quickly than aliphatic groups and that aromatic groups, including nucleic acid bases, are the most resistant. Nevertheless, the end effect of the inelastic scattering is the degradation of the specimen to produce a cascade of heterogeneous products, some of which resemble the starting structure more closely than others. Some of the secondary electrons also escape from the surface of the specimen, causing it to charge up during the exposure. As a rough rule for 100-kV electrons, the dose that can be used to produce an image in which the starting structure at high resolution is still recognizable is about 1 e-/Å² for organic or biological materials at room temperature, 5 e-/Å² for a specimen near liquid nitrogen temperature (-170°C), and 10 e-/Å² for a specimen near liquid helium temperature (4-8 K). However, individual experimenters will often exceed these doses if they wish to enhance the low resolution information in the images that is less sensitive to radiation damage. The effects of radiation damage attributed to electron irradiation are essentially identical to those from X-ray or neutron irradiation for biological macromolecules except for the amount of energy deposition per useful coherent elastically scattered event (21). For electrons scattered by biological structures at all electron energies of interest, the number of inelastic events exceeds the number of elastic events by
a factor of 3 to 4, so that 60 to 80 eV of energy is deposited for each elastically scattered electron. This limits the amount of information in an image of a single biological macromolecule. Consequently, the 3D atomic structure cannot be determined from a single molecule but requires the averaging of the information from at least 10,000 molecules in theory, and even more in practice (21). Crystals used for X-ray or neutron diffraction contain many orders of magnitude more molecules.

It is possible to collect both the elastically and the inelastically scattered electrons simultaneously with an energy analyzer and, if a fine electron beam is scanned over the specimen, then a scanning transmission electron micrograph displaying different properties of the specimen can be obtained. Alternatively, conventional transmission electron microscopes to which an energy filter has been added can be used to select out a certain energy band of the electrons from the image. Both types of microscope can contribute in other ways to the knowledge of structure, but in this presentation, we concentrate on high voltage, phase-contrast electron microscopy of unstained macromolecules most often embedded in ice because this is the method of widest impact in structural biology.

5 REQUIRED PROPERTIES OF ILLUMINATING ELECTRON BEAM

The important properties of the image in terms of defocus, astigmatism, and the presence and effect of amplitude or phase contrast are discussed later. The best quality incident electron beam is produced by a field emission gun (FEG). This is because the electrons from a FEG are emitted from a very small volume at the tip, which is the apparent source size. Once these electrons have been collected by the condenser lens and used to produce the illuminating beam, that beam of electrons is then nearly parallel (divergence of mrad) and therefore spatially coherent. Similarly, because the emitting tip of a FEG is not heated as much as a conventional thermionic tungsten source, the thermal energy spread of the electrons is relatively small (0.5-1.0 eV) and, as a result, the illuminating beam is closer to being monochromatic. Electron beams can also be produced by a normal, heated tungsten source, which gives a less parallel beam with a larger energy spread, but is nevertheless adequate for electron cryomicroscopy if the highest resolution images are not required.

Figure 14.2. Flow diagram showing all the procedures involved in electron cryomicroscopy from sample preparation to map interpretation.
[Flow: Purified specimen -> Sample preparation -> Thin, vitrified sample -> Cryomicroscopy -> Micrographs -> Image selection, digitization, preprocessing, image processing and 3D reconstruction -> 3D density map -> Visualization, modeling, and interpretation -> Structure-function relationships.]

6 THREE-DIMENSIONAL ELECTRON CRYOMICROSCOPY OF MACROMOLECULES

The determination of 3D structure by cryo-EM methods follows a common scheme for all macromolecules (Fig. 14.2). A more detailed discussion of the individual steps as applied to different classes of macromolecules appears in subsequent sections. Briefly, each specimen must be prepared in a relatively homogeneous, aqueous form (1D or 2D crystals or a suspension of single particles in a limited
number of states) at relatively high concentra- cussed in 1971 (23) and demonstrated in 1975
tion, rapidly frozen (vitrified) as a thin film, (7,241, although earlier work on stained spec-
transferred into the electron microscope, and imens had shown the value of averaging to
photographed by means of low dose selection increase the signal-to-noise ratio. The im-
and focusing procedures. The resulting improvement obtained, as in all repeated mea-
ages, if recorded on film, must then be digitized. Digitized images are then processed by the use of computer programs that allow different views of the specimen to be combined into a 3D reconstruction that can be interpreted in terms of other available structural, biochemical, and molecular data.
7 OVERVIEW OF CONCEPTUAL STEPS
Radiation damage by the illuminating electron beam generally allows only one good picture (micrograph) to be obtained from each molecule or macromolecular assembly. In this micrograph, the signal-to-noise ratio of the 2D projection image is normally too small to accurately determine the projected structure. This implies, first, that it is necessary to average many images of different molecules taken from essentially the same viewpoint to increase the signal-to-noise ratio and, second, that many of these averaged projections, taken from different directions, must be combined to build up the information necessary to determine the 3D structure of the molecule. Thus, the two key concepts are: (1) averaging to a greater or lesser extent depending on resolution, particle size, and symmetry to increase the signal-to-noise ratio; and (2) the combination of different projections to build a 3D map of the structure.
In addition, there are various technical corrections that must be made to the image data to allow an unbiased model of the structure to be obtained. These include correction for the phase-contrast transfer function (CTF) and, at high resolution, for the effects of beam tilt. For crystals, it is also possible to combine electron diffraction amplitudes with image phases to produce a more accurate structure (7), and in general to correct for loss of high resolution contrast for any reason by "sharpening" the data by application of a negative temperature factor (22).
The idea of increasing the signal-to-noise ratio in electron images of unstained biological macromolecules by averaging was dis-
surements, gives a factor of √N improvement in signal-to-noise ratio, where N is the number of times the measurement is made. The effect of averaging to produce an improvement in signal-to-noise ratio is seen most clearly in the processing of images from 2D crystals. Figure 14.3 shows the results of applying a sequence of corrections, beginning with averaging, to two-dimensional crystals of bacteriorhodopsin in 2D space group p3. The panels show: (a, b) 2D averaging, (c) correction for the microscope contrast transfer function (CTF), and (d) threefold crystallographic symmetry averaging of the phases and combination with electron diffraction amplitudes. At each stage in the procedure the projected picture of the molecules gets clearer. The final stage results in a virtually noise-free projected structure for the molecule at near atomic (3 Å) resolution.
The earliest successful application of the idea of combining projections to reconstruct the 3D structure of a biological assembly was made by DeRosier and Klug (4). The idea is that each 2D projection corresponds after Fourier transformation to a central section of the 3D transform of the assembly. If enough independent projections are obtained, then the 3D transform will have been fully sampled and the structure can then be obtained by back transformation of the averaged, interpolated, and smoothed 3D transform. This procedure is shown schematically for a three-dimensional object in the shape of a duck, which represents the molecule whose structure is being determined (Fig. 14.4).
In practice, the implementation of these concepts has been carried out in a variety of ways, given that the experimental strategy and type of computer analysis used depend on the type of specimen, especially the molecular weight of the individual molecule, its symmetry, and whether it assembles into an aggregate with one-dimensional (1D), two-dimensional (2D), or three-dimensional (3D) periodic order.
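The two key concepts of this section, that averaging N independent measurements improves the signal-to-noise ratio by roughly √N and that the Fourier transform of a 2D projection is a central section of the 3D transform, can be checked numerically. The following is a minimal sketch in Python with NumPy, not the procedure of any particular reconstruction package; the function names, the synthetic test signal, and the Gaussian noise model are assumptions made only for illustration.

```python
import numpy as np


def averaged_snr_gain(n_images, shape=(32, 32), noise_sigma=5.0, seed=0):
    """Empirical SNR gain from averaging n_images noisy copies of one signal.

    The signal and the additive Gaussian noise are synthetic (illustrative
    assumptions); the gain is the ratio of the residual noise in a single
    image to the residual noise in the average.
    """
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[: shape[0], : shape[1]]
    signal = np.sin(2 * np.pi * xx / 8.0) + np.cos(2 * np.pi * yy / 16.0)
    noisy = signal + rng.normal(0.0, noise_sigma, size=(n_images, *shape))
    single_error = np.std(noisy[0] - signal)
    averaged_error = np.std(noisy.mean(axis=0) - signal)
    return single_error / averaged_error


def central_section_error(volume):
    """Largest mismatch between FT2(projection) and the central plane of FT3.

    The volume is projected along axis 0; the corresponding central section
    of the 3D discrete Fourier transform is the plane with index 0 on axis 0.
    """
    projection = volume.sum(axis=0)
    ft_projection = np.fft.fft2(projection)
    ft_volume = np.fft.fftn(volume)
    return np.max(np.abs(ft_projection - ft_volume[0]))


if __name__ == "__main__":
    for n in (4, 25, 100):
        gain = averaged_snr_gain(n)
        print(f"N = {n:3d}: measured gain {gain:.2f}, sqrt(N) = {np.sqrt(n):.2f}")
    test_volume = np.random.default_rng(1).random((16, 16, 16))
    print("central-section mismatch:", central_section_error(test_volume))
```

With 100 noisy copies the measured gain should come out close to 10, and the central-section mismatch should be at the level of floating-point rounding. Projections taken along other directions sample other central planes, which in a discrete implementation requires interpolation in Fourier space.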
Figure 14.3. Display of the results at different stages of image processing of a digitized micrograph of a 2D crystal of bacteriorhodopsin. The left panel (a) shows an area of the raw digitized micrograph in which only electron noise is visible. The lower right panel (b) shows the results of the averaging of unit cells from the whole picture by unbending in real space and filtering in reciprocal space. The scale of the density in (b) is the same as that in the original micrograph, showing that the signal is very much weaker than the noise. Panel (c) shows the same density as that in (b) but with contrast increased 10-fold to show that the signal in the original picture is approximately 10X below the noise level. Panel (d) shows the density after correction for contrast transfer function (CTF) attributed in this case to a defocus of 6000 Å. Panel (e) shows the density after further threefold crystallographic averaging (the space group is p3) and replacement of image amplitudes by electron diffraction amplitudes. Panel (e) therefore shows an almost perfect atomic resolution image of the projected structure of bacteriorhodopsin. The trimeric rings of molecules are centered on the crystallographic threefold axis and the internal structure shows α-helical segments in the protein.
Figure 14.4. Schematic diagram to illustrate the principle of 3D reconstruction. Each 2D projected image, as recorded on the micrograph and after CTF correction, represents a section through the 3D Fourier transform. This is called the projection theorem. After accumulation of enough information from enough different views, a 3D map of the structure can be calculated by Fourier inversion.
8 CLASSIFICATION OF MACROMOLECULES
The symmetry of a macromolecule or supramolecular complex is the primary determinant of how specimen preparation, microscopy, and 3D image reconstruction are
performed. The classification of molecules according to their level of periodic order and symmetry (Table 14.1) provides a logical and convenient way to consider the means by which specimens are studied in 3D by microscopy.
Each type of specimen offers a unique set of challenges in obtaining 3D structural information at the highest possible resolution. The best resolutions achieved by 3D EM methods to date, at about 3-4 Å, have been obtained with several thin, 2D crystals, in large part because of their excellent order.
With the exception of true 3D crystals, which must be sectioned to make them thin enough to study by transmission electron microscopy, the resolutions obtained with biological specimens are generally dictated by the preservation of periodic order, and the symmetry and complexity of the object. Hence, studies of the helical acetylcholine receptor tubes (36), the icosahedral hepatitis B virus capsid (44), the 50S ribosome (45), and the centriole (26) have yielded 3D density maps at resolutions of 4.6, 7.4, 15, and 280 Å, respectively.
If high resolution were the sole objective of EM, it would be necessary, given the capabilities of existing technology, to try to form well-ordered 2D crystals or helical assemblies of each macromolecule of interest. Indeed, a number of different crystallization techniques have been devised [e.g., Horne and Pasquali-Ronchetti (46); Yoshimura et al. (47); Kornberg and Darst (48); Jap et al. (49); Kubalek et al. (50); Rigaud et al. (51); Hasler et al. (52); Reviakine et al. (53); Wilson-Kubalek et al. (54)], and some of these have yielded new structural information about otherwise recalcitrant molecules like RNA polymerase (55). However, despite the obvious technological advantages of having a molecule present in a highly ordered form, most macromolecules function not as highly ordered crystals or helices but instead as single particles (e.g., many enzymes) or, more likely, in concert with other macromolecules as occurs in supramolecular assemblies. Also, crystallization tends to constrain the number of conformational states a molecule can adopt and the crystal conformation might not be functionally relevant. Hence, although resolution may be restricted to much below that realized in the bulk of current X-ray crystallographic studies, cryo-EM methods provide a powerful means to study molecules that resist crystallization in 1D, 2D, or 3D. These methods allow one to explore the dynamic events, different conformational states (as induced, for example, by altering the microenvironment of the specimen), and macromolecular interactions that are the key to understanding how each macromolecule functions.
9 SPECIMEN PREPARATION
The goal in preparing specimens for cryomicroscopy is to keep the biological sample as close as possible to its native state to preserve the structure to atomic or near-atomic resolution in the microscope and during microscopy. The methods by which numerous types of macromolecules and macromolecular complexes have been prepared for cryo-EM studies are now well established (9, 56, 57). Most such methods involve cooling samples at a rate fast enough to permit vitrification (solid, glasslike state) rather than crystallization of the bulk water. Noncrystalline biological macromolecules are typically vitrified by applying a small (often <10 µL) aliquot of a purified, approximately 0.2-5 mg/mL suspension of sample to an EM grid coated with a carbon or holey carbon support film. The grid, secured with a pair of forceps and suspended over a container of ethane or propane cryogen slush (maintained near its freezing point by a reservoir of liquid nitrogen), is blotted nearly dry with a piece of filter paper. The grid is then plunged into the cryogen, and the sample, if thin enough (~0.2 µm or less), is vitrified in millisecond or shorter time periods (58-60).
The ability to freeze samples with a time resolution of milliseconds affords cryo-EM one of its unique and, as yet, perhaps most underutilized advantages: capturing and visualizing dynamic structural events that occur over time periods of a few milliseconds or longer. Several devices that allow samples to be perturbed in a variety of ways as they are plunged into cryogen have been described [e.g., Subramaniam et al. (61); Berriman and Unwin (59); Siegel et al. (62); Trachtenberg (63); White et
al. (60)]. Examples of the use of such devices include spraying acetylcholine onto its receptor to cause the receptor channel to open (64), or lowering the pH of an enveloped virus sample to initiate early events of viral fusion (65), or inducing a temperature jump with a flash-tube system to study phase transitions in liposomes (66), or mixing myosin S1 fragments with F-actin to examine the geometry of the crossbridge powerstroke in muscle (67).
Crystalline (2D) samples fortunately can often be prepared for cryo-EM by means of simpler procedures, and vitrification of the bulk water is not always essential to achieve success (68). Such specimens may be applied to the carbon film on an EM grid by normal adhesion methods, washed with 1-2% solutions of solutes like glucose, trehalose, or tannic acid, blotted gently with filter paper to remove excess solution, air dried, loaded into a cold holder, inserted into the microscope, and, finally, cooled to liquid nitrogen temperature.

Table 14.1 Classification of Macromolecules According to Periodic Order and Symmetry

Periodic Order  Type                         Symmetry      Example Macromolecule/Complex               Representative Reference
0D              Point group                  C1            Ribosome                                    25
                                             C1            Centriole                                   26
                                             C5            Bacteriophage φ29 head                      27
                                             C8            Ribonucleoprotein vault                     28
                                             C17           TMV disk                                    29
                                             D2            β-galactosidase                             30
                                             D6            Clathrin coats                              31
                                             D6            Lumbricus terrestris hemoglobin             32
                                             T             Dps protein                                 33
                                             O             Azotobacter pyruvate dehydrogenase core     34
                                             I             Icosahedral viruses                         17
1D              Screw axis (helical)ᵃ                      Acto-myosin filament                        35
                                                           Acetylcholine receptor tubes                36
                                                           Microtubule                                 37
                                                           Bacterial flagella                          38
                                                           Tobacco mosaic virus                        39
2D              2D space group (2D crystal)  p3            Bacterial rhodopsin membrane                11
                                             p42₁2         Aquaporin membrane                          40
                                             p6            Gap junction membrane                       41
                                             p321          Light harvesting complex II                 12
                                             p12₁          Tubulin sheet                               13
3D              3D space group (3D crystal)  P2₁2₁2₁       Myosin S1 protein crystal                   42
                                             P6, or P6,    Insect flight muscle                        43
ᵃThe symmetry of a helical structure is defined by an n_m screw axis, which combines a rotation of 2π/n radians about an axis followed by a translation of m/n of the repeat distance.

10 MICROSCOPY
Once the vitrified specimen is inserted into the microscope and sufficient time is allowed (~15 min) for the specimen stage to stabilize to minimize drift and vibration, microscopy is performed to generate a set of images that, with suitable processing procedures, can later be used to produce a reliable 3D reconstruction of the specimen at the highest possible resolution. To achieve this goal, imaging must be performed at an electron dose that minimizes beam-induced radiation damage to the specimen, with the objective lens of the microscope defocused to enhance phase contrast from the weakly scattering, unstained biological specimen, and under conditions that keep the specimen below the devitrification temperature and minimize its contamination.
The microscopist locates specimen areas suitable for photography by searching the EM grid at very low magnification (≤3000X) to keep the irradiation level very low (<0.05 e⁻/Å²) while assessing sample quality. In microscopes operated at 200 keV or higher, where image contrast is very weak, it is helpful to perform the search procedure with the assistance of a CCD camera or a video-rate TV-intensified camera system. For some specimens, like thin 2D crystals, searching is conveniently performed by viewing the low magnification, high contrast image produced by slightly defocusing the electron diffraction pattern by use of the diffraction lens.
After a desired specimen area is identified, the microscope is switched to high magnification mode for focusing and astigmatism correction. These adjustments are typically performed in a region about 2-10 µm away from the chosen area at the same or higher magnification than that used for photography. The choice of magnification, defocus level, accelerating voltage, beam coherence, electron dose, and other operating conditions is dictated by several factors. The most significant ones are the size of the particle or crystal unit cell being studied, the anticipated resolution of the images, and the requirements of the image processing needed to compute a 3D reconstruction to the desired resolution. For most specimens at required resolutions from 3 to 30 Å, images are typically recorded at 25,000-50,000X magnification, with an electron dose of between 5 and 20 e⁻/Å². These conditions yield micrographs of sufficient optical density (OD 0.2-1.5) and image resolution for subsequent image-processing steps. Most modern EMs provide some mode of low dose operation for imaging beam-sensitive, vitrified biological specimens.
The intrinsic low contrast of unstained specimens makes it impossible to observe and focus on specimen details directly, as is routine with stained or metal-shadowed specimens. Focusing, aimed to enhance phase contrast in the recorded images but minimize beam damage to the desired area, is achieved by judicious defocusing on a region that is adjacent to the region to be photographed and preferably situated on the microscope tilt axis. The appropriate focus level is set by adjusting the appearance of either the Fresnel fringes that occur at the edges of holes in the carbon film or the "phase granularity" from the carbon support film.
Unfortunately, electron images do not give a direct rendering of the specimen density distribution. The relationship between image and specimen is described by the contrast transfer function (CTF), which is characteristic of the particular microscope used, the specimen, and the conditions of imaging. The microscope CTF arises from the objective lens focal setting and from the spherical aberration present in all electromagnetic lenses and varies with the defocus and accelerating voltage according to a formula (see below) that includes both phase- and amplitude-contrast components. First, however, it might be useful to describe briefly the essentials of amplitude contrast and phase contrast, two concepts carried over from optical microscopy. Amplitude contrast refers to the nature of the contrast in an image of an object that absorbs the incident illumination or scatters it in any other way, so that a proportion of it is lost. As a result, the image appears darker where greater absorption occurs. Phase contrast is required if an object is transparent (i.e., it is a pure phase object) and does not absorb but only scatters the incident illumination. Biological specimens for cryo-EM are almost pure phase objects and the scattering is relatively weak, so that the simple theory of image formation by a weak phase object applies (69, 70). An exactly in-focus image of a phase object has no contrast variation because all the scattered illumination is focused back to equivalent points in the image of the object from which it was scattered. In optical microscopy, the use of a quarter wave plate can retard the phase of the direct unscattered beam, so that an in-focus image of a phase object has very high "Zernike" phase contrast. However, there is as yet no simple quarter wave plate for electrons, so instead, phase contrast is created by introducing phase shifts into the diffracted beams by adjustment of the excitation of the objective lens so that the image is slightly defocused. In addition, because all matter is composed of atoms and the electric potential inside each atom is very high near the nucleus, even the electron scattering behavior of the light atoms found in biological molecules deviates from that of a weak phase object; however, for a deeper discussion of this the reader should refer to Reimer (70) or Spence (69). In practice, the proportion of "amplitude" contrast is about 7% at 100 kV, 5% at 200 kV, and 4% at 300 kV for low dose images of protein molecules embedded in ice.
The overall dependency of CTF on resolution, wavelength, defocus, and spherical aberration is given by
$$\mathrm{CTF}(\nu) = -\left\{\left(1 - F_{\mathrm{amp}}^{2}\right)^{1/2}\sin[\chi(\nu)] + F_{\mathrm{amp}}\cos[\chi(\nu)]\right\}$$
where $\chi(\nu) = \pi\lambda\nu^{2}\left(\Delta f - 0.5\,C_s\lambda^{2}\nu^{2}\right)$; $\nu$ is the spatial frequency (in Å⁻¹); $F_{\mathrm{amp}}$ is the fraction of amplitude contrast; $\lambda$ is the electron wavelength (in Å), where
$$\lambda = \frac{12.3}{\sqrt{V + 0.000000978\,V^{2}}}$$
(= 0.037, 0.025, and 0.020 Å for 100, 200, and 300 keV electrons, respectively); $V$ is the voltage (in volts); $\Delta f$ is the underfocus (in Å); and $C_s$ is the spherical aberration of the objective lens of the microscope (in Å).
In addition, this CTF is attenuated by an envelope or damping function, which depends on the coherence of the beam, specimen drift, and other factors (6, 71, 72). Figure 14.5 shows a few representative CTFs for different amounts of defocus on a normal and a FEG microscope. Thus, for a particular defocus setting of the objective lens, phase contrast in the electron image is positive and maximal only at a few specific spatial frequencies. Contrast is either lower than maximal, completely absent, or it is opposite (inverted or reversed) from that at other frequencies. Hence, as the objective lens is focused, the electron microscopist selectively accentuates image details of a particular size.
Images are typically recorded 0.8-3.0 µm underfocus to enhance specimen features in the 20-40 Å size range and thereby facilitate phase origin and specimen orientation search procedures carried out in the image-processing steps. However, this level of underfocus also enhances the contrast envelope in lower resolution maps, which may help in interpretation. To obtain results at better than 10-15 Å resolution, it is essential to record, process, and combine data from several micrographs that span a range of defocus levels [e.g., Unwin and Henderson (7); Bottcher et al. (44)]. This strategy ensures good information transfer at all spatial frequencies up to the limiting resolution but requires careful compensation for the effects of the microscope CTF during image processing. Also, the recording of image focal pairs or focal series from a given specimen area can be beneficial in determining origin and orientation parameters for processing of images of single particles [e.g., Cheng et al. (74); Trus et al. (75)].
Many high resolution cryo-EM studies are now performed with microscopes operated at 200 keV or higher and with field emission gun (FEG) electron sources [e.g., Zemlin (76, 78); Zhou and Chiu (77); Mancini et al. (79)]. The high coherence of a FEG source ensures that phase contrast in the images remains strong out to high spatial frequencies (>1/3.5 Å⁻¹), even for highly defocused images. The use of higher voltages provides potentially higher resolution [greater depth of field (i.e., less curvature of the Ewald sphere) attributed to smaller electron beam wavelength], better beam penetration (less multiple scattering),

Figure 14.5. Representative plots of the contrast transfer function (CTF) as a function of spatial frequency (Å⁻¹), for two different defocus settings (0.7 and 4.0 µm underfocus) and for a field emission (light curve) or tungsten (dark curve) electron source. All plots correspond to electron images formed in an electron microscope operated at 200 kV and with objective lens aberration coefficients, C_s = C_c = 2.0 mm, and assuming amplitude contrast of 4.8% (73). The spatial coherence, which is related to the electron source size and expressed as β, the half-angle of illumination, for tungsten and FEG electron sources was fixed at 0.3 and 0.015 milliradians, respectively. Likewise, the temporal coherence (expressed as ΔE, the energy spread) was fixed at 1.6 and 0.5 eV for tungsten and FEG sources. The combined effects of the poorer spatial and temporal coherence of the tungsten source lead to a significant dampening, and hence loss of contrast, of the CTF at progressively higher resolutions compared to that observed in FEG-equipped microscopes. The greater number of contrast reversals with higher defocus arises because of the greater out-of-focus phase shifts.
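As a rough illustration of how defocus, voltage, and spherical aberration shape the CTF, the sketch below evaluates a commonly quoted weak-phase form, CTF(ν) = −{(1 − F_amp²)^(1/2) sin χ(ν) + F_amp cos χ(ν)} with χ(ν) = πλν²(Δf − 0.5 C_s λ² ν²), together with the relativistic wavelength λ = 12.3/(V + 0.978 × 10⁻⁶ V²)^(1/2) Å. The overall sign convention, the function names, and the default parameter values (200 kV, C_s = 2.0 mm, 5% amplitude contrast) are assumptions chosen for illustration, and no envelope (damping) function is applied.

```python
import numpy as np


def electron_wavelength(voltage):
    """Relativistic electron wavelength in Å for an accelerating voltage in volts."""
    return 12.3 / np.sqrt(voltage + 0.978e-6 * voltage**2)


def ctf(nu, underfocus, voltage=200e3, cs=2.0e7, f_amp=0.05):
    """Undamped CTF at spatial frequency nu (in 1/Å).

    underfocus and cs are in Å (2.0 mm = 2.0e7 Å); f_amp is the fraction
    of amplitude contrast. The leading minus sign is an assumed convention.
    """
    lam = electron_wavelength(voltage)
    chi = np.pi * lam * nu**2 * (underfocus - 0.5 * cs * lam**2 * nu**2)
    return -(np.sqrt(1.0 - f_amp**2) * np.sin(chi) + f_amp * np.cos(chi))


def count_contrast_reversals(underfocus, nu_max=0.25, samples=4000, **kwargs):
    """Count sign changes of the CTF between 0 and nu_max (in 1/Å)."""
    nu = np.linspace(0.0, nu_max, samples)
    signs = np.sign(ctf(nu, underfocus, **kwargs))
    signs = signs[signs != 0]  # ignore exact zeros when comparing neighbours
    return int(np.count_nonzero(signs[1:] != signs[:-1]))


if __name__ == "__main__":
    for kv in (100, 200, 300):
        print(f"{kv} kV: wavelength = {electron_wavelength(kv * 1e3):.4f} Å")
    for defocus_um in (0.7, 4.0):
        n = count_contrast_reversals(defocus_um * 1e4)  # µm converted to Å
        print(f"{defocus_um} µm underfocus: {n} contrast reversals below 0.25 1/Å")
```

The larger underfocus gives many more contrast reversals over the same frequency range, which is why data from several defocus levels are combined to fill in the frequencies near each CTF zero. A realistic calculation would multiply this function by the coherence envelopes described in the text.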
reduced problems with specimen charging that plague microscopy of unstained or uncoated vitrified specimens (80), and reduced phase shifts associated with beam tilt.
Images are recorded on photographic film or on a CCD camera with either flood beam or spot-scan procedures. Film, with its advantages of low cost, large field of view, and high resolution (~10 µm), has remained the primary image recording medium for most cryo-EM applications, despite disadvantages of high background fog and need for chemical development and digitization. CCD cameras provide image data directly in digital form and with very low background noise, but suffer from higher cost, limited field of view, limited spatial resolution caused by poor point spread characteristics, and a fixed pixel size (typically between 14 and 24 µm). They are useful, for example, for precise focusing and adjustment of astigmatism [e.g., Krivanek and Mooney (81); Sherman et al. (82)].
For studies in which specimens must be tilted to collect 3D data, such as with 2D crystals, or single particles that adopt preferred orientations on the EM grid, or specimens requiring tomography, microscopy is performed in essentially the same way as described above. However, the limited tilt range (±60-70°) of most microscope goniometers can lead to nonisotropic resolution in the 3D reconstructions (the "missing cone" problem), and tilting generates a constantly varying defocus across the field of view in a direction normal to the tilt axis. The effects caused by this varying defocus level must be corrected in high resolution applications.
11 SELECTION AND PREPROCESSING OF DIGITIZED IMAGES
Before any image analysis or classification of the molecular images can be done, a certain amount of preliminary checking and normalization is required to ensure there is a reasonable chance that a homogeneous population of molecular images has been obtained. First, good quality micrographs are selected in which the electron exposure is correct, there is no image drift or blurring, and there is minimal astigmatism and a reasonable amount of defocus to produce good phase contrast. This is usually done by visual examination and optical diffraction.
Once the best pictures have been chosen, the micrographs must be scanned and digitized on a suitable densitometer. The sizes of the steps between digitization of optical density and the size of the sample aperture over which the optical density is averaged by the densitometer must be sufficiently small to sample the detail present in the image at fine enough intervals (83). Normally, a circular (or square) sample aperture of diameter (or length of side) equal to the step between digitizations is used. This avoids digitizing overlapping points, without missing any of the information recorded in the image. The size of the sample aperture and digitization step depends on the magnification selected and the resolution required. A value of 1/4 to 1/3 of the required limit of resolution (measured in µm on the emulsion) is normally ideal because it avoids having too many numbers (and therefore wasting computer resources), without losing anything during the measurement procedure. For a 40,000X image, on which a resolution of 10 Å at the specimen is required, a step size of 10 µm {= 1/4 × [(10 Å × 40,000)/(10,000 Å/µm)]} would be suitable.
The best area of an image of a helical or 2D crystal specimen can then be boxed off using a soft-edge mask. For images of single particles, a stack of individual particles can be created by selecting out many small areas surrounding each particle. Because, in the later steps of image processing, the orientation and position of each particle are refined by comparing the amplitudes and phases of their Fourier components, it is important to remove spurious features around the edge of each particle and to make sure the different particle images are on the same scale. This is normally done by masking off a circular area centered on each particle and floating the density so that the average around the perimeter becomes zero (83). The edge of the mask is apodized by applying a soft cosine bell shape to the original densities so they taper toward the background level. Finally, to compensate for variations in the exposure attributed to ice thickness or electron dose, most microscopists normalize the stack of individual particle images so that
the mean density and mean density variation over the field of view are set to the same values for all particles (84).
Once some good particles or crystalline areas for 1D or 2D crystals have been selected, digitized, masked, and their intensity values normalized, true image processing can begin.
12 IMAGE PROCESSING AND 3D RECONSTRUCTION
Although the general concepts of signal averaging, together with combining different views to reconstruct the 3D structure, are common to the different computer-based procedures that have been implemented, it is important to emphasize one or two preliminary points. First, a homogeneous set of particles must be selected for inclusion in the 3D reconstruction. This selection may be made by eye, to eliminate obviously damaged particles or impurities, or by the use of multivariate statistical analysis (85) or some other classification scheme. This allows a subset of the particle images to be used to determine the structure of a better defined entity. All image-processing procedures require the determination of the same parameters that are needed to specify unambiguously how to combine the information from each micrograph or particle. These parameters are: the magnification, defocus, astigmatism, and, at high resolution, the beam tilt for each micrograph; the electron wavelength used (i.e., accelerating voltage of the microscope); the spherical aberration coefficient (C_s) of the objective lens; and the orientation and phase origin for each particle or unit cell of the 1D, 2D, or 3D crystal. There are 13 parameters for each particle, of which eight may be common to each micrograph and two or three (C_s, kV, magnification) to each microscope. The different general approaches that have been used in practice to determine the 3D structure of different classes of macromolecular assemblies from one or more electron micrographs are listed in Table 14.2.

Table 14.2 Methods of Three-Dimensional Image Reconstruction

Structure Type (Symmetry)                 Method                                                         Reference(s) to Technical/Theoretical Details
Asymmetric (point group C1)               Random conical tilt
                                          *Software package
                                          Angular reconstitution
                                          *Software package
                                          Weighted back projection
                                          Radon transform alignment
                                          Reference-based alignment
                                          Reference-free alignment
                                          Fourier reconstruction and alignment
                                          Tomographic tilt series and remote control of microscopeᵃ
Symmetric (point groups Cn, Dn; n > 1)    Angular reconstitution
                                          *Software packages
                                          Fourier-Bessel synthesis
                                          Reference-based alignment and weighted back projection
Icosahedral (point group I)               Fourier-Bessel synthesis (common-lines)
                                          *Reference-based alignment
                                          *Software packages
                                          Angular reconstitution
                                          Tomographic tilt series
Helical                                   Fourier-Bessel synthesis
                                          *Software packages and filament straightening routines
2D Crystal                                Random azimuthal tilt
                                          *Software packages
3D Crystal                                Oblique section reconstruction
                                          *Software package
                                          Sectioned 3D crystal
ᵃNote: Electron tomography is the subject of an entire issue of J. Struct. Biol. [120, 207-395 (1997)] and a book edited by Frank (132).

The precise way in which each general approach codes and determines the particle or unit cell parameters varies greatly and is not described in detail. Much of the computer software used in image reconstruction studies is relatively specialized compared to that used in the more mature field of macromolecular X-ray crystallography. In part, this may be attributed to the large diversity of specimen types amenable to cryo-EM and reconstruction methods. As a consequence, image-reconstruction software is evolving quite rapidly, and references to software packages cited in Table 14.2 are likely to become quickly outdated. Extensive discussion of algorithms and software packages in use at this time may be found in a number of recent special issues of the Journal of Structural Biology [volumes 116(1), 120(3), 121(2), and 125(2/3)].
In practice, attempts to determine or refine some parameters may be affected by the inability to determine accurately one of the other parameters. The solution of the structure is therefore an iterative procedure in which reliable knowledge of the parameters that describe each image is gradually built up to produce an increasingly accurate structure, until no more information can be squeezed out of the micrographs. At this point, if any of the origins or orientations is wrongly assigned, there will be a loss of detail and signal-to-noise ratio in the map. If a better determined or higher resolution structure is required, it would then be necessary to record images on a better microscope or to prepare new specimens and record better pictures.
The reliability and resolution of the final reconstruction can be measured by use of a variety of indices. For example, the differential phase residual (DPR) (133), the Fourier shell correlation (FSC) (134), and the Q-factor (135) are three such measures. DPR is the mean phase difference, as a function of resolution, between the structure factors from two independent reconstructions, often calculated by splitting the image data into two halves. FSC is a similar calculation of the mean correlation coefficient between the complex structure factors of the two halves of the data as a function of resolution. The Q-factor is the mean ratio of the vector sum of the individual structure factors from each image divided by the sum of their moduli, again calculated as a function of resolution. Perfectly accurate measurements would have values of DPR, FSC, and Q-factor of 0°, 1.0, and 1.0, respectively, whereas random data containing no information would have values of 90°, 0.0, and 0.0. The spectral signal-to-noise ratio (SSNR) criterion has been advocated as the best of all (136): it effectively measures, as a function of resolution, the overall signal-to-noise ratio (squared) of the whole of the image data. It is calculated by taking into consideration how well all the contributing image data agree internally.
An example of a typical strategy for determination of the 3D structure of a new and unknown molecule without any symmetry and that does not crystallize might be as follows:
1. Record a single axis tilt series of particles embedded in negative stain, with a tilt range from -60° to +60°.
2. Calculate 3D structures for each particle by use of an R-weighted back-projection algorithm (93).
3. Average 3D data for several particles in real or reciprocal space to get a reasonably good 3D model of the stain-excluding region of the particle.
4. Record a number of micrographs of the particles embedded in vitreous ice.
5. Use the 3D negative stain model obtained in (3) with inverted contrast to determine the rough alignment parameters of the particle in the ice images.
6. Calculate a preliminary 3D model of the average, ice-embedded structure.
7. Use the preliminary 3D model to determine more accurate alignment parameters for the particles in the ice images.
8. Calculate a better 3D model.
9. Determine defocus and astigmatism to allow CTF calculation and correct 3D model so that it represents the structure at high resolution.
10. Keep adding pictures at different defocus levels to get an accurate structure at as high a resolution as possible.
For large single particles with no symmetry or for particles with higher symmetry or for crystalline arrays, it should be possible to miss out the negative staining steps and go straight to alignment of particle images from ice-embedding because the particle or crystal tilt angles can be determined internally from comparison of phases along common lines in reciprocal space or from the lattice or helix parameters from a 2D or 1D crystal.
The following discussion briefly outlines for a few specific classes of macromolecule the general strategy for carrying out image processing and 3D reconstruction (see Fig. 14.6).
symmetry of the 2D space group of the crystal. Finally, the whole data set is fitted by least squares to constrained amplitudes and phases along the lattice lines (137) before calculating a map of the structure. The initial determination of the 2D space group can be carried out by a statistical test of the phase relationships in one or two images of untilted specimens (138). The absolute hand of the structure is automatically correct, given that the 3D structure is calculated from images whose tilt axis and tilt angle are known. Nevertheless, care must be taken not to make any of a number of trivial mistakes that would invert the hand.
12.2 Helical Particles
The basic steps involved in processing and 3D reconstruction of helical specimens include: Recording a series of micrographs of vitrified particles suspended over holes in a perforated carbon support film. The micrographs are digitized and Fourier-transformed to determine image quality (astigmatism, drift, defocus, presence and quality of layer lines, etc.). Individual particle images are boxed, floated, and apodized within a rectangular mask. The parameters of helical symmetry (number of subunits per turn and pitch) must be determined
12.1 2D Crystals
by indexing the computed diffraction pat-
For 2D crystals, the general 3D reconstruction terns. If necessary, simple spline-fitting proce-
approach consists of the following steps: First, dures may be employed to "straighten" im-
a series of micrographs of single 2D crystals ages of curved particles (124), and the image
are recorded at different tilt angles, with ran- data may be reinterpolated (126) to provide
dom azimuthal orientations. Each crystal is more precise sampling of the layer line data in
then unbent using cross-correlation tech- the computed transform. Once a preliminary
niques, to identify the precise position of each 3D structure is available, a much more sophis-
unit cell (1271, and amplitudes and phases of ticated refinement of all the helical parame-
the Fourier components of the average of that ters can be used to unbend the helices onto a
particular view of the structure are obtained predetermined average helix so that the con-
for the transform of the unbent crystal. The tributions of all parts of the image are cor-
reference image used in the cross-correlation rectly treated (123). The layer line data are
calculation can either be a part of the whole extracted from each particle transform and
image masked off after a preliminary round of two phase origin corrections are made, one to
averaging by reciprocal space filtering of the shift the phase origin to the helix axis (at the
regions surrounding the diffraction spots in center of the particle image) and the other to
the transform, or it can be a reference image correct for effects caused by having the helix
calculated from a previously determined 3D axis tilted out of the plane normal to the elec-
model. The amplitudes and phases from each tron beam in the electron microscope. The
image are then corrected for the CTF and layer line data are separated out into near-
beam tilt (11,22, 127) and merged with data and far-side data, corresponding to contri-
from many other crystals by scaling and origin butions from the near and far sides of each
refinement, taking into account the proper particle imaged. The relative rotations and
Figure 14.6. Examples of macromolecules studied by cryo-EM and 3D image reconstruction and the
resulting 3D structures (bottom row) after cryo-EM analysis. All micrographs (top row) are displayed
at about 170,000X magnification and all models at about 1,200,000X magnification. (a) A single
particle without symmetry: The micrograph shows 70S E. coli ribosomes complexed with mRNA and
Met-tRNA. The surface-shaded density map, made by averaging 73,000 ribosome images from 287
micrographs, has a resolution (FSC) of 11.5 Å. The 50S and 30S subunits and the tRNA are colored
blue, yellow, and green, respectively. The identity of many of the subunits is known and some RNA
double helices are clearly recognizable by their major and minor grooves (e.g., helix 44 is shown in
red). [Courtesy of J. Frank (SUNY, Albany), using data from Gabashvili et al. (86).] (b) A single
particle with symmetry: The micrograph shows hepatitis B virus cores. The 3D reconstruction, at a
resolution of 7.4 Å (DPR), was computed from 6384 particle images taken from 34 micrographs.
[From Böttcher et al. (44).] (c) A helical filament: The micrograph shows actin filaments decorated
with myosin S1 heads containing the essential light chain. The 3D reconstruction, at a resolution of
30-35 Å, is a composite in which the differently colored parts are derived from a series of difference
maps that were superimposed on f-actin. The components include: f-actin (blue), myosin heavy chain
motor domain (orange), essential light chain (purple), regulatory light chain (white), tropomyosin
(green), and myosin motor domain N-terminal beta-barrel (red). [Courtesy of A. Lin, M. Whittaker,
and R. Milligan (Scripps Research Institute, La Jolla, CA).] (d) A 2D crystal, light-harvesting complex
LHCII at 3.4 Å resolution. The model shows the protein backbone and the arrangement of chro-
mophores in a number of trimeric subunits in the crystal lattice. In this example, image contrast is too
low to see any hint of the structure without image processing (see also Fig. 14.3). See color insert.
[Courtesy of W. Kühlbrandt (Max-Planck-Institute for Biophysics, Frankfurt, Germany).]
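
The phase origin corrections described for helical particles rest on the Fourier shift theorem: moving the origin by s samples leaves every amplitude unchanged and adds a phase that grows linearly with spatial frequency. The following minimal Python sketch checks this on a short one-dimensional density profile by use of a naive discrete Fourier transform. The profile values and the function names are hypothetical and are not taken from any of the software packages cited in this chapter.

```python
import cmath


def dft(samples):
    """Naive discrete Fourier transform, adequate for a short profile."""
    n = len(samples)
    return [sum(value * cmath.exp(-2j * cmath.pi * k * x / n)
                for x, value in enumerate(samples))
            for k in range(n)]


def shift_phase_origin(transform, shift):
    """Apply the linear phase ramp that moves the origin to sample `shift`."""
    n = len(transform)
    return [coeff * cmath.exp(2j * cmath.pi * k * shift / n)
            for k, coeff in enumerate(transform)]


# Hypothetical projected density whose center lies at sample index 3.
profile = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
# The same density re-indexed so that its center sits at the origin.
centered = profile[3:] + profile[:3]

corrected = shift_phase_origin(dft(profile), 3)
reference = dft(centered)
max_error = max(abs(a - b) for a, b in zip(corrected, reference))
print(f"largest difference from direct re-indexing: {max_error:.2e}")
```

Because only phases change, the correction can be applied to the extracted layer line data without touching the measured amplitudes.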
translations needed to align the different transforms are determined so the data may be merged and a 3D reconstruction computed by Fourier-Bessel inversion procedures (83). Determination of the absolute hand requires comparison of a pair of images recorded with a small tilt of the specimen between the views (139).

12.3 Icosahedral Particles

The typical strategy for processing and 3D reconstruction of icosahedral particles consists of the following steps: First, a series of micrographs of a monodisperse distribution of particles, normally suspended over holes in a perforated carbon support film, is recorded. After digitization of the micrographs, individual particle images are boxed and floated with a circular mask. The astigmatism and defocus of each micrograph are measured from the sum of intensities of the Fourier transforms of all particle images (140). Autocorrelation techniques are then used to estimate the particle phase origins, which coincide with the center of each particle, where all rotational symmetry axes intersect (141). The view orientation of each particle, defined by three Eulerian angles, is determined either by means of common and cross-common lines techniques or with the aid of model-based procedures [e.g., Crowther (106); Fuller et al. (107); Baker et al. (17)]. Once a set of self-consistent particle images is available, an initial, low resolution 3D reconstruction is computed by merging these data with Fourier-Bessel methods (106). This reconstruction then serves as a reference for refining the orientation, origin, and CTF parameters of each of the included particle images, for rejecting "bad" images, and for increasing the size of the data set by including new particle images from additional micrographs taken at different defocus levels. A new reconstruction, computed from the latest set of images, serves as a new reference and the above refinement procedure is repeated until no further improvements, as measured by the reliability criteria mentioned above, are made. Determination of the absolute hand of the structure requires the recording and processing of a pair of images taken with a known, small relative tilt of the specimen between the two views (142).

13 VISUALIZATION, MODELING, AND INTERPRETATION OF RESULTS

Once a reliable 3D map is obtained, computer graphics and other visualization tools may be used as aids in interpreting morphological details and understanding biological function in the context of biochemical and molecular studies and complementary X-ray crystallographic and other biophysical measurements.

14 TRENDS

The new generation of intermediate voltage (~300 kV) FEG microscopes is now making it much easier to obtain higher resolution images that, by use of larger defocus values, have good image contrast at both very low and very high resolution. The greater contrast at low resolution greatly facilitates particle-alignment procedures, and the increased contrast resulting from the high coherence illumination helps to increase the signal-to-noise ratio for the structure at high resolution. Cold stages are constantly being improved, with several liquid helium stages now in operation (143, 144). Two of these are commercially available from JEOL and FEI/Philips.

Finally, three additional likely trends include: (1) increased automation, including the recording of micrographs, the use of spotscan procedures in remote microscope operation (145, 146), and in every aspect of image processing; (2) production of better electronic cameras (e.g., CCD or pixel detectors); and (3) increased use of dose-fractionated, tomographic tilt series, to extend EM studies to the domain of larger supramolecular and cellular structures (102, 147).

15 ACKNOWLEDGMENTS

We are greatly indebted to all our colleagues at Purdue and Cambridge for their insightful comments and suggestions; to B. Böttcher, R. Crowther, J. Frank, W. Kühlbrandt, and R. Milligan for supplying images used in Figure 14.6; and J. Brightwell for editorial assistance. T.S.B. was supported in part by Grant GM33050 from the National Institutes of Health.

16 ABBREVIATIONS

0D        zero-dimensional (single particles)
1D        one-dimensional (helical)
2D        two-dimensional
3D        three-dimensional
CCD       charge coupled device (slow scan TV detector)
cryo-EM   electron cryomicroscopy
CTF       contrast transfer function
EM        electron microscope/microscopy
FEG       field emission gun

REFERENCES

1. S. Brenner and R. W. Horne, Biochim. Biophys. Acta-Prot. Struct., 34, 103-110 (1959).
2. H. E. Huxley and G. Zubay, J. Mol. Biol., 2, 10-18 (1960).
3. A. Klug and J. E. Berger, J. Mol. Biol., 10, 565-569 (1964).
4. D. J. DeRosier and A. Klug, Nature, 217, 130-134 (1968).
5. W. Hoppe, R. Langer, G. Knesch, and C. Poppe, Naturwissenschaften, 55, 333-336 (1968).
6. H. P. Erickson and A. Klug, Philos. Trans. R. Soc. Lond. B, 261, 105-118 (1971).
7. P. N. T. Unwin and R. Henderson, J. Mol. Biol., 94, 425-440 (1975).
8. J. Dubochet, J. Lepault, R. Freeman, J. A. Berriman, and J.-C. Homo, J. Microsc., 128, 219-237 (1982).
9. J. Dubochet, M. Adrian, J.-J. Chang, J.-C. Homo, J. Lepault, A. W. McDowall, and P. Schultz, Q. Rev. Biophys., 21, 129-228 (1988).
10. K. A. Taylor and R. M. Glaeser, Science, 186, 1036-1037 (1974).
11. R. Henderson, J. M. Baldwin, T. A. Ceska, F. Zemlin, E. Beckmann, and K. H. Downing, J. Mol. Biol., 213, 899-929 (1990).
12. W. Kühlbrandt, D. N. Wang, and Y. Fujiyoshi, Nature, 367, 614-621 (1994).
13. E. Nogales, S. G. Wolf, and K. H. Downing, Nature, 391, 199-203 (1998).
14. K. Murata, K. Mitsuoka, T. Hirai, T. Walz, P. Agre, J. B. Heymann, A. Engel, and Y. Fujiyoshi, Nature, 407, 599-605 (2000).
15. L. A. Amos, R. Henderson, and P. N. T. Unwin, Prog. Biophys. Mol. Biol., 39, 183-231 (1982).
16. T. Walz and N. Grigorieff, J. Struct. Biol., 121, 142-161 (1998).
17. T. S. Baker, N. H. Olson, and S. D. Fuller, Microbiol. Mol. Biol. Rev., 63, 862-922 (1999).
18. J. Frank, Three-Dimensional Electron Microscopy of Macromolecular Assemblies, Academic Press, San Diego, CA, 1996, 342 pp.
19. I. Hargittai and M. Hargittai, Eds., Stereochemical Applications of Gas-Phase Electron Diffraction, VCH, New York, 1988.
20. M. Isaacson, J. Langmore, and H. Rose, Optik, 41, 92-96 (1974).
21. R. Henderson, Q. Rev. Biophys., 28, 171-193 (1995).
22. W. A. Havelka, R. Henderson, and D. Oesterhelt, J. Mol. Biol., 247, 726-738 (1995).
23. R. M. Glaeser, J. Ultrastruct. Res., 36, 466-482 (1971).
24. R. Henderson and P. N. T. Unwin, Nature, 257, 28-32 (1975).
25. J. Frank, Curr. Opin. Struct. Biol., 7, 266-272 (1997).
26. J. Kenney, E. Karsenti, B. Gowen, and S. D. Fuller, J. Struct. Biol., 120, 320-328 (1997).
27. Y. Tao, N. H. Olson, W. Xu, D. L. Anderson, M. G. Rossmann, and T. S. Baker, Cell, 95, 431-437 (1998).
28. L. B. Kong, A. C. Siva, L. H. Rome, and P. L. Stewart, Structure, 7, 371-379 (1999).
29. A. C. Bloomer, J. Graham, S. Hovmoller, P. J. G. Butler, and A. Klug, Nature, 276, 362-368 (1978).
30. R. H. Jacobson, X.-J. Zhang, R. F. DuBose, and B. W. Matthews, Nature, 369, 761-766 (1994).
31. G. P. A. Vigers, R. A. Crowther, and B. M. F. Pearse, EMBO J., 5, 529-534 (1986).
32. M. Schatz, E. V. Orlova, P. Dube, J. Jager, and M. van Heel, J. Struct. Biol., 114, 28-40 (1995).
33. R. A. Grant, D. J. Filman, S. E. Finkel, R. Kolter, and J. M. Hogle, Nat. Struct. Biol., 5, 294-303 (1998).
34. A. Mattevi, G. Obmolova, E. Schulze, K. H. Kalk, A. H. Westphal, A. D. Kok, and W. G. J. Hol, Science, 255, 1544-1550 (1992).
35. R. A. Milligan, Proc. Natl. Acad. Sci. USA, 93, 21-26 (1996).
36. A. Miyazawa, Y. Fujiyoshi, M. Stowell, and N. Unwin, J. Mol. Biol., 288, 765-786 (1999).
37. K. Hirose, W. B. Amos, A. Lockhart, R. A. Cross, and L. A. Amos, J. Struct. Biol., 118, 140-148 (1997).
38. K. Namba and F. Vonderviszt, Q. Rev. Biophys., 30, 1-65 (1997).
39. T.-W. Jeng, R. A. Crowther, G. Stubbs, and W. Chiu, J. Mol. Biol., 205, 251-257 (1989).
40. A. Cheng, A. N. van Hoek, M. Yeager, A. S. Verkman, and A. K. Mitra, Nature, 387, 627-630 (1997).
41. V. M. Unger, N. M. Kumar, N. B. Gilula, and M. Yeager, Science, 283, 1176-1180 (1999).
42. D. A. Winkelmann, T. S. Baker, and I. Rayment, J. Cell Biol., 114, 701-713 (1991).
43. K. A. Taylor, J. Tang, Y. Cheng, and H. Winkler, J. Struct. Biol., 120, 372-386 (1997).
44. B. Böttcher, S. A. Wynne, and R. A. Crowther, Nature, 386, 88-91 (1997).
45. A. Malhotra, P. Penczek, R. K. Agrawal, I. S. Gabashvili, R. A. Grassucci, R. Junemann, N. Burkhardt, K. H. Nierhaus, and J. Frank, J. Mol. Biol., 280, 103-116 (1998).
46. R. W. Horne and I. Pasquali-Ronchetti, J. Ultrastruct. Res., 47, 361-383 (1974).
47. H. Yoshimura, M. Matsumoto, S. Endo, and K. Nagayama, Ultramicroscopy, 32, 265-274 (1990).
48. R. Kornberg and S. A. Darst, Curr. Opin. Struct. Biol., 1, 642-646 (1991).
49. B. Jap, M. Zulauf, T. Scheybani, A. Hefti, W. Baumeister, and U. Aebi, Ultramicroscopy, 46, 45-84 (1992).
50. E. W. Kubalek, S. F. J. LeGrice, and P. O. Brown, J. Struct. Biol., 113, 117-123 (1994).
51. J.-L. Rigaud, G. Mosser, J. J. Lacapere, A. Olofsson, D. Levy, and J.-L. Ranck, J. Struct. Biol., 118, 226-235 (1997).
52. L. Hasler, J. B. Heymann, A. Engel, J. Kistler, and T. Walz, J. Struct. Biol., 121, 162-171 (1998).
53. I. Reviakine, W. Bergsma-Schutter, and A. Brisson, J. Struct. Biol., 121, 356-361 (1998).
54. E. M. Wilson-Kubalek, R. E. Brown, H. Celia, and R. A. Milligan, Proc. Natl. Acad. Sci. USA, 95, 8040-8045 (1998).
55. A. Polyakov, C. Richter, A. Malhotra, D. Koulich, S. Borukhov, and S. A. Darst, J. Mol. Biol., 281, 465-473 (1998).
56. M. Adrian, J. Dubochet, J. Lepault, and A. W. McDowall, Nature, 308, 32-36 (1984).
57. J. R. Bellare, H. T. Davis, L. E. Scriven, and Y. Talmon, J. Electron Microsc. Technol., 10, 87-111 (1988).
58. E. Mayer and G. Astl, Ultramicroscopy, 45, 185-197 (1992).
59. J. Berriman and N. Unwin, Ultramicroscopy, 56, 241-252 (1994).
60. H. D. White, M. L. Walker, and J. Trinick, J. Struct. Biol., 121, 306-313 (1998).
61. S. Subramaniam, M. Gerstein, D. Oesterhelt, and R. Henderson, EMBO J., 12, 1-18 (1993).
62. D. P. Siegel, W. J. Green, and Y. Talmon, Biophys. J., 66, 402-414 (1994).
63. S. Trachtenberg, J. Struct. Biol., 123, 45-55 (1998).
64. N. Unwin, Nature, 373, 37-43 (1995).
65. S. D. Fuller, J. A. Berriman, S. J. Butcher, and B. E. Gowen, Cell, 81, 715-725 (1995).
66. D. P. Siegel and R. M. Epand, Biophys. J., 73, 3089-3111 (1997).
67. M. Walker, X.-Z. Zhang, W. Jiang, J. Trinick, and H. D. White, Proc. Natl. Acad. Sci. USA, 96, 465-470 (1999).
68. M. Cyrklaff and W. Kühlbrandt, Ultramicroscopy, 55, 141-153 (1994).
69. J. C. H. Spence, Experimental High-Resolution Electron Microscopy, Oxford University Press, Oxford, UK, 1988.
70. L. Reimer, Transmission Electron Microscopy, Springer-Verlag, Berlin, 1989.
71. R. H. Wade and J. Frank, Optik, 49, 81-92 (1977).
72. R. H. Wade, Ultramicroscopy, 46, 145-156 (1992).
73. C. Toyoshima, K. Yonekura, and H. Sasabe, Ultramicroscopy, 48, 165-176 (1993).
74. R. H. Cheng, N. H. Olson, and T. S. Baker, Virology, 186, 655-668 (1992).
75. B. L. Trus, R. B. S. Roden, H. L. Greenstone, M. Vrhel, J. T. Schiller, and F. P. Booy, Nat. Struct. Biol., 4, 413-420 (1997).
76. F. Zemlin, Ultramicroscopy, 46, 25-32 (1992).
77. Z. H. Zhou and W. Chiu, Ultramicroscopy, 49, 407-416 (1993).
78. F. Zemlin, Micron, 25, 223-226 (1994).
79. E. J. Mancini, F. D. Haas, and S. D. Fuller, Structure, 5, 741-750 (1997).
80. J. Brink, M. B. Sherman, J. Berriman, and W. Chiu, Ultramicroscopy, 72, 41-52 (1998).
81. O. L. Krivanek and P. E. Mooney, Ultramicroscopy, 49, 95-108 (1993).
82. M. B. Sherman, J. Brink, and W. Chiu, Micron, 27, 129-139 (1996).
83. D. J. DeRosier and P. B. Moore, J. Mol. Biol., 52, 355-369 (1970).
84. J. L. Carrascosa and A. C. Steven, Micron, 9, 199-206 (1978).
85. M. van Heel and J. Frank, Ultramicroscopy, 6, 187-194 (1981).
86. I. S. Gabashvili, R. K. Agrawal, C. M. T. Spahn, R. A. Grassucci, D. I. Svergun, J. Frank, and P. Penczek, Cell, 100, 537-549 (2000).
87. M. Radermacher, T. Wagenknecht, A. Verschoor, and J. Frank, J. Microsc., 146, 113-136 (1987).
88. M. Radermacher, J. Electron Microsc. Technol., 9, 359-394 (1988).
89. J. Frank, M. Radermacher, P. Penczek, J. Zhu, Y. Li, M. Ladjadj, and A. Leith, J. Struct. Biol., 116, 190-199 (1996).
90. M. van Heel, Ultramicroscopy, 21, 111-124 (1987a).
91. M. van Heel, G. Harauz, and E. V. Orlova, J. Struct. Biol., 116, 17-24 (1996).
92. M. Radermacher in D.-P. Hader, Ed., Image Analysis in Biology, CRC Press, Boca Raton, FL, 1991, pp. 219-246.
93. M. Radermacher in J. Frank, Ed., Electron Tomography, Plenum Press, New York, 1992, pp. 91-115.
94. M. Radermacher, Ultramicroscopy, 53, 121-136 (1994).
95. P. A. Penczek, R. A. Grassucci, and J. Frank, Ultramicroscopy, 53, 251-270 (1994).
96. M. Schatz and M. van Heel, Ultramicroscopy, 32, 255-264 (1990).
97. P. Penczek, M. Radermacher, and J. Frank, Ultramicroscopy, 40, 33-53 (1992).
98. N. Grigorieff, J. Mol. Biol., 277, 1033-1046 (1998).
99. D. E. Olins, A. L. Olins, H. A. Levy, R. C. Durfee, S. M. Margle, E. P. Tinnel, and S. D. Dover, Science, 220, 498-500 (1983).
100. U. Skoglund and B. Daneholt, Trends Biochem. Sci., 11, 499-503 (1986).
101. J. C. Fung, W. Liu, W. J. DeRuijter, H. Chen, C. K. Abbey, J. W. Sedat, and D. A. Agard, J. Struct. Biol., 116, 181-189 (1996).
102. W. Baumeister, R. Grimm, and J. Walz, Trends Cell Biol., 9, 81-85 (1999).
103. A. K. Shah and P. L. Stewart, J. Struct. Biol., 123, 17-21 (1998).
104. F. Beuron, M. R. Maurizi, D. M. Belnap, E. Kocsis, F. P. Booy, M. Kessel, and A. C. Steven, J. Struct. Biol., 123, 248-259 (1998).
105. R. A. Crowther, L. A. Amos, J. T. Finch, D. J. DeRosier, and A. Klug, Nature, 226, 421-425 (1970).
106. R. A. Crowther, Philos. Trans. R. Soc. Lond., 261, 221-230 (1971).
107. S. D. Fuller, S. J. Butcher, R. H. Cheng, and T. S. Baker, J. Struct. Biol., 116, 48-55 (1996).
108. R. H. Cheng, V. S. Reddy, N. H. Olson, A. J. Fisher, T. S. Baker, and J. E. Johnson, Structure, 2, 271-282 (1994).
109. R. A. Crowther, N. A. Kiselev, B. Böttcher, J. A. Berriman, G. P. Borisova, V. Ose, and P. Pumpens, Cell, 77, 943-950 (1994).
110. T. S. Baker and R. H. Cheng, J. Struct. Biol., 116, 120-130 (1996).
111. J. R. Castón, D. M. Belnap, A. C. Steven, and B. L. Trus, J. Struct. Biol., 125, 209-215 (1999).
112. R. A. Crowther, R. Henderson, and J. M. Smith, J. Struct. Biol., 116, 9-16 (1996).
113. J. A. Lawton and B. V. V. Prasad, J. Struct. Biol., 116, 209-215 (1996).
114. P. A. Thuman-Commike and W. Chiu, J. Struct. Biol., 116, 41-47 (1996).
115. I. M. Boier Martin, D. C. Marinescu, R. E. Lynch, and T. S. Baker, J. Struct. Biol., 120, 146-157 (1997).
116. Z. H. Zhou, W. Chiu, K. Haskell, H. J. Spears, J. Jakana, F. J. Rixon, and L. R. Scott, Biophys. J., 74, 576-588 (1998).
117. P. L. Stewart, C. Y. Chiu, S. Huang, T. Muir, Y. Zhao, B. Chait, P. Mathias, and G. R. Nemerow, EMBO J., 16, 1189-1198 (1997).
118. J. Walz, T. Tamura, N. Tamura, R. Grimm, W. Baumeister, and A. J. Koster, Mol. Cell, 1, 59-65 (1997).
119. M. Stewart, J. Electron Microsc. Technol., 9, 325-358 (1988).
120. C. Toyoshima and N. Unwin, J. Cell Biol., 111, 2623-2635 (1990).
121. D. G. Morgan and D. DeRosier, Ultramicroscopy, 46, 263-285 (1992).
122. N. Unwin, J. Mol. Biol., 229, 1101-1124 (1993).
123. R. Beroukhim and N. Unwin, Ultramicroscopy, 70, 57-81 (1997).
124. E. H. Egelman, Ultramicroscopy, 19, 367-474 (1986).
125. B. Carragher, M. Whittaker, and R. A. Milligan, J. Struct. Biol., 116, 107-112 (1996).
126. C. H. Owen, D. G. Morgan, and D. J. DeRosier, J. Struct. Biol., 116, 167-175 (1996).
127. R. Henderson, J. M. Baldwin, K. H. Downing, J. Lepault, and F. Zemlin, Ultramicroscopy, 19, 147-178 (1986).
128. J. M. Baldwin, R. Henderson, E. Beckman, and F. Zemlin, J. Mol. Biol., 202, 585-591 (1988).
129. S. Hardt, B. Wang, and M. F. Schmid, J. Struct. Biol., 116, 68-70 (1996).
130. R. A. Crowther and P. K. Luther, Nature, 307, 569-570 (1984).
131. H. Winkler and K. A. Taylor, J. Struct. Biol., 116, 241-247 (1996).
132. J. Frank in J. Frank, Ed., Electron Tomography: Three-Dimensional Imaging with the Transmission Electron Microscope, Plenum Press, New York, 1992, 399 pp.
133. J. Frank, A. Verschoor, and M. Boublik, Science, 214, 1353-1355 (1981).
134. M. van Heel, Ultramicroscopy, 21, 95-100 (1987b).
135. M. van Heel and J. Hollenberg in W. Baumeister and W. Vogell, Eds., Electron Microscopy at Molecular Dimensions, Springer-Verlag, Berlin, 1980, pp. 256-260.
136. M. Unser, B. L. Trus, J. Frank, and A. C. Steven, Ultramicroscopy, 30, 429-434 (1989).
137. D. A. Agard, J. Mol. Biol., 167, 849-852 (1983).
138. J. M. Valpuesta, J. L. Carrascosa, and R. Henderson, J. Mol. Biol., 240, 281-287 (1994).
139. J. T. Finch, J. Mol. Biol., 66, 291-294 (1972).
140. Z. H. Zhou, S. Hardt, B. Wang, M. B. Sherman, J. Jakana, and W. Chiu, J. Struct. Biol., 116, 216-222 (1996).
141. N. H. Olson and T. S. Baker, Ultramicroscopy, 30, 281-298 (1989).
142. D. M. Belnap, N. H. Olson, and T. S. Baker, J. Struct. Biol., 120, 44-51 (1997).
143. Y. Fujiyoshi, T. Mizusaki, K. Morikawa, H. Yamagishi, Y. Aoki, H. Kihara, and Y. Harada, Ultramicroscopy, 38, 241-251 (1991).
144. F. Zemlin, E. Beckmann, and K. D. van der Mast, Ultramicroscopy, 63, 227-238 (1996).
145. N. Kisseberth, M. Whittaker, D. Weber, C. S. Potter, and B. Carragher, J. Struct. Biol., 120, 309-319 (1997).
146. M. Hadida-Hassan, S. J. Young, S. T. Peltier, M. Wong, S. Lamont, and M. H. Ellisman, J. Struct. Biol., 125, 235-245 (1999).
147. B. F. McEwen, K. H. Downing, and R. M. Glaeser, Ultramicroscopy, 60, 357-373 (1995).
CHAPTER FIFTEEN
Peptidomimetics for Drug Design
M. ANGELS ESTIARTE
DANIEL H. RICH
School of Pharmacy-Department of Chemistry
University of Wisconsin-Madison
Madison, Wisconsin
Contents
1 Introduction
2 Classification of Peptidomimetics
3 Design of Conformationally Restricted Peptides
4 Template Mimetics
5 Peptide Bond Isosteres
6 From Transition-State Analog Inhibitors to Non-Peptide Inhibitors: Examples in Protease Inhibitors
6.1 TSA in Aspartic Peptidase Inhibitors
6.2 TSA in Metallo Peptidase Inhibitors
6.3 TSA-Derived Cysteine and Serine Peptidase Inhibitors
7 Speeding up Peptidomimetic Research
8 Toward Rational Drug Design: Discovery of Novel Non-Peptide Peptidomimetics
9 Historical Development of Important Non-Peptide Peptidomimetics
9.1 HIV Protease
9.2 Thrombin
9.3 Factor Xa
9.4 Glycoprotein IIb/IIIa (GP IIb/IIIa)
9.5 Ras-Farnesyltransferase
9.6 Non-Peptidic Ligands for Peptide Receptors
9.6.1 Angiotensin II
9.6.2 Substance P
9.6.3 Neuropeptide Y
9.6.4 Growth Hormone Secretagogues
9.6.5 Endothelin
10 Summary and Future Directions
1 INTRODUCTION

Protein-protein interactions are central to biology and provide one mechanism to convert genomic information into regulated biological responses. Important examples of protein-peptide interactions include the binding of peptide ligands to proteases, the binding of peptide hormones to peptide receptors, the recruitment of proteins to effect signal transduction, and apoptosis. Peptides also act as neurotransmitters, neuromodulators, hormones, and autocrine and paracrine factors. Unfortunately, their use as pharmaceutical drugs is made difficult by their poor pharmacokinetic profiles; they are easily proteolyzed, poorly transported, and rapidly excreted. Although modern formulation techniques have improved delivery of peptides (e.g., inhalation of insulin), there remains a need for small potent molecules that can be administered orally.

For these reasons, much effort has been expended to find ways to replace portions of peptides with non-peptide structures, called peptidomimetics, in the hope of obtaining orally bioavailable entities. Several types of peptidomimetics have been developed, and the field has emerged as one of the important approaches to drug design and discovery. This review will describe the various methods developed to design peptidomimetics. Due to space limitations, the biological rationale for each peptidomimetic and its chemical synthesis cannot be covered. Selected examples of the strategies employed to obtain peptidomimetics are provided to illustrate the breadth of research in this field.

2 CLASSIFICATION OF PEPTIDOMIMETICS

The term peptidomimetic is often used in the literature to indicate a multitude of structural types that differ in fundamental ways. Comparisons between peptidomimetics suffer from the lack of accepted definitions of what a peptidomimetic is (1). The term is often applied to highly modified analogs of peptides without distinguishing how these differ from classical analogs of peptides. For example, peptide (2) is derived from the decapeptide LH-RH (1); (2) contains only five amino acids, none of which is present in the parent compound, yet it is a powerful antagonist of the LH-RH receptor (Fig. 15.1) (2). Is (2) a peptide analog or a peptidomimetic?

In the 1970s, Hughes et al. were the first to show that two very different chemical structures have similar agonist properties (3). The opioid natural product, morphine (3), was found to resemble the N-terminal structure of the endogenous opioid peptides, enkephalins, (4a) and (4b), and β-endorphin (5) (Fig. 15.2). The remarkable similarity between the morphine phenol system and the N-terminal tyrosine residue in the peptide opioids implied that these units reacted with opioid receptors in a similar fashion to elicit comparable responses (4-6).

The realization that a non-peptide natural product was mimicking the action of a natural peptide effector led Farmer to postulate that other non-peptide structures might be found that would mimic other peptide effectors (7). His concept of "peptide mimetic," which later was called "peptidomimetic," proposed that
LH-RH (1): pGlu-His-Trp-Ser-Tyr-Gly-Leu-Arg-Pro-Gly-NH2
Figure 15.1. Reduced-size antagonist of LH-RH.

Met-enkephalin (4a): Tyr-Gly-Gly-Phe-Met
Leu-enkephalin (4b): Tyr-Gly-Gly-Phe-Leu
β-Endorphin (5): Tyr-Gly-Gly-Phe-Met-Thr-Ser-Glu-Lys-Ser-Gln-Thr-Pro-Leu-Val-Thr-Phe-Lys-Asn-Ala-Ile-Ile-Lys-Asn-Ala-Tyr-Lys-Lys-Gly-Glu
Morphine (3)
Figure 15.2. Examples of peptidic and non-peptidic opioid receptor ligands.
novel scaffolds could be designed to replace the entire peptide backbone while retaining isosteric topography of the enzyme-bound peptide (or assumed receptor-bound) conformation. Farmer's definition went beyond simple replacement of amide bonds and the concept of stringing together conformationally restricted amino acid derivatives to mimic the native peptide structure. In the intervening years, many non-peptide and partially peptide structures have been found that mimic (or antagonize) the action of the peptide ligand at its receptor; this is particularly true with substances active at G-protein-coupled receptors.

The pyrrolinone unit (6) designed by Smith and Hirschmann illustrates a modern use of these two concepts (Fig. 15.3) (8). Pyrrolinones constrain the peptide-like side-chains into an extended β-structure topography that fits the active sites of most peptidases; pyrrolinones are resistant to normal proteolysis because no α-amino acid units remain, and the units impart sophisticated partitioning properties to the final inhibitor. Pyrrolinones, like many peptide-derived peptidomimetics, retain an atom-to-atom correspondence to the parent peptide, especially with respect to the peptide backbone structure. Most of these structures contain elements that accomplish one of two objectives: they replace amide bonds with metabolically stable units, and they effect a conformational constraint on peptides or on the peptide replacement. In contrast, heterocyclic natural products or screening leads that bind to peptide receptors also have been called peptidomimetics by virtue of their mimicking (or antagonizing) the function of the natural peptide. Although structural data confirming mimicry of the designed mimetics are rarely available for receptor bound ligands, ample evidence is available from X-ray crystallography that heterocyclic inhibitors are mimicking the extended β-strand of enzyme-bound substrate-derived inhibitors (vide infra).

Figure 15.3. Correlation of pyrrolinone-based peptidomimetics and the parent peptide. [Structures shown: peptide; pyrrolinone analog.]

Based on these considerations, four distinct types of peptidomimetics have been identified to date (9, 10). The first invented were structures that contain one or more mimics of the local topography about an amide bond (amide bond isosteres). Strictly speaking, these are properly classified as pseudopeptides (11), but in recent years, they have been called peptidomimetics on occasion. For historical reasons, we classify the peptide backbone mimetics as type I mimetics (Table 15.1). These
Table 15.1 Peptidomimetic Types

Peptidomimetic                                                                Examples
Type I    Peptide backbone mimetics     Substrate-based design                Pseudopeptides
Type II   Functional mimetics           Molecular modeling, HTS               GPCR antagonists
Type III  Topographical mimetics        Structure-based design                Non-peptide protease inhibitors
Type IV   Non-peptide peptidomimetics   Group Replacement Assisted Binding    Piperidine inhibitors
mimetics often match the peptide backbone X-ray structural determination of both the
atom-for-atom while retaining functionality peptide-derived inhibitor and the heterocyclic
that makes important contacts with binding non-peptide inhibitor complexes have been
sites. Some units mimic short portions of sec- compared. These examples demonstrate that
ondary structure (e.g., p-turns) and have been alternate scaffolds can display side-chains so
used to generate lead compounds. Many early that they interact with proteins in fashion
protease inhibitors were designed from tran- closely related to that of the parent peptide.
sition state analog mimetics or from collected Recently, a fourth type of peptidomimetic
substrate/product mimetics, each designed to called a GRAB-peptidomimetic (group replace-
mimic reaction pathway intermediates of the ment-assisted binding) has been identified
enzyme-catalyzed reaction. These are mimics (10). These structures might share structural-
of the peptide bond in a putative transition functional features of type I peptidomimetics,
state or product state and will be classified but they bind to an enzyme form not accessible
here as peptidomimetics. with type I peptidomimetics.
The second type of mimetic to emerge was Previous reviews on peptidomimetics have
the functional mimetic, or type ZZ mimetic, addressed pseudopeptides (ll),macrocyclic
which is a small non-peptide molecule that mimetics (13), natural product mimetics (14),
binds to a peptide receptor. Morphine was the cyclic protease inhibitors (15),mimetics for re-
first well-characterized example of this type of ceptor ligands (16-22), and earlier general
peptidomimetic. Initially, type I1 mimetics overviews (23-29). This review will focus on
were presumed to be direct structural analogs the design process itself. Novel peptidomimet-
of the natural peptide, but characterization of ics in which the structural relationship be-
both the endogenous peptide and antagonist's tween parent peptide and the peptidomimetic
binding sites by site-directed mutagenesis has has been established by biophysical methods
raised doubts about this interpretation (12). are used to clarify the principles. Successful
The mutagenesis data indicate that antago- approaches are highlighted to illustrate how
nists for a large number of receptors seem to these concepts are currently used.
bind to receptor subsites different than those
used by the parent peptide. Consequently,
functional mimetics may not mimic the struc- 3 DESIGN O F CONFORMATIONALLY
ture of the parent hormone; this remains to be RESTRICTED PEPTIDES
determined. Despite this uncertainty, the ap-
proach has been quite successful and produced Peptide derivatives that contain conforma-
a number of potential drug lead structures. tionally restricting amino acid units or other
Type ZZZ mimetics represent the Farmer conformational constraints were first called
definition ofpeptidomimetics in that theypos- conformationally constrained (or restricted)
sess novel templates, which appear unrelated peptide analogs. The use of steric hindrance or
to the original peptides but contain the essen- cyclization to limit rotational degrees of free-
tial groups, positioned on a novel non-peptide dom in biologically active molecules has a long
scaffold to serve as topographical mimetics. history and was originally applied to non-pep-
Several type I11 peptidomimetic protease in- tide neurotransmitters (30). Subsequently, it
hibitors have been characterized where direct was applied to amino acid substituents and to
cyclic peptides (31, 32) and to control secondary structure in model proteins.

Conformational restriction is a very powerful method for probing the bioactive conformations of peptides. Small peptides have many flexible torsion angles so that enormous numbers of conformations are possible in solution. For example, a simple tripeptide such as thyrotropin-releasing hormone (TRH; 7) (Fig. 15.4) with six flexible bonds could have over 65,000 possible conformations. The number of potential conformers for larger peptides is enormous, and some method is needed to exclude potential conformers. Modern biophysical methods, e.g., X-ray crystallography or isotope edited nuclear magnetic resonance (NMR) (33), can be used to characterize peptide-protein interactions for soluble proteins, but most biophysical methods cannot yet determine the conformation of a ligand bound to constitutive receptors, e.g., G-protein-coupled receptors (34, 35).

Figure 15.4. Structure of TRH tripeptide (7).

Cyclization is one of the earliest techniques applied to design peptidomimetics. Cyclic peptides are more stable to amide bond hydrolysis and allow less conformational flexibility; consequently, the resulting analogs are anticipated to be more selective and less toxic. Methods for restricting conformations include peptide backbone cyclization, disulfide bond formation, side-chain cyclization, and metal ion chelation.

The first successful application of conformational restriction to peptide chemistry was carried out by Veber et al. at Merck (36), who were trying to simplify the structure of somatostatin (8) (Fig. 15.5) to produce an orally active derivative. Their approach was to introduce conformational restraints into the macrocyclic peptide ring system to reduce the number of conformations available to the analog. Not all substitutions were expected to produce biologically active products, but those that retained activity were assumed to be able to adopt conformations close to the normal bioactive conformation. This work began from the earlier discovery by Rivier et al. (37) that replacement of L-tryptophan in position 8 of somatostatin by D-tryptophan produced an analog that retained biological activity. This unusual biological result is possible when a D,L-sequence (D-Trp-Lys) replaces an L,L-sequence (Trp-Lys) in a peptide at a type II β-turn, because the topography of the amino acid side-chains at these positions is essentially identical in these turns (38). These results led Veber et al. to postulate that the amino acid sequence Phe-Trp-Lys-Thr might be part of a type II β-turn, and that this tetrapeptide sequence might comprise the active pharmacophore. Although this hypothesis was highly speculative for its time, it was shown to be essentially correct by applying the principle of conformational restriction (Fig. 15.5). Deletion of the N-terminal dipeptide, followed by insertion of the D-Trp at position 8, and replacement of the disulfide sulfurs with carbons produced analog (9). NMR and other data suggested that the two Phe side-chains were clustered; thus they were replaced by a transannular disulfide bond limiting the available conformation, as in compound (10). After several iterations of this process, a biologically active cyclic hexapeptide (11) was discovered that retained only 6 of the original 14 amino acids in somatostatin yet produced a fully active derivative (31).

The work of Veber et al. established that valuable information about the bioactive conformation of a flexible peptide could be obtained by applying the principles of conformational restriction, and several additional examples soon were reported that followed this strategy. Conformationally restricted enkephalin analogs, e.g., (12-13), were formed by cyclizing between positions 2 and 5 of enkephalins (4a-b) (39). Cyclization of α-melanotropin (14) gave the unusually active analog (15) (40). Small cyclic analogs of endothelin (16) (41) have been discovered by applying these methods, as illustrated by (17) (Fig.
15.6). Peptide chemists routinely apply conformational restriction in their attempts to determine possible bioactive conformations.

Figure 15.6. Cyclic hormone peptide analogs. Structures shown include endothelin (16).

Flexible peptides can be conformationally restricted by a variety of methods other than macrocyclization of the peptide. For example, Marshall et al. introduced α-methyl amino acid substituents into peptides as a way to decrease the conformational space available to the resulting peptide (42); these types of approaches led to his "Active Analog" approach for determining bioactive conformations of flexible molecules (43). Some other traditional modifications of the peptide substrate are the replacement of the amino acids of the P1-P1′ cleavage site by D-amino acids or the employment of α-C- or α-N-alkylated amino acids and cyclic or β-amino acids (Fig. 15.7).

Figure 15.7. Representative amino acid mimetics: α-amino acid, β-amino acid, α-alkyl β-amino acid, and cyclic derivatives.

Mimicking the secondary structure of peptides has become one of the most important tools for rational drug design (44-47). These methods induce the synthetic analog to adopt a set of target conformations, which are designed to mimic the bioactive conformation predicted in the native substrate from biophysical techniques. Molecular surrogates
have been found that efficiently mimic turns, strands, sheets, and helices. By far, the major efforts have focused on the design of β-turn mimetics. Some of the templates used to constrain the conformational torsion angles of the peptide chain are summarized in Figs. 15.8-15.14.

In a very early example, Freidinger et al. developed a series of cyclic lactams that stabilized β- and γ-turn structures in linear peptides (Fig. 15.8). This strategy was applied to determine conformations of LH-RH that are consistent with the turn structure permitted by the constraint. For example, the 3-aminolactam (18) was used to mimic a β-turn conformation. When it was inserted in LH-RH, the resulting compound (19) retained good biological activity, so that the bioactive conformation of LH-RH probably contains a β-turn around residues 6 and 7 (48).

Figure 15.8. γ-Lactam analog of LH-RH (19). A β-turn mimetic.

Conformational restriction has also been used to determine the bioactive conformation of enzyme-inhibitor systems for which no X-ray crystal structure is available. Thorsett et al. (49) synthesized conformationally restricted bicyclic lactam derivatives of the angiotensin converting enzyme (ACE) inhibitors enalapril (20) and enalaprilat (21) (Fig. 15.9) to characterize torsion angles in the bioactive conformation. Analog (22) was used to constrain the torsion angle psi (ψ). Flynn et al.

(50) extended this principle to prepare the very tight-binding tricyclic ACE inhibitor (23) (Fig. 15.9).

Figure 15.9. Conformationally restricted ACE inhibitors: enalapril, R = Et (20); enalaprilat, R = H (21).

Several other γ-, δ-, and ε-lactam derivatives have been prepared and inserted into receptor antagonists or agonists. For instance, the thiazolidine lactam (24) (Fig. 15.10) has been shown to induce the desired secondary structure in a gramicidin S analog. Later, it was used to prepare a conformationally restricted cyclosporin A analog (51). Several β-turn and γ-turn mimetics are shown in Figs. 15.10-15.12, and many other examples are available in the recent literature (52-54).

Figure 15.10. Lactams as β-turn mimetics.

Figure 15.11. Other β-turn mimetic scaffolds.
The β-sheet is an important, biologically relevant secondary structure. As noted earlier, the pyrrolinones invented by Smith et al. (Fig. 15.3) adopt a β-strand conformation, which was corroborated by computer modeling and by X-ray crystallography (55). The diacylaminoepindolidione (25), the dibenzofuran (26), and the N,N-linked oligourea (27) illustrate other molecular templates designed to stabilize peptides in a β-strand conformation (Fig. 15.13) (56, 57).

Figure 15.12. Some γ-turn mimetic scaffolds.

Figure 15.13. Structures of β-sheet mimetics.

Peptidomimetic structures that support α-helices (28-30) (Fig. 15.14) and loops have been reported less frequently because of the difficulty in presenting the side-chains correctly. However, newer approaches have provided mimetics of multiple discontinuous protein surfaces (56). Over the last few years, the Gellman, Seebach, and Hanessian research groups have invented novel helical structures (e.g., 31, 32) by use of β-, γ-, and δ-peptides (58).

Figure 15.14. Newer templates found in helical or loop structures.

It is important to stress that even a small change in the structure or in a single torsional angle can be sufficient to dramatically modify the conformation of the resulting peptide. Numerous additional conformational constraints have been developed, and the reader is encouraged to consult these reviews for additional examples (32, 59-63).

4 TEMPLATE MIMETICS

Highly functionalized molecular scaffolds have proven to be very successful in mimicking specific protein-protein interactions. Insertion of the key pharmacophoric groups into a nonpeptidic framework has provided good inhibitors of a variety of biological targets.

This technique has been successfully applied in those biological targets where the key structural amino acids of the native peptide for peptide recognition are known. Miscellaneous examples are found in glycoprotein GpIIb/IIIa inhibitors (33) that mimic the RGD sequence (64) or in Ras-farnesyltransferase inhibitors (34) that mimic the CAAX sequence (Fig. 15.15) (65).

An early example of this concept was developed by Hirschmann et al. in the design of a somatostatin analog (Fig. 15.15) (55). Three of the four crucial amino acid side-chains of the parent peptide (Tyr, Trp, and Lys) were positioned on a sugar template (35). Although originally designed as a somatotropin release inhibitory factor (SRIF) antagonist, compound (35) also proved to be a good Substance P antagonist. These sugar derivatives, as well as the benzodiazepine, diphenylmethane, and spiropiperidine scaffolds, are elements found in a variety of inhibitors of receptors, and have been designated as "privileged structures" (66). Thus, these common scaffolds can often provide a template for further optimization of a desired activity. Evans et al. have noted that the essential surface area of biologically active peptides is similar to the surface area of benzodiazepines, one type of non-peptide scaffold known to bind to G-protein-coupled receptors (67).

The quest for functionalized lead structures that effectively mimic the "hot spots" within the biological ligand is not easy (68). Molecular modeling and high-throughput screening (HTS) are techniques that are currently used for this purpose and have been summarized elsewhere.

The design and synthesis of antifungal analogs of the cyclic peptide rhodopeptin (36) (Fig. 15.16) illustrate a recent application of peptidomimetic scaffolding, where the structure of the biological target is not known. After structure-activity relationship (SAR) studies, the important side-chains of the peptide ligand were identified; then, NMR and molecular modeling techniques were used to model these side-chains onto known scaffolds and to compare with the original three-dimensional (3D) structure of the native peptide. Compound (37) (Fig. 15.16) is a potent peptidomimetic derivative with improved solubility in water that functions the same as the cyclic tetrapeptide (69, 70).

5 PEPTIDE BOND ISOSTERES

The replacement of amide bonds by retro-inverso amide replacements (71, 72) and other amide bond isosteres generates pseudopeptides (11). This process was first used to stabilize peptide hormones in vivo, and later to prepare transition state analog (TSA) inhibitors. Systematic efforts to convert good in vitro inhibitors into good in vivo inhibitors became the driving force for further development of peptidomimetics. Figure 15.17 illustrates some of the peptide backbone modifications that have been made in an effort to increase bioavailability. Replacement of scissile amide (CONH) bonds with groups insensitive to hydrolysis (e.g., CH2NH) has been extensively practiced. Reviews of this work have appeared (11, 73). Removal of the proton donors and
acceptors in an amide bond also reduces hydration, which improves the ability of the compounds to penetrate lipid membranes (74).

Figure 15.15. Biologically active template mimetics. Structures shown include somatostatin and the sugar-based mimetic (35).

Figure 15.16. Rhodopeptin analogs. Representative example of scaffolding methodology.

These approaches represent important first steps in development of peptidomimetics. However, removal of an amide bond also affects the geometry and increases the flexibility of the molecule at this position, which decreases ligand binding. Effective analogs have been obtained when conformational restriction, transition-state analog design, and amide bond replacements have been applied to

scaffolds with molecular weights below 500-600 (75, 76), but at present this process is very labor intensive.

Figure 15.17. Isosteres that replace peptide backbone amide groups to generate pseudopeptides. Panels shown: peptide, peptoid, replacements with X = NH, O, S, CH2 or X = O, S, and retro-inverso.
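The payoff of conformational restriction can be put in rough numbers. Section 3 quotes over 65,000 possible conformations for TRH with six flexible bonds; the chapter does not say how that figure was counted, so the sketch below simply assumes that each flexible bond is sampled independently at a fixed number of torsion values. The function name and the grid sizes are illustrative assumptions, not taken from the source.

```python
def conformer_count(n_bonds: int, states_per_bond: int) -> int:
    """Upper bound on conformers if every flexible bond is sampled independently."""
    return states_per_bond ** n_bonds


# Six flexible bonds, as quoted for TRH; the grid sizes (6 and 7) are assumptions.
for states in (6, 7):
    print(f"{states} states per bond: {conformer_count(6, states)} conformers")

# Freezing two of the six bonds (for example by cyclization) on the 7-state grid.
free = conformer_count(6, 7)
restricted = conformer_count(4, 7)
print(f"reduction factor after fixing two bonds: {free // restricted}")
```

Under these assumptions the quoted figure falls between the two grids, and each bond that a constraint fixes divides the search space by the number of states assumed for that bond.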
6 FROM TRANSITION-STATE ANALOG INHIBITORS TO NON-PEPTIDE INHIBITORS: EXAMPLES IN PROTEASE INHIBITORS

Many peptidomimetics are derived from the design of TSA inhibitors, molecules designed according to the hypothesis provided by Pauling (77) and implemented by Wolfenden (78, 79). TSA protease inhibitors are stable analogs of the tetrahedral intermediate for peptide bond hydrolysis that inhibit the enzyme (Fig. 15.18). The first successful commercial application was the development of captopril (38) by Ondetti et al. (80), and many applications have been reported over the past quarter century.

Figure 15.18. TSA inhibit peptide bond hydrolysis. The tetrahedral intermediate leads to bond cleavage; the transition state analog gives no bond cleavage.

Figures 15.19-15.32 list examples of analogs of peptidyl transition states that have been employed to develop inhibitors of four classes of peptidases (81, 82). These units are used to replace the scissile amide bond in a substrate sequence with either an amino acid or dipeptide isostere, or with a chelating moiety in the case of metallo peptidases. The
dipeptide TSA provides the functionality that interacts tightly with the enzyme catalytic groups while the amino acid sequence up- and downstream on the peptide chain provides interactions that lead to selective inhibition of the target enzyme. The enzyme active site typically is buried in a cleft capable of accommodating three to nine amino acid residues of the substrate/inhibitor depending on the minimum amino acid sequence necessary for hydrolysis. The inhibitor's exquisite selectivity derives from the interactions of the ligand's Pn-Pn′ residues with the enzyme binding sites (Sn-Sn′) (83). Recently, some aspartic and serine peptidase inhibitors have been found that access an additional binding site sub-pocket (S3sp) to increase both inhibitor potency and selectivity (84-86).

6.1 TSA in Aspartic Peptidase Inhibitors

The reduced amide isostere (39), developed by Szelke, and the statine (hydroxymethylene) isostere (40) were early transition-state analogs used to design inhibitors of various aspartic proteases (87-89), and their success led to other tetrahedral intermediate mimics such as the hydroxyethylene (41) and hydroxyethylamine (42) isosteres (Fig. 15.19) (90-92). The statine subunit, which mimics the tetrahedral intermediate, represents one of the earlier examples of TSA, although statine is one atom short in backbone length to be a true dipeptide or two atoms too long to be an isosteric replacement for a single amino acid.

Figure 15.19. TSA used to inhibit aspartic peptidases: statine (Sta) (40), pepstatin, reduced amide (39), hydroxyethylene (41), hydroxyethylamine (42), hydroxyethyl hydrazide, phosphonic, α-hydroxy-β-amino acid, hydroxyethylurea, and hydroxyethyl sulfonamide units.

Early work focused on developing inhibitors of renin as potential antihypertensive agents, but these compounds failed to become drugs, primarily because orally active compounds proved difficult to obtain. As a result, the first pharmaceutical attempts to develop renin inhibitors for treatment of hypertension through TSA-based inhibitors failed (93). It was eventually realized, after extensive modifications to the ancillary peptide functionality, that developing bioavailable peptide-derived inhibitors critically depended on the molecular weight of the inhibitor. Developing inhibitors for HIV protease was substantially easier
than for renin because HIV protease requires a significantly smaller minimum substrate sequence (94). In addition, the principles elucidated to develop renin inhibitors were known and could be applied to the development of HIV protease inhibitors. Variations on the hydroxyethylamine moiety proved to be very successful.

Figure 15.20. Peptide-derived TSA inhibitors used clinically in HIV protease-based AIDS therapies: saquinavir (Roche), ritonavir (Abbott), indinavir (Merck), amprenavir (Vertex), nelfinavir (Agouron), and lopinavir (Abbott).

Some of the highly modified HIV
protease inhibitors now in clinical use (Fig. 15.20) have excellent oral bioavailability and establish that application of the transition state analog design process can be very successful in favorable cases.

More recently, the principles for designing inhibitors of aspartic proteases have been applied to the design of inhibitors of β-secretase (BACE or Memapsin-2) as potential agents for treating or preventing Alzheimer's disease (95, 96). Both statine-derived inhibitors (43) and hydroxyethylene-derived BACE inhibitors have been reported (Fig. 15.21) (97, 98). A crystal structure of (44) bound to β-secretase has been reported (99). As expected, the hydroxyl group is hydrogen bonded to Asp32 and Asp228, as in other hydroxyethylene derivatives, and the inhibitor binds in an extended conformation. Because the target β-secretase is within the CNS, successful inhibitors have to penetrate the blood-brain barrier readily, a property not yet achieved with any of the peptidomimetic inhibitors currently available.

Figure 15.21. Peptide-derived TSA inhibitors as β-secretase inhibitors. β-Secretase cleaves APP at -Ser-Glu-Val-Lys-Met-↓-Asp-Ala-Glu-Phe-Arg- and at -Ser-Glu-Val-Asn-Leu-↓-Asp-Ala-Glu-Phe-Arg-. Inhibitors shown: (43), 30 nM; (44), Ki = 1.6 nM; (45), Ki = 2.5 nM.

With the crystal structure in hand, structure-based modification of the parent lead compound has just started to provide new peptidomimetic structures with lower molecular weight and fewer hydrogen bonds (e.g., 45) (Fig. 15.21), opening further avenues to pharmacologically useful compounds (100).
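To put the Ki values quoted in Fig. 15.21 on an energy scale, the following minimal sketch converts an inhibition constant into an approximate standard binding free energy with dG = RT ln(Ki). It assumes that Ki can be treated as a dissociation constant relative to a 1 M standard state and that the temperature is 298.15 K; neither assumption, nor the function name, comes from the chapter.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(ki_molar: float, temp_k: float = 298.15) -> float:
    """Approximate standard binding free energy (kcal/mol) from Ki: dG = RT ln(Ki)."""
    return R_KCAL * temp_k * math.log(ki_molar)


# Ki values as quoted in Fig. 15.21 for compounds (44) and (45).
for label, ki_nm in (("44", 1.6), ("45", 2.5)):
    dg = binding_free_energy(ki_nm * 1e-9)
    print(f"compound ({label}): Ki = {ki_nm} nM, dG = {dg:.1f} kcal/mol")
```

On these assumptions the two inhibitors differ by only about 0.3 kcal/mol, which illustrates how a smaller structure such as (45) can retain nearly all of the binding energy of the parent lead.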
6.2 TSA in Metallo Peptidase Inhibitors

The discovery of the angiotensin converting enzyme inhibitors in the middle 1970s constitutes one of the major advances in the rational design of drugs, the consequences of which are still being realized. The discovery of these metallo peptidase inhibitors was carried out by Ondetti et al. as part of a long-term study to develop antihypertensive drugs (80); in 1999 they received the Lasker Prize in Clinical Medicine for their work.

Angiotensin converting enzyme (ACE) is a carboxy zinc metallo dipeptidase that cleaves His-Leu from the C-terminus of angiotensin-I. Ondetti et al. reasoned that the product of normal reaction, the carboxyl group, could bind to the active site zinc ion, and that the carboxyl group of a collected-product inhibitor also could bind weakly. To improve the interaction between inhibitor and enzyme zinc ion, they replaced the carboxyl group with a sulfhydryl group, which binds zinc about 1000 times more tightly. This led to captopril (Capoten) (38) (Fig. 15.22) (80). Later developments by other companies led to many ACE inhibitors. Some are illustrated by enalaprilat (46) and lisinopril (47) (Fig. 15.22) (101, 102).

Figure 15.22. Examples of TSA as ACE inhibitors: captopril (38); enalapril, R = Et; enalaprilat, R = H (46); lisinopril (47).

Most metallopeptidase inhibitors append a zinc chelating functionality to a peptide or peptidomimetic that is recognized by the S1′-S3′ subsites in the target enzyme. Successful clinical candidates invariably contain groups that replace the initial di- and tri-peptide moieties to achieve selectivity and oral activity. For example, neutral endopeptidase (NEP), another endopeptidase involved in degrading the larger opioid peptides dynorphin and/or endorphin, is inhibited by thiorphan (48) (103) and a variety of NEP inhibitors: retrothiorphan (49) (104) and kelatorphan (50) (Fig. 15.23) (105). The hydroxamic acid moiety is used in many inhibitors of metallopeptidases.

Figure 15.23. Examples of TSA as NEP inhibitors: thiorphan (48), retrothiorphan (49), and kelatorphan (50).

Inhibition of NEP also prevents the degradation of atrial natriuretic factor (ANF), a natural hypotensive peptide. Dual inhibitors of NEP and ACE have been designed successfully because both enzymes share significant structural homology, particularly in their active sites. Simultaneous inhibition of both peptidases produces a more powerful hypotensive response (106, 107). Several dual inhibitors are in phase III clinical trials for treating hypertension (Fig. 15.24). Omapatrilat (51, BMS-189921) is the furthest along as of late 2001 (105).

Figure 15.24. Examples of TSA as dual ACE/NEP inhibitors: omapatrilat (51), ACE IC50 = 6 nM, NEP IC50 = 9 nM; sampatrilat, ACE IC50 = 7 nM, NEP IC50 = 20 nM; a third, unlabeled structure, ACE = 25 nM, NEP IC50 = 3 nM.

Matrix metalloproteases (MMP) are also inhibited by hydroxamic acids and/or thiols. Over 25 variants of these enzymes are known, and some are involved in diseases ranging from inflammation to metastatic cancer (108). MMPs contain a zinc ion in the active site and function through the metallopeptidase catalytic mechanism already discussed. However, subtle differences between enzymes enable selective inhibitors to be developed (109). Fig. 15.25 lists some of the reported MMP inhibitors that use carboxylic acid (52-53), hydroxamic acid (54-55), or thiol groups (56) as metal chelators.

Figure 15.25. Traditional TSA used to inhibit metallopeptidases.

Other reported zinc binding chelators used in matrix metalloproteinase inhibitors are summarized in Fig. 15.26. For instance, one of the oxygens in the phosphonamide (57) binds strongly to the zinc ion, whereas the other one coordinates weakly with the metal (110). More recently, "suicide substrate" MMP inhibitors have appeared (58) (Fig. 15.26) (111). The selectivity of this type of compound arises from the specific coordination of the thiirane with the active-site zinc ion, which facilitates thiirane ring opening by nucleophilic attack by neighboring Glu-404. This novel mode of binding was assessed by X-ray absorption studies because of the difficulty of obtaining a suitable crystal structure (111, 112).

Figure 15.26. Novel TSA used to inhibit metallopeptidases.

ADAMs are membrane proteins that contain a disintegrin and a metalloprotease domain. Disintegrins are RGD-containing proteins that inhibit cell-matrix interactions (adhesion) and cell-cell interactions (aggregation) through the integrin receptors. In addition, ADAMs have two other domains that are involved in signaling and transport (113). More than 25 ADAM proteases have been identified so far. ADAM 17 has been shown to be TNF-α converting enzyme (TACE) (114). Inhibition of TACE slows the production of TNF-α, a potent cytokine involved in inflammatory responses to infection. Normally TNF-α produces a useful response, but in some cases, too much TNF-α is released and inhibition of TNF-α production would be therapeutically useful. Synthetic analogs that inhibit this enzyme have been prepared. Clinical candidates like Ro 32-7315 (59) (Fig. 15.27) are starting to emerge, and more are expected in the near future (115, 116).

Figure 15.27. Example of TSA as a TNF-α inhibitor.

Aminopeptidases, enzymes that cleave off the N-terminal amino acid from a peptide chain, are bismetallo peptidases, a class of metallopeptidase that contain two metal ions in the catalytic site (117, 118). These can be inhibited by compounds related to bestatin (60) (Fig. 15.28), which contains the N-terminal α-hydroxy-β-amino acid residue, sometimes referred to as norstatine. In leucine aminopeptidase, chelation occurs both between the amide carbonyl group and the adjacent hydroxyl and between the hydroxyl and the N-terminal amino group (119, 120).

6.3 TSA-Derived Cysteine and Serine Peptidase Inhibitors

Classical TSA inhibitors of cysteine and serine proteases differ from the metallo- and aspartic protease inhibitors in that they mimic the tetrahedral intermediates for enzyme-catalyzed amide bond hydrolysis only after a reversible chemical reaction between enzyme and inhibitor takes place. Usually this involves the addition of the enzyme catalytic nucleophile (the serine protease hydroxyl group or the cysteine protease thiol group) to an electrophilic group in the inhibitor to generate ketal-like species (121).

Some of the serine and cysteine TSA moieties are shown in Fig. 15.29. Selective inhibition between these two classes of protease can be achieved easily. For example, trifluoromethylketones (61) and peptidyl boronic acids (62) do not efficiently inhibit cysteine proteases. However, selective inhibition of enzymes within each class can be very difficult.
thepsin B as potential anti-metastatic drugs

(124). Cathepsin K was recently discovered
and shown to be involved in osteoporosis and
bone regulation (125).
Inhibitors of cathepsin K illustrate the principles developed to inhibit this class of enzyme. This enzyme sequence was detected in 1994 by sequencing of human DNA for the human genome project (126). Cathepsin K was found to be inhibited by leupeptin (63) and by compound (64), which surprisingly binds "backwards" to the active site (Fig. 15.30). A hypothesis to develop symmetrical inhibitors of cathepsin K derived from the superposition of both aldehydes on the carbonyl carbon; this led to the diamino ketone TSA (65). The diamino ketone moiety seems to work in several classes of cysteine proteases (127).

Based on these results, Marquis et al. have recently described the design and synthesis of conformationally constrained cyclic ketones as highly potent and selective cathepsin K inhibitors (66-67) (Fig. 15.31) (128). The labile stereogenic group in position α of the ketone was shown to be important for the binding mode and pharmacokinetic profile of this type of inhibitor. The crystal structures of the two epimers showed two alternate directions of binding to the enzyme active site. In both structures, the primed region of the enzyme was occupied by these inhibitors. Further investigation resulted in the azepanone derivative (68) as a configurationally stable template for the selective inhibition of this cysteine protease (Ki = 4.8 pM) (129).

Bestatin (60)
Figure 15.28. Proposed binding mode of bestatin.

Many cysteine peptidases are involved in the biosynthesis and degradation of biologically important peptides. Most early work was done with papain, a cysteine peptidase isolated from the papaya fruit and used in meat tenderizer many years ago. The readily available source of this enzyme led to one of the very first X-ray crystal structures of any peptidase (122, 123), despite the fact that no cysteine peptidase was then known to be important in human pathology. Since then, cathepsins B, H, L, and S were discovered to be involved in biosynthetic steps in human immune response, inflammation, and other biologies. For example, cathepsin B is clearly involved in the metastatic process and must act at some stage to permit transformed tumor cells to migrate to other parts of the body; for 20 years, people have sought inhibitors of ca-
Figure 15.29. TSA used to inhibit serine or cysteine peptidases. Structures shown: trifluoromethylketone, boronic acid, diaminoketone, phosphonic acid, and α-ketoamide, including compounds (61) and (62).

Figure 15.30. Structure-based design of cathepsin K inhibitors. Structure shown: leupeptin.
Caspases are involved in a variety of cell functions, especially in programmed cell death (apoptosis). These enzymes recognize tetrapeptide sequences ending in an aspartic acid recognition point: X-Y-Z-Asp-NHR. Much effort has been expended in trying to obtain selective inhibitors of the 14 different types identified to date. In this context, selective inhibitors of caspase 1 or of caspase 3/7 have recently been reported (130).

Peptidomimetic modifications of the tetrapeptide sequence have led to the conformationally constrained compound (69) as a selective inhibitor of caspase-1, or interleukin-1β converting enzyme (ICE), and thus a potential anti-inflammatory compound (131). Recently, new non-peptide peptidomimetic diphenyl ether sulfonamides have been described as novel lead structures (70) (Fig. 15.32) (132).

Finally, researchers from SmithKline Glaxo have identified potent and selective inhibitors of caspases 3 and 7 that lack the required carboxyl group in P1 (71) (Fig. 15.32). The X-ray co-crystal structure reveals the formation of the typical tetrahedral intermediate of the isatin type structures, which may compromise its selective inhibition of proteases (133, 134).

These reversible caspase inhibitors differ from inhibitors that form irreversible covalent bonds, the so-called "dead-end" or "suicide" inhibitors of these enzymes. For example, the α-acetoxy ketone (72) in Fig. 15.32 is an alkylating irreversible inhibitor; the enzyme cysteinyl group displaces the α-acetoxy group to form an irreversible covalent bond (135).

7 SPEEDING UP PEPTIDOMIMETIC RESEARCH

As mentioned before, combinatorial chemistry, high-throughput screening, and analogous techniques have become powerful tools to promote drug discovery in peptidomimetic research. It is not the intention of this chapter to summarize all these methods, and excellent
Figure 15.31. Cyclic ketones in novel cathepsin K inhibitors.
Figure 15.32. Examples of TSA as caspase inhibitors.

Figure 15.33. Somatostatin receptor agonists found through combinatorial chemistry.
reviews are available in the literature (136-140). However, one successful approach developed at Merck for the rapid identification of selective agonists of the somatostatin receptor through combinatorial chemistry should be highlighted, because it illustrates the evolution of a constrained peptide into a non-peptide peptidomimetic structure (141).

A series of combinatorial libraries were constructed on the basis of molecular modeling of known peptide agonists like MK-678 and octreotide. A chemical collection of 200,000 compounds was screened, giving priority to the residues Tyr-Trp-Lys, important pharmacophores in somatostatin determined first by Veber et al. (31). This approach yielded five compounds (73-77) (Fig. 15.33), each being selective for one of the somatostatin receptor subtypes: sst1 (73), sst2 (74), sst3 (75), sst4 (76), and sst5 (77).

8 TOWARD RATIONAL DRUG DESIGN: DISCOVERY OF NOVEL NON-PEPTIDE PEPTIDOMIMETICS

Current pharmaceutical research has taken advantage of newer computational methods, the so-called computer-aided drug design, and other physicochemical techniques such as X-ray crystallography and NMR (142). The main goal in rational drug design is to translate the structural information in the native peptide into low molecular weight non-peptidic molecules. Over the past years, many 3D structures of biological targets have been solved and have been successfully used to design new, pharmacologically useful compounds (vide infra). Different computer-aided design methods, e.g., 3D pharmacophore model, 3D quantitative SAR (QSAR), docking, and de novo design, have been extensively reviewed elsewhere (75, 143-146).

Recently, the importance of generating inhibitors that target receptor conformational ensembles has been pointed out (10). This method goes beyond the current docking of known structures to known active site conformers and can lead to type III and GRAB peptidomimetics.

The concept of Group Replacement Assisted Binding (GRAB) peptidomimetics derives from the discovery at Roche of the piperidine class of renin inhibitors. The non-peptide inhibitors of renin (78) (Ki = 26) and (79) (Ki = 31 nM) (Fig. 15.34) (84, 147-149) stabilize an enzyme active site conformation different from the β-strand binding enzyme conformation typical for other peptidase inhibitors. A close analysis of the X-ray crystal structure of the enzyme inhibitor complex shows that the piperidine C4-phenyl group binds to the enzyme to replace Tyr75, which has rotated to another position. Interestingly, Leu73 also rotates to fill some of the vacated Tyr75 pocket, and this in turn allows Trp39 to occupy a new site formed in part by the vacated Leu73 (Fig. 15.35). This cascade of conformational transitions in the renin active site allows the optimized inhibitor to stabilize an enzyme conformation not observed when the classic peptide-derived peptidomimetics bind. This stabilization process is defined as a group replacement process, and the piperidine inhibitors constitute a new type of peptidomimetic: GRAB peptidomimetics.

Figure 15.34. Examples of GRAB peptidomimetics.

Figure 15.35. GRAB peptidomimetics in action. See color insert.

Comparison of (78) and (79) with the structures of other peptide-derived inhibitors revealed how the different enzyme active site conformations were found. Bursavich et al. have successfully extended the initial renin modeling to the design of inhibitors of two other aspartic peptidases: pepsin and R. chinensis pepsin (80) (Ki = 2 µM) and (81) (Ki = 1 µM) (Fig. 15.34) (150).

The extended β-strand binding conformation could be changed into the piperidine binding conformation by a series of low-energy, mechanistically related conformational changes in active site side-chains. The discovery of the Roche inhibitors and the correlation of these structures with peptide-derived inhibitors are analogous to a peptidomimetic "Rosetta Stone." This design strategy has the potential for designing novel types of peptidomimetic structures.

9 HISTORICAL DEVELOPMENT OF IMPORTANT NON-PEPTIDE PEPTIDOMIMETICS

9.1 HIV Protease

Type-I HIV-1 protease inhibitors, Saquinavir, Ritonavir, Indinavir, Amprenavir, Viracept (nelfinavir mesilate), and Lopinavir (Fig. 15.20) are established drugs for the treatment of AIDS. All these inhibitors employ the central hydroxyl transition state mimetic as a scaffold on which varying functionality was systematically added until the required balance between potency, in vivo activity, and oral absorption was achieved. In general, the binding interactions were optimized through iterative synthesis and co-crystallization of inhibitor with enzyme, molecular modeling, and re-designing the inhibitor side-chains. Pharmacokinetic properties were addressed only after the initial inhibitor was identified and optimized. Compounds (82-83) (Fig. 15.36) are highly modified peptidic structures that stabilize the enzyme-bound extended β-conformation (151, 152).

Another approach to achieve greater in vivo activity is to start with a molecular template with proven useful pharmacokinetics and oral bioavailability and to selectively modify it to achieve the desired activity. Identification of the orally active anticoagulant warfarin (84) (Fig. 15.37) as a weak inhibitor (IC50 = 18 µM) of HIV protease was followed by two reports of 4-hydroxycoumarins as possible type III HIV inhibitors. Subsequent SAR studies led to the more potent 5,6-dihydro-4-hydroxy-3-pyrone inhibitor (85) (IC50 = 2.7 nM), which has good anti-viral activity (EC50 = 0.5 µM) and is orally bioavailable (153). Upjohn researchers also used a structure-based design approach based on warfarin to obtain (86), their clinical candidate PNU-140690 (154). It should be noted that both inhibitors bind to the extended β-strand binding active site conformation.

Workers at DuPont used a pharmacophore model and database search to develop the first type III mimetic inhibitor of HIV protease, DuP 450 (87) (Fig. 15.38). This evolved from a 3D pharmacophore that retained two key interactions: replacement of the flap-bound water and a hydroxyl transition-state isostere (155). Molecular modeling led to a cyclohexanone as a better spacer between these groups, and finally the seven-membered cyclic urea (87) was created (Fig. 15.38). The development of these inhibitors illustrates the importance of conformational analysis in the design of constrained analogs.

Surprisingly, the symmetric cyclic sulfonyl-urea derivative analog (88) (Fig. 15.38, Ki = 3 nM) binds differently in the active site and adopts a flipped conformation (156).
Figure 15.36. β-Strand HIV protease inhibitors.
Moreover, SAR of the cyclic urea and cyclic sulfamide inhibitors does not follow a straightforward pattern. These contradictory results clearly illustrate the structural diversity created by a subtle structural modification in two otherwise related peptidomimetic protease inhibitors.

The peptidase inhibitors (82) and (83) are actually amino acid and transition-state mimics pieced together to emulate the typical ligand-bound extended β-strand inhibitor conformation. The structurally distinct heterocyclic aspartic protease inhibitors (85-86) and (87-88) are non-peptide peptidomimetics because of their remote structural relationship to native peptide substrates. Yet these two distinct peptidomimetic classes bind to the same active site topography. These structurally distinct peptidomimetics selectively stabilize closely related enzyme conformations.

9.2 Thrombin

Thrombin and Factor Xa are both serine proteases involved in the blood coagulation cascade. Inhibition of these two enzymes is providing novel anticoagulants (157-159).

The development of thrombin inhibitors that lack the functionalized TSA highlights a major new approach to type I peptidomimetics. In 1995, a Lilly group found that D-Phe-Pro-Agmatine analogs showed increased selectivity for thrombin over other fibrinolytic enzymes despite a 100-fold loss in potency caused by the removal of the aldehyde group (160). These studies paved the way for Merck's development of picomolar thrombin inhibitors (161, 162), which use a similar motif. Removal of an α-ketoamide transition state mimic from L-370,518 (89) (Fig. 15.39, Ki = 0.09 nM) led to an expected 100-fold drop in potency for (90) (Ki = 5 nM). However, systematic modification of the P3 position restored potency and led to an inhibitor (91) with a Ki = 2.5 pM. Interestingly, potency seems to be enhanced by a fortuitous hydrophobic collapse into a favorable binding conformation.

Thrombin inhibitors (92) and (93) illustrate a novel type III peptidomimetic. Most protease inhibitors bind in an extended β-strand conformation that is stabilized by multiple enzyme ligand hydrogen bonds.
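The potency changes quoted above for the Merck thrombin series (89)-(91) can be put on a common scale by converting each pair of Ki values into a fold change and an approximate change in binding free energy, RT ln(Ki,b/Ki,a). The short Python sketch below is only an illustration: the Ki values are those given in the text, while the temperature of 298.15 K and the helper names `fold_change` and `ddg_kcal` are assumptions introduced here, not part of the original studies.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)
T = 298.15         # assumed temperature in kelvin (not stated in the text)


def fold_change(ki_from_nm, ki_to_nm):
    """Ratio ki_to / ki_from; values above 1 mean a loss of potency."""
    return ki_to_nm / ki_from_nm


def ddg_kcal(ki_from_nm, ki_to_nm, temp=T):
    """Approximate change in binding free energy, RT * ln(ki_to / ki_from)."""
    return R_KCAL * temp * math.log(ki_to_nm / ki_from_nm)


# Ki values quoted in the text, all converted to nM (2.5 pM = 0.0025 nM)
ki_nm = {"89": 0.09, "90": 5.0, "91": 0.0025}

for a, b in [("89", "90"), ("90", "91")]:
    ratio = fold_change(ki_nm[a], ki_nm[b])
    ddg = ddg_kcal(ki_nm[a], ki_nm[b])
    print(f"({a}) -> ({b}): ratio = {ratio:.3g}, ddG = {ddg:+.2f} kcal/mol")
```

With the quoted values, the ratio for (89) to (90) comes out at roughly 56, the same order of magnitude as the "100-fold" drop described, while the step from (90) to (91) corresponds to a gain of about three orders of magnitude in affinity.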
Figure 15.37. Warfarin analogs as non-peptide HIV protease inhibitors. Structures shown: warfarin (84); (85), Parke-Davis X-ray structure; (86), Pharmacia-Upjohn X-ray structure.
However, Boehringer Mannheim developed thrombin inhibitors (92) (Fig. 15.40) that lack these H-bonds (163). This idea was exploited by researchers at 3D Pharmaceuticals, who were able to crystallize (93) in the active site (164, 165). In this example, the benzene ring acts as a scaffold to display the three different substituents to fill the three principal binding pockets.

Figure 15.38. Cyclic ureas as non-peptide HIV protease inhibitors.

Other type III peptidomimetic inhibitors of thrombin have been developed from screening leads (166, 167) such as inhibitor (94) (Fig. 15.40). SAR led to the design of (95). Inhibitor (96) was derived from docking studies with the 5-amidino indole nucleus, followed by addition of a lipophilic side-chain to interact with the important S2 subsite of thrombin. The crystal structures of both (95) and (96) in the active site of thrombin show that the aromatic core binds in the S1 site as expected, but
does not pick up hydrogen bonding from the important active site sequence Ser214-Gly216 (168). Both crystal structures showed a similar binding mode, in which interaction of the C-2 side-chain with Trp60D might explain the high thrombin selectivity observed for this series (169).

Figure 15.39. Non-TSA thrombin inhibitors.

Another type III peptidomimetic inhibitor was derived from the crystal structure of a bicyclic [3.1.3] inhibitor (170) complexed to thrombin (97) (Fig. 15.41). The X-ray structure revealed that one of the carbonyls was oriented towards the hydrophobic P-pocket (S2). The desolvation necessary to place a carbonyl in a hydrophobic pocket is unfavorable, and various alkyl groups were used as possible replacements. This led to the potent (Ki = 13 nM) and selective (>760-fold for thrombin over trypsin) inhibitor (98).

One of the major drawbacks in thrombin inhibitor design was the requirement for a basic side-chain in P1 needed to form a salt bridge to enzyme Asp189. However, the other amino acid side-chains in S1 are largely lipophilic and neutral. This feature suggested that the strongly basic group in P1 could be replaced by a weaker base or even with hydrogen-bonding groups. Compounds (99-100) are representative of this strategy (Fig. 15.40) (171). An X-ray crystal structure of (99) shows a new binding mode in which the formamide group points out of the S1 pocket and forms new hydrogen bonds with Gly219 (172). The ability to obtain crystal structures of thrombin inhibitor complexes for many of the inhibitors shown in Figs. 15.40-15.41 establishes that most are type III peptidomimetics.

9.3 Factor Xa

New approaches to design inhibitors of Factor Xa as potential anticoagulants have been reviewed (173), and important type III mimetics have been described (Fig. 15.42). All inhibitors contain amidine or basic groups that bind in the enzyme's S1 site; none of the inhibitors contains a classical electrophilic center of the type employed in TSA inhibitors (174-180).

Compound (101) (Fig. 15.42) was designed from a strategy involving connection of a three-point pharmacophore derived from molecular modeling. Beginning with the X-ray structure of the Factor Xa dimer, Gong et al. (176) envisioned three important enzymatic contact points: a phenylamidine in the S1 subsite, a phenylamidine in the S4 site, and a carboxylate moiety postulated by a group at Daiichi to confer selectivity over thrombin through an interaction with Gln192 of Factor Xa. Systematic iterative modifications led to the potent inhibitor (101) (Ki = 9 nM), which has 350-fold selectivity over thrombin. This approach highlights a truly de novo method where fragments were docked into the active site and an appropriate spacer was chosen to connect them. Further SAR data led to modifications that improved both potency and selectivity (176).

9.4 Glycoprotein IIb/IIIa (GP IIb/IIIa)

Some outstanding examples of the use of conformational restriction to characterize the
Figure 15.40. Non-H-bonding-based thrombin inhibitors.
bioactive conformations of Arg-Gly-Asp peptidomimetic antagonists illustrate the present state-of-the-art. Members of the integrin family of receptors recognize and bind the peptide sequence Arg-Gly-Asp as an important step in platelet aggregation and other physiological processes (181), and competitive antagonists for this process could serve as potential drug candidates. Much effort has been directed toward identifying small ligands that might mimic the RGD peptide sequence (182). This drug design concept was supported by the fact that protein antagonists of integrin receptors are known that contain the RGD sequence (183) and that small peptide sequences containing the RGD moiety weakly antagonize the endogenous ligand (184). Consequently, several groups synthesized conformationally restricted derivatives of small peptides as starting points for developing metabolically stable peptides or peptidomimetics. Ali et al. (185) synthesized a series of disulfide derivatives of the RGD sequence, which were designed by analogy with the somatostatin work (vide supra). Excellent antagonists related to (102) were obtained. Further constraint of the peptide system by use of the o-thiol benzene derivatives led to the novel antagonist SKF 107260 (103) (Fig. 15.43), a good inhibitor of both platelet aggregation and binding to GPIIb/IIIa. Barker et al. (186) followed a similar strategy but used cyclic sulfides as an additional conformationally restricting element. These derivatives had the advantage of being rapidly synthesized by solid phase methods. Systematic structure-activity studies with respect to the amino acid preceding the RGD sequence and the chirality of sulfoxide derivatives led to the discovery of G-4120 (104), a potent, biologically active derivative.

The conformation of both (103) and (104) in water was found to be highly constrained, and a single predominant conformation could be characterized in aqueous solution by use of
Figure 15.41. Optimized P1 and P2 thrombin inhibitors.
NMR methods and computational chemistry (185, 187). This bioactive conformation defined the topographical placement of the arginine guanidine group and the aspartic carboxyl group, and was superimposed onto a conformationally restricted template of a class of compounds with generally suitable pharmacodynamic properties. In this case, the benzodiazepine ring system was used, and the strategy generated the low molecular weight non-peptide RGD receptor antagonists (105-107) (Fig. 15.44), which contain at least two conformational restrictions, the bicyclic heterocycle and the acetylene linker. The compounds shown in Fig. 15.44 represent what can be achieved by applying the principles of conformational restriction to peptides when no X-ray or NMR structural information is available for the complex between ligand and receptor. Benzodiazepines (105-107) represent the first type III peptidomimetics designed de novo by systematically modifying a natural receptor-binding peptide (187-189).

A variety of other scaffolds have been developed by exploiting the idea that glycine represents a spacer between the two important recognition residues Arg and Asp. This template-based approach positions the key side functionality, a basic function and an acidic one, within a distance of 11-17 Å, as required for presentation to the receptor. Several examples of these scaffolds are shown in Fig. 15.45 (190-195).

Recent results suggest that the RGD tripeptide can adopt multiple conformations that allow tight binding to the receptor. This theory is supported by the fact that nonpeptide RGD peptidomimetics can adopt a range of different topographies such as found in cupped, turn-extended-turn, or β-turn conformations (196).

RGD type I peptidomimetics are usually poorly bioavailable compounds because of the presence of multiple hydrogen-bonding sites
Figure 15.42. Examples of FXa protease inhibitors.
plus the charged polar functional groups at both ends. Esters or coumarin (197) linkers have been used to provide orally available prodrugs, and bioisosteric replacements of the guanidinium group by pyridine (198), tetrahydronaphthyridine (199), or aminobenzimidazole (200) moieties provided more bioavailable analogs.

Figure 15.43. Conformationally restricted RGD cyclic peptides.

9.5 Ras-Farnesyltransferase

Inhibitors of Ras-farnesyltransferase have been developed by mimicking the C-terminal CAAX motif (where C is a cysteine residue, A is any aliphatic amino acid, and X is usually Met, Ser, or Ala). This tetrapeptide is the signal for farnesylation of Ras proteins. Ras-farnesyltransferase is one of the most promising targets for novel anti-cancer drugs, because at least 30% of human cancers contain mutated Ras (201, 202).

Two types of peptidomimetic structures have been used to develop inhibitors (203). Some typical type I inhibitors were generated by replacing the amide backbone with different isosteres like the oxymethylene amide bond in (108) (Fig. 15.46, IC50 = 60 nM) (204). The central dipeptide segment of CA1A2X has been replaced with rigid linkers like the 3-aminomethylbenzoic acid (AMBA) in (109) (205). This novel inhibitor was not farnesylated, showing that the two amino acids in the middle of the CAAX tetrapeptide are required for farnesylation. An imidazole group has been used to replace the thiol group of the CAAX motif to produce compound (110) (206).
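The CAAX definition used in this section (Cys, two aliphatic residues, then usually Met, Ser, or Ala at the C-terminus) can be written down as a simple sequence pattern. The Python sketch below is a minimal illustration under stated assumptions: "aliphatic" is taken to mean Ala, Val, Leu, or Ile, only the three X residues named in the text are accepted, and the function name `has_caax_motif` and the test sequences are hypothetical.

```python
import re

# Assumption: "aliphatic" is read here as Ala, Val, Leu, or Ile.
ALIPHATIC = "AVLI"
# X is "usually Met, Ser, or Ala" in the text; other residues are not modeled.
TERMINAL = "MSA"

# Cys, two aliphatic residues, one terminal residue, anchored at the C-terminus
CAAX_PATTERN = re.compile(rf"C[{ALIPHATIC}]{{2}}[{TERMINAL}]$")


def has_caax_motif(sequence: str) -> bool:
    """Return True if a one-letter sequence ends in a CAAX-like tetrapeptide."""
    return CAAX_PATTERN.search(sequence.strip().upper()) is not None


# Hypothetical C-terminal fragments used only to exercise the pattern
for seq in ["KKSKTKCVIM", "GCMSCKCVLS", "CVIMK", "TKCKVM"]:
    print(seq, has_caax_motif(seq))
```

A pattern of this kind only screens for the sequence signal; it says nothing about whether a given peptide or peptidomimetic is actually farnesylated, which, as noted above for (109), depends on the two central residues.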
An outstanding example in peptidomimetic design evolved from these studies. Truncation and conformational restriction of a reduced isostere of the parent peptide substrate, followed by systematic replacement of the peptide-like side-chains, provided the potent non-peptidic inhibitor (111) (Fig. 15.46) (207). This approach highlights the transition from a peptide-derived structure to a compound with no apparent resemblance to the original peptide.

Figure 15.44. Benzodiazepines RGD analogs.

Recently, crystal structures of farnesyltransferase complexed with a farnesyl group donor and the native substrate or a type I peptidomimetic show the structural basis for inhibition of this enzyme. The X-ray data show that the CAAX motif adopts an extended conformation rather than a β-turn, which is the conformation observed by transferred nuclear
Figure 15.45. Different scaffolds used in RGD mimetics.
Overhauser effect experiments; coordination of the cysteine side-chain to the Zn ion promotes the conformational change in the peptide backbone. Moreover, differences in the conformation binding mode of peptides and peptidomimetics are one of the bases for selective farnesylation (208).

Other type III peptidomimetic inhibitors of this enzyme have also been reported. Inhibitor (112) (Fig. 15.47) was developed by replacing the A1A2 dipeptidyl sequence with a benzodiazepine scaffold (209). Later, SAR modifications of the benzodiazepine nucleus that included a hydrophilic 7-cyano group and a 4-sulfonyl group provided the potent, orally available, and in vivo active (113) (210).

HTS also produced several non-peptide leads typified by inhibitor SCH 47307 (114) (Fig. 15.47) (211). Subsequent SAR work led to the potent inhibitor SCH 66701 (115) (Ki = 1.7 nM), which was crystallized within the enzyme active site (212). This series of compounds is completely non-peptidic and also lacks the free sulfhydryl or imidazole seen in the other inhibitors discussed here. This is a breakthrough that shows that potency can be achieved even without the "essential" cysteine or sulfhydryl mimic.

Figure 15.46. Peptide-like Ras-farnesyltransferase inhibitors.

9.6 Non-Peptidic Ligands for Peptide Receptors

This section illustrates the successful development of non-peptide peptidomimetics from a screening lead by assuming the inhibitor binds to the receptor in the same way as does the native peptide hormone. These assumptions actually led to effective inhibitors of the receptor. Later, site-directed mutagenesis of target receptors suggested that for many of these compounds, the mimetic was binding to the receptor at ancillary, perhaps overlapping, sites on the receptor. Later still, pharmacological studies indicated that peptide receptors adopted multiple states, suggesting that different antagonists might bind to different receptor forms. Of course, if compounds do not bind to the same receptor site as the endogenous hormone, SAR data collected on the natural peptide substrate are not applicable to these antagonists. Most of these peptidomimetics are probably type II or functional mimetics. Yet the success of this approach suggests that at least for some non-peptide antagonists, there may be some congruent structure that interacts with the receptor. These issues will only be determined unambiguously when high-resolution structures of the G-protein-coupled receptors (213) and other constitutive receptor systems are determined.

9.6.1 Angiotensin II. The first non-peptide antagonists of the AT1 receptor were found by HTS. The imidazole (116) (Fig. 15.48) is a weak (IC50 = 43 µM) but quite selective A-II receptor antagonist (214). Using this as a lead compound, DuPont and SmithKline Glaxo researchers independently developed potent small molecule A-II receptor antagonists. The DuPont group used the conformation suggested by Smeby and Fermandjian to guide the design (215). It was speculated that the carboxyl group and the imidazole group of (116) were bound to the A-II terminal carboxyl group and to the imidazole group, respectively. This rationalization culminated in the synthesis of nanomolar inhibitors, with compound (117) as a clear representative (216).

Although workers at SmithKline Glaxo used the same conformation as starting point, they postulated other binding modes to the receptor. One of their alternative hypotheses considered compound (116) as a constrained analog in which the benzyl and the carboxyl groups corresponded to the Tyr side-chain and the C-terminal carboxyl group of A-II. Following this hypothesis, modification of lead compound (116) eventually led to compound (118) (Fig. 15.48) with an IC50 = 1.45 nM and oral activity of 30% (217).

Figure 15.47. Non-peptidic Ras-farnesyltransferase inhibitors.

Site-directed mutagenesis studies on the AT1 receptor revealed differences in the binding site of angiotensin and the small molecule non-peptide compounds (119-120) (Fig. 15.49). There is no evidence that the single residues involved in inhibitor binding overlap with endogenous peptide binding.

Some other non-peptide agonists have also appeared in the literature. Surprisingly, their binding mode differs from the binding mode of the peptide agonist (121), as well as that of the structurally similar non-peptide antagonist (122) (Fig. 15.49) (218). However, angiotensin and L-162,313 (122) require common critical residues for angiotensin AT1 receptor activation (219).

9.6.2 Substance P. The tachykinin receptors (NK-1, NK-2, and NK-3) and their endogenous ligands, the tachykinins and neurokinins, are important neurotransmitters (220-222). Antagonists of tachykinin receptors produce beneficial effects in several CNS disease states such as pain, asthma, emesis, and depression.

A general approach for converting a variety of peptide structures into small, type II peptidomimetic antagonists was devised by Horwell and colleagues and is illustrated here for antagonists to Substance P. An alanine scan of the parent undecapeptide revealed that the Phe7-Phe8 sequence was required for binding. Replacement of one of these residues by Trp, followed by introduction of conformational constraints by α-alkylation, provided the subnanomolar inhibitor (123) (Fig. 15.50) (223). Improved brain penetration was achieved by amine (124) (224).

Chemical screening of corporate compound libraries resulted in the discovery of another
Asp-Arg-Val-Tyr-Ile-His-Pro-Phe-OH
Angiotensin II
Figure 15.48. Angiotensin II inhibitors derived from a HTS lead.
type of non-peptidic NK-1 antagonist, CP-96,345 (125) (Ki = 0.66 nM, Fig. 15.51). This compound heralded a breakthrough in the design of these potential drugs (225, 226). Replacement of the basic quinuclidine ring with a morpholine core improves duration of action, and insertion of an amino triazole unit confers excellent solubility and CNS penetration (126) (Ki = 0.19 nM, Fig. 15.51) (227).

Dual NK-1/NK-2 inhibitors, e.g., (127) (Fig. 15.52), have recently been designed by determining the important sites for maintaining NK-2 selectivity of the lead compound SR-48968 and introducing NK-1 pharmacophore groups (228-230).

Fewer NK-3 selective receptor antagonists have been described, but a quinoline scaffold previously reported to be a selective NK-3 receptor antagonist has been converted to a dual NK-2/NK-3 inhibitor (128) (Ki = 0.8 nM NK-2 and 0.8 nM NK-3). The lead optimization was carried out by docking potential structures into a novel receptor model. The theoretical model compares closely with the recently published crystal structure of rhodopsin (231).

9.6.3 Neuropeptide Y. Neuropeptide Y (NPY) is a 36 amino acid polypeptide that is involved in hormonal, sexual, and cardiac effects (232, 233). In 1994, two "first generation" type I NPY Y1-selective antagonists, BIBP 3226 (129) (234) and SR120107A (130) (235), were reported (Fig. 15.53). BIBP 3226 (129) corresponds to a truncated and modified peptide in which the D-Arg is assumed to correspond with in Neuropeptide Y.

More recently, a series of indole Y1 antagonists discovered by screening (236) led to the benzimidazole (131) (Ki = 0.052 nM). In this type of compound, the diamino moieties are postulated to mimic the two C-terminal arginines of NPY (237).

Selective NPY-Y5 inhibitors have been shown to inhibit food intake activity in vivo. Most inhibitors found by HTS and lead optimization gave nanomolar and selective antagonists. It is not known whether these are functional or topological mimetics (Fig. 15.54) (238-240).

9.6.4 Growth Hormone Secretagogues. Growth hormone (GH) releasing peptide mimetics have become attractive alternatives to
Figure 15.49. Examples of angiotensin II inhibitors.
GH replacement therapy (241). The peptidyl GH secretagogue GHRP-6 (242) was used to develop the clinical candidate MK-0677 (132) (Fig. 15.55, EC50 = 1.3 nM) (243, 244). After identifying the important residues for bioactivity in GHRP-6, the Merck group began searching other receptor libraries for known "privileged structures" in a combinatorial synthetic fashion (see Section 4) (66). The more active derivative contained a spiropiperidine moiety attached to an indoline ring.

More recently, ghrelin has been isolated and identified as an endogenous ligand of the GHS receptor, and some new peptidomimetic structures [e.g., 133 (Fig. 15.55)] have started to appear (245).

In another approach, SAR studies and systematic simplification of GHRP-6 at Novo Nordisk produced the orally bioavailable derivative NN-703 (134). Molecular modeling overlapping of NN-703 (134) (Fig. 15.56) and MK-0677 (132) (Fig. 15.55) showed structural similarities between both compounds. Highly potent hybrids of Ipamorelin and NN-703 (e.g., 135) (Fig. 15.56) have also been described (246, 247).

Arg-Pro-Lys-Pro-Gln-Gln-Phe-Phe-Gly-Leu-Met-NH2
Substance P
Figure 15.50. Development of a substance P inhibitor.

Figure 15.52. Dual NK-1/NK-2 and NK-2/NK-3 inhibitors.

Figure 15.51. Examples of NK-1 antagonists.
A common 3D pharmacophore was recently described for peptidic and non-peptidic GH secretagogues by means of computational chemistry. After QSAR analysis, four pharmacophoric sites were found: two aromatic rings, a proton acceptor, and a protonated amine. Using these strategies, some nanomolar antagonists [e.g., 136 (Fig. 15.56)] were discovered (248).

9.6.5 Endothelin. The first report of endothelin in 1988 stimulated a huge effort to develop selective and non-selective endothelin receptor (ETA and ETB) antagonists (249, 250). One successful approach derived from the postulate that the phenyl groups of the screening lead might mimic two of the aromatic side-chains (Tyr13, Phe14, or Trp21) of ET-1 (251, 252). Knowing that the carboxylic acid was also necessary for good activity, researchers at SmithKline overlaid their inhibitor with the aromatic groups Tyr13, Phe14, and Asp18 in ET-1. After using a conformationally constrained analog of ET-1 to further define their NMR-derived structure of ET-1, the final overlay suggested that a carboxylic acid attached with a linker of two to three atoms on the 2-position of the phenyl ring would provide further binding interaction by mimicking the C-terminal carboxylic acid. This led to compound (137) (Fig. 15.57), a potent antagonist of both the ETA and ETB receptors with Ki = 0.43 and 15.7 nM, respectively. Analogs based on a pyrrolidine scaffold are also effective (e.g., 138) (Fig. 15.57) (253).

The Kohonen neural network has been used to develop bioisosteres of the methylenedioxyphenyl group found in a variety of antagonists [e.g., 139 (Fig. 15.58)]. The benzothiadiazole (140) functions as a bioisostere that retains and sometimes improves binding to the ETA receptor (254-256).

Tyr-Pro-Ser-Lys-Pro-Asp-Asn-Pro-Gly-Glu-Asp-Ala-Pro-Ala-Glu-Asp-Leu-Ala-Arg-Tyr-Tyr-Ser-Ala-Leu-Arg-His-Tyr-Ile-Asn-Leu-Ile-Thr-Arg-Gln-Arg-Tyr-NH2
Neuropeptide Y
Figure 15.53. Examples of neuropeptide Y1 inhibitors.

Since the discovery of Ro46-2005 (141) (Fig. 15.58), the first orally active ET inhibitor, major efforts have been made to modify arylsulfonamide derivatives. An isoxazole as
Figure 15.54. Examples of neuropeptide Y5 inhibitors.
the heterocycle attached to the amino fundion- modeled. Thus, they must be classified as type
ality provided selectivity against ETA receptor I1 peptidomimetics until structural data can
(257)and led to BMS193884 (142)(Ki= 1.4 nM) resolve the issue.
(258)and others, e.g., TBC 3214 (143) (Ki = 0.04
nM) (2591, which are potent, selective, and
orally available ETAantagonists. 10 SUMMARY AND FUTURE
Different binding modes have been pro- DIRECTIONS
posed for ET antagonists. The acid or sulfon-
amido groups are needed to interact with a The "Holy Grail" of peptidomimetic research
cationic site in the receptor, and an aromatic in drug discovery has been to find ways to
interaction with Tyr12' is postulated to be re- transform the structural information con-
sponsible for ETA selectivity. However, be- tained in peptides into non-peptide structures
cause all these receptors are members of the that have drug-like pharmacodynamic proper-
GPCR, there is no assurance that any bind as ties. Many different strategies have been
His-D-Trp-Ala-Trp-D-Phe-Lys-NH2
GHRPB Ghrelin
8
\ / \ TFA
S02CH3
Figure 15.55. GHRP-6 and ghrelin non-peptide derivatives as growth hormone secretagogues
inhibitors.
Figure 16.56. Newer approaches to growth hormone secretagogues inhibitors.
Cys-Ser-Cys-Ser-Ser-Leu-Met
\ I
Figure 15.57. Non-peptide endothelin analogs.

N-S
OMe OMe
PD156707 EMD 122946
(139) (140)
I 7
BMS 193884
(142)
Figure 15.58. Examples of ETAinhibitors.
employed in the search for useful peptidomi- the progress made to date suggests that this
metics-rational design of amide bond re- goal will be achieved. We know that some non-
placements, mimics of turn structures, and peptide scaffolds are topographical mimetics
the like, as well as both designed and discov- of the extended P-strand of enzyme-bound
ered scaffolds that replace the amide bond protease inhibitors because we have the bio-
core of peptides. The field has a long way to go physical methods for characterizing both
before rational design of type I11 peptidomi- types of enzyme-inhibitor complexes. Type I11
metics can be achieved routinely. However, peptidomimetic inhibitors of peptidases have
References
been designed from the substrate sequences about protein-protein interactions is still
and they have been revealed by HTS processes quite limited, the rapid growth of structural
and optimized by application of structural bi- information and methods will eventually al-
ology. At this point, we have learned more low us to design rationally peptidomimetic
about the design of inhibitors by studying how compounds suitable for use in human therapy.
screening leads inhibit enzymes than from the
design of inhibitors from our current, limited
knowledge of enzyme catalysis. Probably the REFERENCES
most important recent discovery is that some 1. M. D. Fletcher and M . M . Campbell, Chem.
screening leads inhibit proteases by binding to Rev., 98, 763 (1998).
a different enzyme active site conformation 2. F. Haviv, T. D. Fitzpatrick, C. J. Nichols, E. N.
that is related mechanistically to the well- Bush, G. Diaz, G. Bammert, A. T . Nguyen, E. S.
characterized extended P-strand of enzyme- Johnson, J. Knittle, and J. Greer, J. Med.
bound protease inhibitors. This result empha- Chem., 37, 701 (1994).
sizes the importance of considering the entire 3. J. Hughes, T. W . Smith, H. W . Kosterlitz, L. A.
ensemble of protein conformations when de- Fothergill, B. A. Morgan, and H. R. Morris, Na-
ture, 258,577 (1975).
signing inhibitors of peptide-protein interac-
tions. 4. G. D. Smith and J. F. Griffin, Science, 199,
Our understanding of peptide mimicry for 1214 (1978).
ligands of constitutive receptors, such as G- 5. A. Aubry, N. Birlirakis, M . Sakarellos-Daitsi-
protein-coupled receptors (GPCR), is much otis, C. Sakarellos, and M. Marraud, Biopoly-
more primitive because high resolution struc- mers, 28,27 (1989).
tural data for agonist- and/or antagonist-re- 6. A. F. Bradbury, D. G. Smyth, and C. R. Snell,
ceptor complexes are not yet available. For Nature, 260, 165 (1976).
this reason, all attempts to rationalize the in- 7. P. S. Farmer in E. J. Ariens, Ed., Drug Design,
teractions between ligand and receptor con- Academic Press, New York, 1980.
tain a considerable element of speculation. It 8. A. B. Smith 111, T . P. Keenan, R. C. Holcomb,
is too early to know whether small non-pep- P. A. Sprengeler, M. C. Guzman, J. L. Wood,
tide structures that bind to GPCR are func- P. J. Carroll, and R. Hirschmann, J. Am.
Chem. Soc., 114,10672 (1992).
tional or topographical mimetics. However,
based on the results obtained by studying pep- 9. A. S. Ripka and D. H. Rich, Curr. Opin. Chem.
tidase inhibitors, it seem likely that at least Biol., 2,441 (1998).
some of the known functional peptidomimet- 10. M. G. Bursavich and D. H. Rich, J. Med. Chem.,
ics receptors ligands will be shown to be topo- 45, 541 (2002).
graphical mimetics. Others may be found to 11. A. F. Spatola in B. Weistein, Ed., Chem. Bio-
act more like GRAB-peptidomimetics in that chem. Amino Acids, Pept., Proteins, Marcel
Dekker, New York, 1983.
they bind to receptor conformations closely re-
lated in energy and mechanism to native con- 12. U. Gether, Endocr. Rev., 21,90 (2000).
formations. Still others will no doubt be found 13. D. P. Fairlie, G. Abbenante, and D. R. March,
that inhibit or stimulate the receptor system Curr. Med. Chem., 2,654 (1995).
by allosteric mechanisms or by interfering 14. R. A.Wiley and D. H. Rich, Med. Res. Rev., 13,
with some multi-step binding process preced- 327 (1993).
ing the formation of the active ligand-receptor 15. J. D. A. Tyndall and D. P. Fairlie, Curr. Med.
complex. In any case, it is clear that successful Chem., 8,893 (2001).
design of functional mimetics by assuming 16. N. R. A. Beeley, Drug Discov. Today., 5, 354
some structural relationship between a (2000).
screening lead and the parent peptide can 17. R. M. Freidinger, Trends Pharmacol. Sci., 10,
work (see Section 9.6),as can the systematic 270 (1989).
modification of the parent peptide. The appli- 18. R. M. Freidinger, Curr. Opin. Chem. Biol., 3,
cation of the principles of peptidomimetic re- 395 (1999).
search has become very important to drug dis- 19. A. Giannis and T . Kolter, Angew. Chem. Intl.
covery. Although our present knowledge Ed. Engl., 32, 1244 (1993).
20. G. J. Moore, Trends Pharmacol. Sci., 15, 124 (1994).
21. D. C. Rees, Curr. Med. Chem., 1, 145 (1994).
22. E. E. Sugg, Annu. Rep. Med. Chem., 32, 277 (1997).
23. T. K. Sawyer, Drugs Pharm. Sci., 101, 81 (2000).
24. B. A. Morgan and J. A. Gainor, Annu. Rep. Med. Chem., 24, 243 (1989).
25. G. J. Moore, Proc. West. Pharmacol. Soc., 40, 115 (1997).
26. A. Giannis and F. Rubsam, Adv. Drug Res., 29, 1 (1997).
27. M. Goodman and S. Ro in M. E. Wolff, Ed., Burger's Medicinal Chemistry and Drug Discovery, vol. 1, Wiley-Interscience, San Diego, CA, 1995, pp. 803-861.
28. D. Obrecht, M. Altorfer, and J. A. Robinson, Adv. Med. Chem., 4, 1 (1999).
29. J. Gante, Angew. Chem. Intl. Ed. Engl., 33, 1699 (1994).
30. P. A. Hart and D. H. Rich, Pract. Med. Chem., 393-412 (1996).
31. D. F. Veber, Pept.: Chem. Biol., in J. A. R. Smith and E. Jean, Eds., Proc. Am. Pept. Symp., 12th, ESCOM, Leiden, (1992).
32. G. R. Marshall, Tetrahedron, 49, 3547 (1993).
33. J. W. Erickson and S. W. Fesik, Annu. Rep. Med. Chem., 27, 271 (1992).
34. G. Muller, Curr. Med. Chem., 7, 861 (2000).
35. D. F. Mierke and C. Giragossian, Med. Res. Rev., 21, 450 (2001).
36. D. F. Veber, F. W. Holly, R. F. Nutt, S. J. Bergstrand, S. F. Brady, R. Hirschmann, M. S. Glitzer, and R. Saperstein, Nature, 280, 512 (1979).
37. J. Rivier, M. Brown, and W. Vale, Biochem. Biophys. Res. Commun., 65, 746 (1975).
38. G. D. Rose, L. M. Gierasch, and J. A. Smith, Adv. Protein Chem., 37, 1 (1985).
39. P. W. Schiller, The Peptides: Analysis, Synthesis and Biology, Vol. 6, Academic Press, Orlando, FL, 1984.
40. T. K. Sawyer, V. J. Hruby, P. S. Darman, and M. E. Hadley, Proc. Natl. Acad. Sci. USA, 79, 1751 (1982).
41. K. Ishikawa, T. Fukami, T. Nagase, K. Fujita, T. Hayama, K. Niiyama, T. Mase, M. Ihara, and M. Yano, J. Med. Chem., 35, 2139 (1992).
42. G. R. Marshall, F. A. Gorin, and M. L. Moore, Annu. Rep. Med. Chem., 13, 227 (1978).
43. G. R. Marshall, C. D. Barry, H. E. Bosshard, R. A. Dammkoehler, and D. A. Dunn, ACS Symp. Ser., 112, 205 (1979).
44. R. M. J. Liskamp, Recl. Trav. Chim. Pays-Bas., 113, 1 (1994).
45. G. Holzemann, Kontakte (Darmstadt), 1, 3 (1991).
46. G. Holzemann, Kontakte (Darmstadt), 2, 55 (1991).
47. M. Kahn, Synlett, 821-826 (1993).
48. R. M. Freidinger, D. F. Veber, D. S. Perlow, J. R. Brooks, and R. Saperstein, Science, 210, 656 (1980).
49. E. D. Thorsett, E. E. Harris, S. D. Aster, E. R. Peterson, J. P. Snyder, J. P. Springer, J. Hirshfield, E. W. Tristram, A. A. Patchett, E. H. Ulm, and T. C. Vassil, J. Med. Chem., 29, 251 (1986).
50. G. A. Flynn, E. L. Giroux, and R. C. Dage, J. Am. Chem. Soc., 109, 7914 (1987).
51. U. Nagai, K. Sato, R. Nakamura, and R. Kato, Tetrahedron, 49, 3577 (1993).
52. S. Hanessian, G. McNaughton Smith, H. G. Lombart, and W. D. Lubell, Tetrahedron, 53, 12789 (1997).
53. A. J. Souers and J. A. Ellman, Tetrahedron, 57, 7431 (2001).
54. K. Burgess, Acc. Chem. Res., 34, 826 (2001).
55. A. B. Smith III, M. C. Guzman, P. A. Sprengeler, T. P. Keenan, R. C. Holcomb, J. L. Wood, P. J. Carroll, and R. Hirschmann, J. Am. Chem. Soc., 116, 9947 (1994).
56. K. D. Stigers, M. J. Soth, and J. S. Nowick, Curr. Opin. Chem. Biol., 3, 714 (1999).
57. J. S. Nowick, E. M. Smith, and M. Pairish, Chem. Soc. Rev., 25, 401 (1996).
58. R. P. Cheng, S. H. Gellman, and W. F. DeGrado, Chem. Rev., 101, 3219 (2001).
59. J. Venkatraman, S. C. Shankaramma, and P. Balaram, Chem. Rev., 101, 3131 (2001).
60. D. P. Fairlie, M. L. West, and A. K. Wong, Curr. Med. Chem., 5, 29 (1998).
61. V. J. Hruby and P. M. Balse, Curr. Med. Chem., 7, 945 (2000).
62. V. J. Hruby, Acc. Chem. Res., 34, 389 (2001).
63. H. Nakanishi and M. Kahn, Bioorg. Chem. Pept. & Protein, 12, 395 (1998).
64. R. Hirschmann, P. A. Sprengeler, T. Kawasaki, J. W. Leahy, W. C. Shakespeare, and A. B. Smith III, Tetrahedron, 49, 3665 (1993).
65. Y. Qian, A. Vogt, S. M. Sebti, and A. D. Hamilton, J. Med. Chem., 39, 217 (1996).
66. B. E. Evans, K. E. Rittle, M. G. Bock, R. M. DiPardo, R. M. Freidinger, W. L. Whitter, G. F. Lundell, D. F. Veber, P. S. Anderson, R. S. L. Chang, V. J. Lotti, D. J. Cerino, T. B. Chen, P. J. Kling, K. A. Kunkel, J. P. Springer, and J. Hirshfield, J. Med. Chem., 31, 2235 (1988).
67. B. E. Evans, K. E. Rittle, M. G. Bock, R. M. DiPardo, R. M. Freidinger, W. L. Whitter, N. P. Gould, G. F. Lundell, C. F. Homnick, and D. F. Veber, J. Med. Chem., 30, 1229 (1987).
68. T. Clackson and J. A. Wells, Science, 267, 383 (1995).
69. H. C. Kawato, K. Nakayama, H. Inagaki, and T. Ohta, Org. Lett., 3, 3451 (2001).
70. K. Nakayama, H. C. Kawato, H. Inagaki, and T. Ohta, Org. Lett., 3, 3447 (2001).
71. M. D. Fletcher and M. M. Campbell, Chem. Rev., 98, 763 (1998).
72. M. Chorev and M. Goodman, Acc. Chem. Res., 26, 266 (1993).
73. R. M. Williams, Biomed. Appl. Biotechnol., 1, 187 (1993).
74. C. A. Lipinski, J. Pharmacol. Toxicol. Methods, 44, 235 (2001).
75. G. Klebe, J. Mol. Med., 78, 269 (2000).
76. H. J. Bohm and G. Klebe, Angew. Chem. Intl. Ed. Engl., 35, 2589 (1996).
77. L. Pauling, Chem. Eng. News, 24, 1375 (1946).
78. R. Wolfenden, Annu. Rev. Biophys. Bioeng., 5, 271 (1976).
79. R. Wolfenden, Acc. Chem. Res., 5, 10 (1972).
80. M. A. Ondetti, B. Rubin, and D. W. Cushman, Science, 196, 441 (1977).
81. D. Leung, G. Abbenante, and D. P. Fairlie, J. Med. Chem., 43, 305 (2000).
82. R. E. Babine and S. L. Bender, Chem. Rev., 97, 1359 (1997).
83. I. Schechter and A. Berger, Biochem. Biophys. Res. Commun., 27, 157 (1967).
84. H. P. Marki, A. Binggeli, B. Bittner, V. Bohner-Lang, V. Breu, D. Bur, P. Coassolo, J. P. Clozel, A. D'Arcy, H. Doebeli, W. Fischli, C. Funk, J. Foricher, T. Giller, F. Gruninger, A. Guenzi, R. Guller, T. Hartung, G. Hirth, C. Jenny, M. Kansy, U. Klinkhammer, T. Lave, B. Lohri, F. C. Luft, E. M. Mervaala, D. N. Muller, M. Muller, F. Montavon, C. Oefner, C. Qiu, A. Reichel, P. Sanwald-Ducray, M. Scalone, M. Schleimer, R. Schmid, H. Stadler, A. Treiber, O. Valdenaire, E. Vieira, P. Waldmeier, R. Wiegand-Chou, M. Wilhelm, W. Wostl, M. Zell, and R. Zell, Farmaco, 56, 21 (2001).
85. D. Banner, J. Ackermann, A. Gast, K. Gubernator, P. Hadvary, K. Hilpert, L. Labler, K. Mueller, G. Schmid, T. Tschopp, H. van de Waterbeemd, and B. Wirz, Perspect. Med. Chem., 27-43 (1993).
86. J. Rahuel, V. Rasetti, J. Maibaum, H. Rueger, R. Goschke, N. C. Cohen, S. Stutz, F. Cumin, W. Fuhrer, J. M. Wood, and M. G. Grutter, Chem. Biol., 7, 493 (2000).
87. M. J. Parry, A. B. Russell, and M. Szelke in J. Meienhofer, Ed., Chem. Biol. Pept., Proc. Am. Pept. Symp., 3rd, Ann Arbor Science, Ann Arbor, MI, 1972, p. 541.
88. M. Szelke, D. M. Jones, B. Atrash, A. Hallett, and B. J. Leckie in V. J. Hruby and D. H. Rich, Eds., Pept.: Struct. Funct., Proc. Am. Pept. Symp., 8th, Pierce Chemical Co., Rockford, 1983, p. 579.
89. D. H. Rich and E. T. O. Sun, Biochem. Pharmacol., 29, 2205 (1980).
90. E. M. Gordon, J. D. Godfrey, J. Pluscec, D. Von Langen, and S. Natarajan, Biochem. Biophys. Res. Commun., 126, 419 (1985).
91. D. H. Rich, J. Med. Chem., 28, 263 (1985).
92. D. H. Rich, J. Green, M. V. Toth, G. R. Marshall, and S. B. H. Kent, J. Med. Chem., 33, 1285 (1990).
93. E. J. Lien, H. Gao, and L. L. Lien, Prog. Drug Res., 43, 43 (1994).
94. F. Lebon and M. Ledecq, Curr. Med. Chem., 7, 455 (2000).
95. E. D. Thorsett and L. H. Latimer, Curr. Opin. Chem. Biol., 4, 377 (2000).
96. M. S. Wolfe, J. Med. Chem., 44, 2039 (2001).
97. S. Sinha, J. P. Anderson, R. Barbour, G. S. Basi, R. Caccavello, D. Davis, M. Doan, H. F. Dovey, N. Frigon, J. Hong, K. Jacobson-Croak, N. Jewett, P. Keim, J. Knops, I. Lieberburg, M. Power, H. Tan, G. Tatsuno, J. Tung, D. Schenk, P. Seubert, S. M. Suomensaari, S. Wang, D. Walker, J. Zhao, L. McConlogue, and V. John, Nature, 402, 537 (1999).
98. A. K. Ghosh, D. Shin, D. Downs, G. Koelsch, X. Lin, J. Ermolieff, and J. Tang, J. Am. Chem. Soc., 122, 3522 (2000).
99. L. Hong, G. Koelsch, X. Lin, S. Wu, S. Terzyan, A. K. Ghosh, X. C. Zhang, and J. Tang, Science, 290, 150 (2000).
100. A. K. Ghosh, G. Biker, C. Harwood, R. Kawahama, D. Shin, K. A. Hussain, L. Hong, J. A. Loy, C. Nguyen, G. Koelsch, J. Ermolieff, and J. Tang, J. Med. Chem., 44, 2865 (2001).
101. H. H. Rotmensch, P. H. Vlasses, B. N. Swanson, J. D. Irvin, K. E. Harris, D. G. Merrill, and R. K. Ferguson, Am. J. Cardiol., 53, 116 (1984).
102. A. A. Patchett, E. Harris, E. W. Tristram, M. J. Wyvratt, M. T. Wu, D. Taub, E. R. Peterson, T. J. Ikeler, J. ten Broeke, L. G. Payne, D. L. Ondeyka, E. D. Thorsett, W. J. Greenlee, N. S. Lohr, R. D. Hoffsommer, H. Joshua, W. V. Ruyle, J. W. Rothrock, S. D. Aster, A. L. Maycock, F. M. Robinson, R. Hirschmann, C. S. Sweet, E. H. Ulm, D. M. Gross, T. C. Vassil, and C. A. Stone, Nature, 288, 280 (1980).
103. B. P. Roques, M. C. Fournie-Zaluski, E. Soroca, J. M. Lecomte, B. Malfroy, C. Llorens, and J. C. Schwartz, Nature, 288, 286 (1980).
104. B. P. Roques, E. Lucas-Soroca, P. Chaillet, J. Costentin, and M. C. Fournie-Zaluski, Proc. Natl. Acad. Sci. USA, 80, 3178 (1983).
105. R. Bouboutou, G. Waksman, J. Devin, M. C. Fournie-Zaluski, and B. P. Roques, Life Sci., 35, 1023 (1984).
106. J. Bralet and J. C. Schwartz, Trends Pharm. Sci., 22, 106 (2001).
107. S. DeLombaert, R. Chatelain, C. A. Fink, and A. J. Trapani, Curr. Pharm. Design, 2, 443 (1996).
108. J. W. Skiles, L. G. Monovich, and A. Y. Jeng, Annu. Rep. Med. Chem., 35, 167 (2000).
109. M. R. Michaelides and M. L. Curtin, Curr. Pharm. Design, 5, 787 (1999).
110. M. Whittaker, C. D. Floyd, P. Brown, and A. J. H. Gearing, Chem. Rev., 99, 2735 (1999).
111. S. Brown, M. M. Bernardo, Z. H. Li, L. P. Kotra, Y. Tanaka, R. Fridman, and S. Mobashery, J. Am. Chem. Soc., 122, 6799 (2000).
112. O. Kleifeld, L. P. Kotra, D. C. Gervasi, S. Brown, M. M. Bernardo, R. Fridman, S. Mobashery, and I. Sagi, J. Biol. Chem., 276, 17125 (2001).
113. M. L. Moss, J. M. White, M. H. Lambert, and R. C. Andrews, Drug Discov. Today, 6, 417 (2001).
114. R. A. Black, C. T. Rauch, C. J. Kozlosky, J. J. Peschon, J. L. Slack, M. F. Wolfson, B. J. Castner, K. L. Stocking, P. Reddy, S. Srinivasan, N. Nelson, N. Boiani, K. A. Schooley, M. Gerhart, R. Davis, J. N. Fitzner, R. S. Johnson, R. J. Paxton, C. J. March, and D. P. Cerretti, Nature, 385, 729 (1997).
115. F. Anon, Expert Opin. Ther. Pat., 10, 1617 (2000).
116. H. Hilpert, Tetrahedron, 57, 7675 (2001).
117. H. Umezawa, T. Aoyagi, H. Suda, M. Hamada, and T. Takeuchi, J. Antibiot. (Tokyo), 29, 97 (1976).
118. T. Aoyagi, H. Tobe, F. Kojima, M. Hamada, T. Takeuchi, and H. Umezawa, J. Antibiot. (Tokyo), 31, 636 (1978).
119. H. Kim and W. N. Lipscomb, Proc. Natl. Acad. Sci. USA, 90, 5006 (1993).
120. W. T. Lowther, A. M. Orville, D. T. Madden, S. J. Lim, D. H. Rich, and B. W. Matthews, Biochemistry, 38, 7678 (1999).
121. For in-depth analysis, see D. F. Veber and S. K. Thompson, Curr. Opin. Drug Discov. Dev., 3, 362 (2000); A. Krantz, Bioorg. Med. Chem. Lett., 2, 1327 (1992).
122. J. Drenth, J. N. Jansonius, and B. G. Wolthers, J. Mol. Biol., 24, 449 (1967).
123. J. Drenth, J. N. Jansonius, R. Koekoek, J. Marrink, J. Munnik, and B. G. Wolthers, J. Mol. Biol., 5, 398 (1962).
124. S. Michaud and B. J. Gour, Expert Opin. Ther. Pat., 8, 645 (1998).
125. D. S. Yamashita and R. A. Dodds, Curr. Pharm. Des., 6, 1 (2000).
126. K. Tezuka, Y. Tezuka, A. Maejima, T. Sato, K. Nemoto, H. Kamioka, Y. Hakeda, and M. Kumegawa, J. Biol. Chem., 269, 1106 (1994).
127. D. S. Yamashita, W. W. Smith, B. Zhao, C. A. Janson, T. A. Tomaszek, M. J. Bossard, M. A. Levy, H.-J. Oh, T. J. Carr, S. K. Thompson, C. F. Ijames, S. A. Carr, M. McQueney, K. J. D'Alessio, B. Y. Amegadzie, C. R. Hanning, S. Abdel-Meguid, R. L. DesJarlais, J. G. Gleason, and D. F. Veber, J. Am. Chem. Soc., 119, 11351 (1997).
128. R. W. Marquis, Y. Ru, J. Zeng, R. E. L. Trout, S. M. LoCastro, A. D. Gribble, J. Witherington, A. E. Fenwick, B. Gamier, T. Tomaszek, D. Tew, M. E. Hemling, C. J. Quinn, W. W. Smith, B. Zhao, M. S. McQueney, C. A. Janson, K. D'Alessio, and D. F. Veber, J. Med. Chem., 44, 725 (2001).
129. R. W. Marquis, Y. Ru, S. M. LoCastro, J. Zeng, D. S. Yamashita, H.-J. Oh, K. F. Erhard, L. D. Davis, T. A. Tomaszek, D. Tew, K. Salyers, J. Proksch, K. Ward, B. Smith, M. Levy, M. D. Cummings, R. C. Haltiwanger, G. Trescher, B. Wang, M. E. Hemling, C. J. Quinn, H. Y. Cheng, F. Lin, W. W. Smith, C. A. Janson, B. Zhao, M. S. McQueney, K. D'Alessio, C.-P. Lee, A. Marzulli, R. A. Dodds, S. Blake, S.-M. Hwang, I. E. James, C. J. Gress, B. R. Bradley, M. W. Lark, M. Gowen, and D. F. Veber, J. Med. Chem., 44, 1380 (2001).
130. R. V. Talanian, K. D. Brady, and V. L. Cryns, J. Med. Chem., 43, 3351 (2000).
131. D. S. Karanewsky, X. Bai, S. D. Linton, J. F. Krebs, J. Wu, B. Pham, and K. J. Tomaselli, Bioorg. Med. Chem. Lett., 8, 2757 (1998).
132. A. B. Shahripour, M. S. Plummer, E. A. Lunney, T. K. Sawyer, C. J. Stankovic, M. K. Connolly, J. R. Rubin, N. P. C. Walker, K. D. Brady, H. J. Allen, R. V. Talanian, W. W. Wong, and C. Humblet, Bioorg. Med. Chem. Lett., 11, 2779 (2001).
133. D. Lee, S. A. Long, J. L. Adams, G. Chan, K. S. Vaidya, T. A. Francis, K. Kikly, J. D. Winkler, C.-M. Sung, C. Debouck, S. Richardson, M. A. Levy, W. E. DeWolf Jr., P. M. Keller, T. Tomaszek, M. S. Head, M. D. Ryan, R. C. Haltiwanger, P.-H. Liang, C. A. Janson, P. J. McDevitt, K. Johanson, N. O. Concha, W. Chan, S. S. Abdel-Meguid, A. M. Badger, M. W. Lark, D. P. Nadeau, L. J. Suva, M. Gowen, and M. E. Nuttall, J. Biol. Chem., 275, 16007 (2000).
134. D. Lee, S. A. Long, J. H. Murray, J. L. Adams, M. E. Nuttall, D. P. Nadeau, K. Kikly, J. D. Winkler, C.-M. Sung, M. D. Ryan, M. A. Levy, P. M. Keller, and W. E. DeWolf Jr., J. Med. Chem., 44, 2015 (2001).
135. R. E. Dolle, J. Singh, J. Rinker, D. Hoyer, C. V. C. Prasad, T. L. Graybill, J. M. Salvino, C. T. Helaszek, R. E. Miller, and M. A. Ator, J. Med. Chem., 37, 3863 (1994).
136. B. K. Kay, A. V. Kurakin, and R. Hyde-DeRuyscher, Drug Discov. Today, 3, 370 (1998).
137. F. Al-Obeidi, V. J. Hruby, and T. K. Sawyer, Mol. Biotechnol., 9, 205 (1998).
138. A. E. P. Adang and P. H. H. Hermkens, Curr. Med. Chem., 8, 985 (2001).
139. K. S. Lam, M. Lebl, and V. Krchnak, Chem. Rev., 97, 411 (1997).
140. A. C. Good, S. R. Krystek, and J. S. Mason, Drug Discov. Today, 5, 61 (2000).
141. S. P. Rohrer, E. T. Birzin, R. T. Mosley, S. C. Berk, S. M. Hutchins, D.-M. Shen, Y. Xiong, E. C. Hayes, R. M. Parmar, F. Foor, S. W. Mitra, S. J. Degrado, M. Shu, J. M. Klopp, S. J. Cai, A. Blake, W. W. S. Chan, A. Pasternak, L. Yang, A. A. Patchett, R. G. Smith, K. T. Chapman, and J. M. Schaeffer, Science, 282, 737 (1998).
142. R. S. Bohacek, C. McMartin, and W. C. Guida, Med. Res. Rev., 16, 3 (1996).
143. Y. Kurogi and O. F. Guner, Curr. Med. Chem., 8, 1035 (2001).
144. J. S. Mason, A. C. Good, and E. J. Martin, Curr. Pharm. Des., 7, 567 (2001).
145. F. Ooms, Curr. Med. Chem., 7, 141 (2000).
146. J. Wouters and F. Ooms, Curr. Pharm. Des., 7, 529 (2001).
147. E. Vieira, A. Binggeli, V. Breu, D. Bur, W. Fischli, R. Guller, G. Hirth, H. P. Marki, M. Muller, C. Oefner, M. Scalone, H. Stradler, M. Wilhelm, and W. Wostl, Bioorg. Med. Chem. Lett., 9, 1397 (1999).
148. G. Guller, A. Binggeli, V. Breu, D. Bur, W. Fischli, G. Hirth, C. Jenny, M. Kansay, F. Montavon, M. Muller, C. Oefner, H. Stradler, E. Vieira, M. Wilhelm, W. Wostl, and H. P. Marki, Bioorg. Med. Chem. Lett., 9, 1403 (1999).
149. C. Oefner, A. Binggeli, V. Breu, D. Bur, J.-P. Clozel, A. D'Arcy, A. Dorn, W. Fischli, F. Gruninger, R. Guller, G. Hirth, H. P. Marki, S. Mathews, M. Muller, R. G. Ridler, H. Stadler, E. Vieira, M. Wilhelm, F. K. Winklier, and W. Wostl, Chem. Biol., 6, 127 (1999).
150. M. G. Bursavich, C. W. West, and D. H. Rich, Org. Lett., 3, 2317 (2001).
151. A. B. Smith III, R. Hirschmann, A. Pasternak, W. Yao, P. A. Sprengeler, M. K. Holloway, L. C. Kuo, Z. Chen, P. L. Darke, and W. A. Schleif, J. Med. Chem., 40, 2440 (1997).
152. J. D. A. Tyndall, R. C. Reid, D. P. Tyssen, D. K. Jardine, B. Todd, M. Passmore, D. R. March, L. K. Pattenden, D. A. Bergman, D. Alewood, S.-H. Hu, P. F. Alewood, C. J. Birch, J. L. Martin, and D. P. Fairlie, J. Med. Chem., 43, 3495 (2000).
153. S. E. Hagen, J. V. N. V. Prasad, F. E. Boyer, J. M. Domagala, E. L. Ellsworth, C. Gajda, H. W. Hamilton, L. J. Markoski, B. A. Steinbaugh, B. D. Tait, E. A. Lunney, P. J. Tummino, D. Ferguson, D. Hupe, C. Nouhan, S. J. Gracheck, J. M. Saunders, and S. VanderRoest, J. Med. Chem., 40, 3707 (1997).
154. T. M. Judge, G. Phillips, J. K. Morris, K. D. Lovasz, K. R. Romines, G. P. Luke, J. Tulinsky, J. M. Tustin, R. A. Chrusciel, L. A. Dolak, S. A. Mizsak, W. Watt, J. Morris, S. L. V. Velde, J. W. Strohbach, and R. B. Gammill, J. Am. Chem. Soc., 119, 3627 (1997).
155. G. V. De Lucca, S. Erickson-Viitanen, and P. Y. S. Lam, Drug Discov. Today, 2, 6 (1997).
156. W. Schaal, A. Karlsson, G. Ahlsen, J. Lindberg, H. O. Andersson, U. H. Danielson, B. Classon, T. Unge, B. Samuelsson, J. Hulten, A. Hallberg, and A. Karlen, J. Med. Chem., 44, 155 (2001).
157. J. P. Vacca, Curr. Opin. Chem. Biol., 4, 394 (2000).
158. P. E. J. Sanderson, Med. Res. Rev., 19, 179 (1999).
159. P. E. J. Sanderson and A. M. Naylor-Olsen, Curr. Med. Chem., 5, 289 (1998).
160. M. R. Wiley, N. Y. Chirgadze, D. K. Clawson, T. J. Craft, D. S. Gifford-Moore, N. D. Jones, J. L. Olkowaki, A. L. Schacht, L. C. Weir, and G. F. Smith, Bioorg. Med. Chem. Lett., 5, 2835 (1995).
161. S. F. Brady, K. J. Stauffer, W. C. Lumma, G. M. Smith, H. G. Ramjit, S. D. Lewis, B. J. Lucas, S. J. Gardell, E. A. Lyle, S. D. Appleby, J. J. Cook, M. A. Holahan, M. T. Stranieri, J. J. Lynch Jr., J. H. Lin, I. W. Chen, K. Vastag, A. M. Naylor-Olsen, and J. P. Vacca, J. Med. Chem., 41, 401 (1998).
162. T. J. Tucker, W. C. Lumma, A. M. Naylor-Olsen, S. D. Lewis, R. Lucas, R. M. Freidinger, A. M. Mulichak, Z. Chen, and L. C. Kuo, J. Med. Chem., 40, 830 (1997).
163. R. A. Engh, H. Brandstetter, G. Sucher, A. Eichinger, U. Baumann, W. Bode, R. Huber, T. Poll, R. Rudolph, and W. Von der Sad, Structure, 4, 1353 (1996).
164. T. B. Lu, B. Tomczuk, R. Bone, L. Murphy, F. R. Salemme, and R. M. Soll, Bioorg. Med. Chem. Lett., 10, 83 (2000).
165. T. B. Lu, R. M. Soll, C. R. Illig, R. Bone, L. Murphy, J. Spurlino, F. R. Salemme, and B. E. Tomczuk, Bioorg. Med. Chem. Lett., 10, 79 (2000).
166. N. Y. Chirgadze, D. J. Sall, V. J. Klimkowski, D. K. Clawson, S. L. Briggs, R. Hermann, G. F. Smith, D. S. Gifford-Moore, and J.-P. Wery, Protein Sci., 6, 1412 (1997).
167. D. J. Sall, J. A. Bastian, S. L. Briggs, J. A. Buben, N. Y. Chirgadze, D. K. Clawson, M. L. Denney, D. D. Giera, D. S. Gifford-Moore, R. W. Harper, K. L. Hauser, V. J. Klimkowski, T. J. Kohn, H.-S. Lin, J. R. McCowan, A. D. Palkowitz, G. F. Smith, K. Takeuchi, K. J. Thrasher, J. M. Tinsley, B. G. Utterback, S.-C. B. Yan, and M. Zhang, J. Med. Chem., 40, 3489 (1997).
168. M. F. Malley, L. Tabernero, C. Y. Chang, S. L. Ohringer, D. G. M. Roberts, J. Das, and J. S. Sack, Protein Sci., 5, 221 (1996).
169. N. Y. Chirgadze, D. J. Sall, S. L. Briggs, D. K. Clawson, M. Zhang, G. F. Smith, and R. W. Schevitz, Protein Sci., 9, 29 (2000).
170. U. Obst, D. W. Banner, L. Weber, and F. Diederich, Chem. Biol., 4, 287 (1997).
171. A. von Matt, C. Ehrhardt, P. Burkhard, R. Metternich, M. Walkinshaw, and C. Tapparelli, Bioorg. Med. Chem., 8, 2291 (2000).
172. U. Baettig, L. Brown, D. Brundish, C. Dell, A. Furzer, S. Garman, D. Janus, P. D. Kane, G. Smith, C. V. Walker, X. L. Cockcroft, J. Ambler, A. Mitchelson, M. D. Talbot, M. Tweed, and N. Wills, Bioorg. Med. Chem. Lett., 10, 1563 (2000).
173. R. Rai, P. A. Sprengeler, K. C. Elrod, and W. B. Young, Curr. Med. Chem., 8, 101 (2001).
174. J. M. Fevig, D. J. Pinto, Q. Han, M. L. Quan, J. R. Pruitt, I. C. Jacobson, R. A. Galemmo Jr., S. Wang, M. J. Orwat, L. L. Bostrom, R. M. Knabb, P. C. Wong, P. Y. S. Lam, and R. R. Wexler, Bioorg. Med. Chem. Lett., 11, 641 (2001).
175. S. I. Klein, M. Czekaj, C. J. Gardner, K. R. Guertin, D. L. Cheney, A. P. Spada, S. A. Bolton, K. Brown, D. Colussi, C. L. Heran, S. R. Morgan, R. J. Leadley, C. T. Dunwiddie, M. H. Perrone, and V. Chu, J. Med. Chem., 41, 437 (1998).
176. Y. Gong, H. W. Pauls, A. P. Spada, M. Czekaj, G. Y. Liang, V. Chu, D. J. Colussi, K. D. Brown, and J. B. Gao, Bioorg. Med. Chem. Lett., 10, 217 (2000).
177. D. K. Herron, T. Goodson, M. R. Wiley, L. C. Weir, J. A. Kyle, Y. K. Yee, A. L. Tebbe, J. M. Tinsley, D. Mendel, J. J. Masters, J. B. Franciskovich, J. S. Sawyer, D. W. Beight, A. M. Ratz, G. Milot, S. E. Hall, V. J. Klimkowski, J. H. Wikel, B. J. Eastwood, R. D. Towner, D. S. Gifford-Moore, T. J. Craft, and G. F. Smith, J. Med. Chem., 43, 859 (2000).
178. Y. K. Yee, A. L. Tebbe, J. H. Linebarger, D. W. Beight, T. J. Craft, D. Gifford-Moore, T. Goodson, D. K. Herron, V. J. Klimkowski, J. A. Kyle, J. S. Sawyer, G. F. Smith, J. M. Tinsley, R. D. Towner, L. Weir, and M. R. Wiley, J. Med. Chem., 43, 873 (2000).
179. M. R. Wiley, L. C. Weir, S. Briggs, N. A. Bryan, J. Buben, C. Campbell, N. Y. Chirgadze, R. C. Conrad, T. J. Craft, J. V. Ficorilli, J. B. Franciskovich, L. L. Froelich, D. S. Gifford-Moore, T. Goodson, D. K. Herron, V. J. Klimkowski, K. D. Kurz, J. A. Kyle, J. J. Masters, A. M. Ratz, G. Milot, R. T. Shuman, T. Smith, G. F. Smith, A. L. Tebbe, J. M. Tinsley, R. D. Towner, A. Wilson, and Y. K. Yee, J. Med. Chem., 43, 883 (2000).
180. Z. S. Zhao, D. O. Arnaiz, B. Griedel, S. Sakata, J. L. Dallas, M. Whitlow, L. Trinh, J. Post, A. Liang, M. M. Morrissey, and K. J. Shaw, Bioorg. Med. Chem. Lett., 10, 963 (2000).
181. J. M. Smallheer, R. E. Olson, and R. R. Wexler, Annu. Rep. Med. Chem., 35, 103 (2000).
182. W. Wang, R. T. Borchardt, and B. Wang, Curr. Med. Chem., 7, 437 (2000).
183. E. Ruoslahti and M. D. Pierschbacher, Science, 238, 491 (1987).
CHAPTER SIXTEEN
Analog Design
JOSEPH G. CANNON
The University of Iowa
Iowa City, Iowa
Contents
1 Introduction
2 Bioisosteric Replacement and Nonisosteric Bioanalogs (Nonclassical Bioisosteres)
3 Rigid or Semirigid (Conformationally Restricted) Analogs
4 Homologation of Alkyl Chain or Alteration of Chain Branching; Changes in Ring Size; Ring-Position Isomers; and Substitution of an Aromatic Ring for a Saturated One, or the Converse
5 Alteration of Stereochemistry and Design of Stereoisomers and Geometric Isomers
6 Fragments of the Lead Molecule
7 Variation in Interatomic Distances

1 INTRODUCTION

This chapter is limited to nonprotein therapeutic candidates. The subject of peptide analogs and peptidomimetic agents merits separate consideration. Contemporary search for new drugs makes extensive use of robotic techniques of combinatorial chemistry and high throughput synthesis, whereby huge numbers of compounds can be prepared for high throughput screening. However, this nonselective synthetic method based on a random screening philosophy should not replace the strategy of analog design, but rather it should be considered as a useful prelude to analog design.

In any strategy aimed at designing new drug molecules or analogs of known biologically active compounds, there are no absolute guidelines or rules for procedure; the knowledge, imagination, and intuition of the medicinal chemist are the most important contributors to success. Analog design is as much an art as it is a science. The concept of analog design presupposes that a lead has been discovered; that is, a chemical compound has been identified that possesses some desirable pharmacological property. The search for and identification of leads is a challenge and is a separate topic. It is sufficient for the present discussion to note that lead compounds are frequently identified as endogenous participants (hormones, neurotransmitters, second messengers, or enzyme cofactors) in the body's biochemistry and physiology, or a lead may result from routine, random biological screening of natural products or of synthetic molecules that were created for purposes other than for use as drugs.

Analog design is most fruitful in the study of pharmacologically active molecules that are structurally specific: their biological activity depends on the nature and the details of their chemical structure (including stereochemistry). Hence, a seemingly minor modification of the molecule may result in a profound change in the pharmacological response (increase, diminish, completely destroy, or alter the nature of the response). In pursuing analog design and synthesis, it must be recognized that the newly created analogs are chemical entities different from the lead compound. It is not possible to retain all and exactly the same solubility and solvent partition characteristics, chemical reactivity and stability, acid or base strength, and/or in vivo metabolism properties of the lead compound. Thus, although the new analog may demonstrate pharmacological similarities to the lead compound, it is not likely to be identical to it, either chemically or biologically, nor will its similarities and differences always be predictable.

The goal of analog design is twofold: (1) to modify the chemical structure of the lead compound to retain or to reinforce the desirable pharmacologic effect while minimizing unwanted pharmacological (e.g., toxicity, side effects, or undesired routes of and/or unacceptable rates of metabolism) and physical and chemical properties (e.g., poor solubility and solvent partition characteristics or chemical instability), which may result in a superior therapeutic agent; and (2) to use target analogs as pharmacological probes (i.e., tools used for the study of fundamental pharmacological and physiological phenomena) to gain better insight into the pharmacology of the lead molecule and perhaps to reveal new knowledge of basic biology. Studies of analog structure-activity relationships may increase the medicinal chemist's ability to predict optimum chemical structural parameters for a given pharmacological action.

Analog design is greatly facilitated if the medicinal chemist can initially define the pharmacophore of the lead compound: that combination of atoms within the molecule that is responsible for eliciting the desired pharmacologic effect. Analog design may be directed toward maintaining this combination of atoms intact in a newly designed molecule or toward a carefully planned, systematic modification of the pharmacophore. If the medicinal chemist is uncertain about the structural features of the pharmacophoric portion of the molecule, a prime initial goal of analog design should be to define the pharmacophore. The medicinal chemist should address the following questions: What change(s) can be made in the lead molecule that permit(s) retention or reinforcement of pharmacological action? and What change(s) can be made in the molecule that diminish, destroy, or qualitatively change the basic pharmacologic action? The ideal program of analog design should involve a single structural change in the lead molecule with each new compound designed and synthesized. An analog in which multiple changes in the structure of the lead molecule have been made simultaneously may occasionally reveal highly desirable pharmacologic effects. However, relatively little useful structure-activity information will be gained from such a molecule. It cannot be readily determined which change (or combination of changes) was responsible for the change in the pharmacological effect. On a practical basis, it is frequently chemically impossible to effect only one discrete change in the lead molecule; one simple molecular structural alteration can influence many structural and chemical parameters. Nonetheless, the medicinal chemist should be cognizant of the disadvantages inherent in "shotgun" (nonsystematic, multiparametric) modification of lead molecules.

In analog design, molecular modification of the lead compound can involve one or more of the following strategies:

1. Bioisosteric replacement.
2. Design of rigid analogs.
3. Homologation of alkyl chain(s) or alteration of chain branching, design of aromatic ring-position isomers, alteration of ring size, and substitution of an aromatic ring for a saturated one, or the converse.
4. Alteration of stereochemistry, or design of geometric isomers or stereoisomers.
5. Design of fragments of the lead molecule that contain the pharmacophoric group (bond disconnection).
6. Alteration of interatomic distances within the pharmacophoric group or in other parts of the molecule.

None of these strategies is inherently preferable to the others; all merit the medicinal chemist's attention and consideration. Application of a combination of these strategies to the lead molecule may be advantageous. Considering the possible permutations and combinations of these changes that are possible within a single lead molecule, it is obvious that the number of analogs that can be designed from a lead molecule is potentially extremely large. Some structural changes that might be proposed are chemically impracticable (e.g., the molecule is incapable of existence) or the proposed analog may represent an overwhelmingly formidable synthetic challenge. These negative factors will diminish the population of possible analogs to be considered for synthesis; nevertheless, the medicinal chemist will always be confronted with a multitude of possible target molecules. Rational decisions must be made concerning which compounds should be synthesized, and synthetic priorities must be established for target compounds. All other factors being equal, the medicinal chemist should synthesize the less-challenging compounds first. Beyond this truism, the medicinal chemist's best resources are intuition and imagination. Selection and application of specific molecular modification strategies depend on the chemical structure of the lead compound and, to a certain extent, on the pharmacological action to be studied.

All of the strategies of analog design as well as subsequent decisions concerning target compounds to be synthesized can be facilitated by the use of computational chemistry (computer-assisted molecular modeling) techniques. These may give the medicinal chemist further insights into structural, stereochemical, and electronic implications of the proposed molecular modification.

2 BIOISOSTERIC REPLACEMENT AND NONISOSTERIC BIOANALOGS (NONCLASSICAL BIOISOSTERES)

The concept of bioisosterism derives from Langmuir's (1) observation that certain physical properties of chemically different substances (e.g., carbon monoxide and nitrogen, ketene and diazomethane) are strikingly similar. These similarities were rationalized on the basis that carbon monoxide and nitrogen both have 14 orbital electrons and, similarly,
diazomethane and ketene both have 22 orbital electrons. Medicinal chemists have expanded and adapted the original concept to the analysis of biological activity. The following definition has been proposed: "Bioisosteres are groups or molecules which have chemical and physical properties producing broadly similar biological properties" (2). This definition might be modified to include the concept that bioisosteres may produce opposite biological effects, and these effects are frequently a reflection of some action on the same biological process or at the same receptor site. Bioisosteric similarity of molecules is commonly assigned on the basis of the number of valence electrons of an atom or a group of atoms rather than on the total number of orbital electrons, as was originally specified by Langmuir. In a remarkable number of instances, compounds result that have similar (or even diametrically opposite) pharmacological effects compared with those of the parent compound. Categories of classic bioisosteres have been described (2) (Table 16.1).

Table 16.1 Bioisosteric Atoms and Groups
1. Univalent
   -F  -OH  -NH2  -CH3  -Cl
   -SH  -PH2
   -I  t-C4H9
   -Br  i-C3H7
2. Bivalent
   -O-  -S-  -Se-  -CH2-  -NH-
3. Tervalent
   -N=  -CH=
   -P=  -As=
4. Quadrivalent
   =C=  =Si=
5. Ring equivalents
   -CH=CH-  -S-  (e.g., benzene, thiophene)
   =CH-  =N-  (e.g., benzene, pyridine)
   -O-  -S-  -CH2-  -NH-

A more recent comprehensive review of bioisosterism appeared in 1996 (3). In a short communication, Burger (4) discussed and provided valuable insights into isosterism and bioanalogy in drug design.

Many compounds have been identified that comply with the "biology" aspect of the bioisostere concept but that do not fit the strict chemical (steric and electronic) definition of bioisosteres. Floersheim et al. (5) proposed that such compounds be designated as nonisosteric bioanalogs, replacing the older term, "nonclassical bioisosteres." However, most of the contemporary literature retains the nonclassical bioisostere terminology. Table 16.2 lists representative nonclassical bioisosteres.

Table 16.2 Nonclassical Bioisosteres
[Structural formulas not reproduced; only the recoverable group labels are listed.]
1. Carbonyl group
   Carboxylic acid group
   -NHCN  -CH(CN)2
   Catechol (X = O, NR)
   Halogen: X  CF3  CN  N(CN)2  C(CN)3
   Thioether
   Thiourea
   NO2  R  NR3
   Hydrogen: H  F

Dihydromuscimol (1) and thiomuscimol (2) are cyclic analogs of γ-aminobutyric acid (GABA) (3), in which the C=N moiety of the heterocyclic ring is considered to be bioisosteric with the C=O of GABA. The -S- moiety of thiomuscimol is bioisosteric with the ring -O- of dihydromuscimol. Both (1) and (2) are highly potent agonists at GABA-A receptors, as determined in an electrophoresis-based assay (6).

Because of its bioisosteric similarity to the normal physiological substrate L-dopa (4), L-mimosine (5) inhibits catechol oxidation by the enzyme tyrosinase (7). These compounds exemplify a situation in which bioisosteres display opposite pharmacologic effects at the same receptor.

The sulfonium bioisostere (6) of N,N-dimethyldopamine (7) retains the dopaminergic agonist effect displayed by (7) (8). The fact that (6) bears a permanent unit positive charge was invoked in support of the hypothesis that β-phenethylamines such as (7) interact with dopamine receptors in their protonated (cationic) form.

Bioisosteric replacement strategy has been fruitful in design of psychoactive agents, by use of the antidepressant dibenzazepine derivative imipramine (8) as the lead. The structural similarity between imipramine and the phenothiazine antipsychotics [typified by chlorpromazine (9)] is apparent. Although these two bioisosteric molecules have different pharmacological properties and therapeutic uses and likely have different mechanisms and sites of action in the central nervous system (9), they share the property of being psychotropic agents. They illustrate the observation that bioisosteric manipulation of a molecule may change its mode of action. In the antidepressant dibenzocycloheptene derivative amitriptyline (10), the ring nitrogen of imipramine is replaced by an exocyclic olefin moiety. Demexiptiline (11), doxepin (12), and dothiepin (13) represent other bioisosteric modifications of imipramine that possess antidepressant activity (10). Variations in the precise nature of psychotropic effects manifested by compounds (8-13) may be ascribed to the marked changes in orientation in space of the two benzene ring components of the tricyclic portion of these molecules, imposed on them by the isosteric moieties (-CH=CH-, -CH2-CH2-, -S-, -CH2O-, -CH2S-). The Z-isomer of doxepin is a more potent antidepressant than the E-isomer, but the drug is marketed as a mixture of isomers (11). Doxepin is also a potent antagonist at histamine H1 receptors. The Z-isomer is somewhat more potent than the E-isomer against histamine in the guinea pig ileum (12). The identity of the geometric isomer of dothiepin (13) used in pharmacological testing is uncertain. Apparently, attempts were made (13) to isolate the E- and Z-isomers of all of the compounds prepared in the series studied, but no information was provided about the stereochemistry of the dothiepin material used in the pharmacological studies.

Replacement of the entire indole ring system of melatonin (14) by a naphthalene ring (15) permitted retention of binding affinity in an ovine pars tuberalis membrane assay (14).

From a study (15) of a series of muscarinic M1 agonists derived from the structure of arecoline (16) and typified by (17), it was concluded that the Z-N-methoxyimidoyl nitrile group serves as a stable methyl ester bioisostere. The Z-isomer has an 18-fold higher affinity than its E-isomer for the rat cerebral cortex tissue used in the binding studies.

Replacement of the methyl ester moiety of the muscarinic partial agonist arecoline (16) by the putative nonclassical bioisosteric 1,2,4-oxadiazole ring system (18, where R = unbranched C1-C8 alkyl) permits retention of muscarinic agonism (16).

The 1,2,4-oxadiazole ring system of quisqualic acid (19), an agonist at a subpopulation of glutamate receptors (17), can be considered to be a nonclassical bioisostere of the corresponding carboxyl group of glutamic acid (20).

Compounds (21-23) illustrate further examples of nonclassical bioisosteres. Compound (21) was reported to display antitrypanosomal activity (18). The analogs (22) and (23) also displayed antitrypanosomal activity (19). Compound (22) demonstrated the most impressive activity (IC50 values of 40 and 165 nM) with respect to potency and effect on two arsenic-resistant strains of the organism.

Although the strategy of bioisosteric replacement may be a powerful and highly productive tool in analog design, Thornber (2) has emphasized that fundamental chemical and physical chemical changes can be expected to result from these molecular modifications, which may in themselves profoundly affect the pharmacological action of the resulting molecules. Contributing factors include change in the size of the atom or group introduced, which may affect the overall shape and size of the molecule; changes in bond angles; change in partition coefficient; change in the pKa of the molecule; alteration of chemical reactivity and chemical stability of the molecule, with accompanying qualitative and quantitative alteration of in vivo metabolism of the molecule; and change in hydrogen-bonding capacity. The chemical and biological results and pharmacological significance of many of these factors are unpredictable and must be determined experimentally.

3 RIGID OR SEMIRIGID (CONFORMATIONALLY RESTRICTED) ANALOGS

Imposition of some degree of molecular rigidity on a flexible organic molecule (e.g., by incorporation of elements of the flexible molecule into a rigid ring system or by introduction of a carbon-carbon double or triple bond) may result in potent, biologically active agents that show a higher degree of specificity of pharmacologic effect. There are possible advantages to this technique (20): the key functional groups are held in one steric disposition or, in
the case of a semirigid structure, the key functional groups are constrained to a limited range of steric dispositions and interatomic distances. By the rigid analog strategy, it is possible to approximate "frozen" conformations of a flexible lead molecule that, if an enhanced pharmacological effect results, may assist in defining and understanding structure-activity parameters, including the three-dimensional geometry of the pharmacophore. These data may be useful in constructing a model of the topography of the receptor. Computational chemistry strategies can be a valuable tool in designing rigid analogs.

The semirigid tetralin congeners (24) and (25) of N,N-dimethyldopamine (7) represent the two rotameric conformational extremes of the spatial relationship of the aromatic ring of dopamine to the ethylamine side chain when the ring and the side chain are coplanar. Compounds (24) and (25) display effects at different subpopulations of dopamine receptors (21), which have been proposed to reflect different conformations assumed by the flexible dopamine molecule at its various in vivo sites of action.

Restriction of conformational freedom of the acyl moiety in 4-DAMP (26, an antimuscarinic compound displaying higher affinity at ileal M3 acetylcholine receptors than at atrial M2 receptors) was imposed by the structure of the spiro-compound (27) (22).
Spiro-DAMP (27) was slightly more potent at M, muscarinic receptors than at M, receptors. It was proposed that the geometry of the spiro-molecule might reflect the receptor-bound conformation of 4-DAMP (26); this conformation differs from that observed in the crystal structure of 4-DAMP.

Conformational restriction was introduced into the side chain of a nonclassical serotonin bioisostere (28, a selective 5-HT1A and 5-HT1D receptor agonist) by its incorporation into a fused six-membered ring (29) (23). The conformational restrictions imposed on the indole-3-ethylamine moiety permitted retention of affinity for the 5-HT1D receptor but it diminished affinity for the 5-HT1A receptor by a factor of 1000. In two functional assays, (29) exhibited potency equal to or marginally greater than that of serotonin. Compound (29) was described as a partial agonist. It was concluded that the conformation of the indole-3-ethylamine portion of the fused system (29) reflects the conformation assumed by the flexible system (28) when it binds to the 5-HT1D receptor.

Imposition of rigidity into the piperidine ring of the opioid analgesic meperidine (30) by introduction of a methylene bridge between carbons 2 and 5 resulted in epimers (31) and (32), representing "frozen" conformations of meperidine (24). The exo-phenyl isomer (32) was six times as potent as the endo-phenyl isomer (31), and it was twice as potent as meperidine itself in a benzoquinone-induced writhing assay for analgesic effect.

Rigid analogs (33), (34), and (35) of phencyclidine (36) possess a rigid carbocyclic structure and an attached piperidine ring that is free to rotate. All three rigid analogs showed low to no affinity for the PCP receptor, but they had good affinity in a σ-receptor-binding assay (25). These binding data were proposed to be useful in defining a model for the σ-receptor pharmacophore. This study also provided additional evidence that the σ-receptor is independent of the PCP-binding site (cf. Ref. 26 and references therein).

Incorporation of the choline portion of acetylcholine (37) into a cyclopropane ring system resulted in cis- and trans-1,2-disubstituted molecules, (38) and (39), in which the acetylcholine molecule is locked into folded ("cisoid") and extended ("transoid") conformations.

The (1S),(2S)-(+)-trans-isomer (39) was somewhat more potent than acetylcholine itself in tissue and whole-animal assays for muscarinic agonism (27) and it was an excellent substrate for acetylcholinesterase. The (1R),(2R) enantiomer of (39) was exponentially less potent than its (1S),(2S) enantiomer in the assays cited, but it was a good substrate for acetylcholinesterase. The (±)-cis-isomer (38) was almost inert at nicotinic and muscarinic receptors and it was a poor substrate for acetylcholinesterase. These data were taken as evidence that the flexible acetylcholine molecule interacts with muscarinic receptors in an extended geometry of the chain of atoms (28). When this semirigid analog strategy was applied to a cyclobutane ring system (compound 40), there was a marked loss of pharmacologic effect (29). This result is enigmatic; differences in interatomic distances and bond angles in the pharmacophoric moiety as well
as differences in extraneous molecular bulk seem insufficient to account for the dramatic difference in pharmacological potencies between the three- and the four-membered ring systems.

The cyclopropane ring was employed to impart a degree of rigidity to the side chain of dopamine (structures 41 and 42) (30).

Neither isomer displayed effects at dopamine receptors, but both were α-adrenoceptor agonists, with the (+)-trans-isomer (41) being approximately five times more potent than the (+)-cis-isomer (42). It was suggested (31) that these findings may contribute to determining the preferred conformation of β-phenethylamines at the α-adrenoceptor. The racemic trans-cyclobutane congeners (43a) and (43b) are more potent than their racemic cis-isomers (44a) and (44b) in binding studies on rat corpus striatum tissue, but the binding affinities for (43a) and (43b) are much less than that of dopamine (32). Racemic trans-(43a) was more potent than the trans-primary amine (43b), but it was still much less potent than dopamine. The racemic cis-isomer of (44b) demonstrated very low affinity for the receptor.

A β-phenethylamine moiety was incorporated into the trans-decalin ring system (45) and the racemic modifications of all four possible isomers were prepared as "frozen" analogs of possibly significant conformations of the flexible norepinephrine molecule (33). All four compounds displayed approximately equal (extremely low) potency. This result illustrates that the achievement of conformational integrity by incorporation of a flexible pharmacophore into a bulky, complex molecule may be at the expense of biological activity.

Rigidity was introduced into the glutamic acid moiety in a series of bioisosteric congeners (46-48) (34). These systems showed potent agonist activity at subpopulations of metabotropic glutamate receptors. The geometry of these congeners led to the conclusion that glutamic acid itself interacts with the metabotropic glutamate receptors in a fully extended conformation.

The rotational orientation of the ester moieties of the myoneural blocking agent succinyldicholine (49) was restricted by introduction of a double bond into the succinic acid portion (50), (51) (35). The E-fumarate ester (51) was approximately one-half as potent as the flexible succinate ester (49), whereas the Z-maleate ester (50) was 1/40 as potent as the succinate. These results led to the conclusion that the molecular shape of the E-ester (51) more closely approximates that assumed by succinylcholine when it interacts with myoneural nicotinic receptors.

Restricted rotation was also introduced into the succinic acid moiety of succinyldicholine by preparation of the choline esters of cis- and trans-cyclopropane-1,2-dicarboxylic acids (52) and (53) (36, 37). Myoneural blocking activity was assessed in dogs (37) and cats (36) and, as indicated above for the E- and Z-olefinic esters (51) and (50), the extended trans-isomer (53) demonstrated much greater potency and a longer duration of action than those of the cis-isomer (52). The cyclobutane congeners (54) and (55) presented unexpected results that are difficult to rationalize: the cis-isomer (54) was much less potent than the trans-isomer (55) in a cat assay for myoneural blockade, but it presented a decidedly longer duration of action than that of the trans-isomer (36).

4 HOMOLOGATION OF ALKYL CHAIN OR ALTERATION OF CHAIN BRANCHING; CHANGES IN RING SIZE; RING-POSITION ISOMERS; AND SUBSTITUTION OF AN AROMATIC RING FOR A SATURATED ONE, OR THE CONVERSE

Change in size or branching of an alkyl chain on a bioactive molecule may have profound (and sometimes unpredictable) effects on physical and pharmacological properties. Alteration of the size and/or shape of an alkyl substituent can affect the conformational preference of a flexible molecule and may alter the spatial relationships of the components of the pharmacophore, which may be reflected in the ability of the molecule to achieve complementarity with its receptor or with the catalytic surface of a metabolizing enzyme. The alkyl group itself may represent a binding site with the receptor (through hydrophobic interactions), and alteration of the chain may alter its binding capacity. Position isomers of substituents (even alkyl groups) on an aromatic ring may possess different pharmacological properties. In addition to their ability to affect electron distribution over an aromatic ring system, position isomers may differ in their complementarity to receptors, and the position of a substituent on a ring may influence the spatial occupancy of the ring system with respect to the remainder of a conformationally variable molecule. What has sometimes been trivialized as "methyl group roulette" may indeed be an important parameter in the design of analogs.

Homologation of the N-alkyl chain in norapomorphine (56) from methyl (57) to n-propyl (59) produced incremental increases in emetic response in dogs and in stereotypy responses in rodents (38, 39). The next member of the series, the n-butyl homolog (60), demonstrated a tremendous loss in potency and activity compared to that of the lower homologs (39). Studies of N,N-dialkyl dopamines (61-64) revealed that some combinations of alkyl groups may impart a high degree of dopamine agonist effects (40).

N,N-Dimethyldopamine (61) is extremely potent in assays for dopaminergic agonism (pigeon pecking, emesis in dogs, and inhibition of cat cardioaccelerator nerve), as is N,N-di-n-propyldopamine (62) (41). N-n-Propyl-N-n-butyldopamine (63) is potent in behavioral assays in nigra-lesioned rats (42). However, N,N-di-n-butyldopamine (64) is virtually inert in these assays (41, 42). N,N-Di-n-pentyldopamine was reported (43) to be inert in a caudectomized mouse behavioral assay and in a rotatory behavioral assay in nigra-lesioned rats. It seems likely that the enhanced dopaminergic agonist effects conferred by N-ethyl and N-n-propyl groups on aporphine and β-phenethylamine-derived molecules are not related merely to enhanced lipophilic character or to partitioning phenomena, but rather to the likelihood that the two- and three-carbon chains have a positive affinity for subsites on certain dopamine receptors. It may be speculated that these receptor subsites do not accommodate longer alkyl chains (e.g., n-butyl or n-pentyl). However, different assays for dopaminergic stimulant effects and different animal species were used in refs. 41, 42, and 43, and care must be exercised in drawing firm structure-activity relationship conclusions based on these data.

The alkyl linker between the two heterocyclic ring systems in structure (65) was modified in studies of the ability of analogs to bind to the cholecystokinin-B receptor (44). When this linking group was -CH2-CH2-, the compound (structure 66) was extremely potent in radioligand displacement assays on mouse brain membranes. Introduction of carbon-carbon unsaturation (E-olefin) into the linker (structure 67) resulted in a 16-fold decrease in binding ability; this suggests that conformational restriction and limitation of molecular flexibility have deleterious effects on biological activity. However, no data were reported on the Z-isomer of this olefinic molecule, so that caution should be exercised in drawing conclusions. Introduction of a bromine substituent (65, Y = Br) into (66) produced a threefold increase in potency, whereas the same structural modification of the olefin (67) resulted in a threefold decrease in potency. Branching the linker chain with a methyl group adjacent to the quinazolinone ring (68) resulted in a 350-fold decrease in affinity. However, chain branching with a methyl group in the alternate position on the ethylene chain produced compound (69), whose receptor affinity was of the same order of magnitude as the extremely potent lead compound (66). The exponential difference in receptor-binding ability exhibited by the two isomeric branched-chain linker compounds (68) and (69) was ascribed to unfavorable steric interactions between the receptor and the linker methyl group of (68) (44). This conclusion may be compromised by the fact that both (68) and (69) were evaluated as their racemates.

[Structures (65)-(69): the linker between the two heterocyclic ring systems is varied; Y = H in (66)-(69).]

A study (45) of 2-(phosphonomethoxy)ethylguanidines (70-73) as antiviral (herpes and HIV) agents revealed that branching of the ethylene chain by introduction of a methyl group at the 1'-position (as in racemic 72) diminished antiviral activity 25-fold and diminished toxicity 16-fold compared to that of the nonmethylated system (70).

[Structures (70)-(73): (70) R = R' = H; (71) R = H, R' = CH3; (72) R = CH3, R' = H; (73) gem-dimethyl.]

In contrast, (R)-(71), the 2'-methyl congener, exhibited only a fivefold decrease in antiviral potency compared to that of compound (70), but it also exhibited a 30-fold lessening of toxicity, to produce a substantial increase in therapeutic index over that of (70). The (S)-(71) enantiomer was somewhat less potent than its (R)-enantiomer. The gem-dimethyl congener (73) was also somewhat less potent than the (R)-2'-monomethyl compound (71) and it was markedly more toxic. The (S)-2'-methyl stereoisomer of (71) exhibited a decidedly lower therapeutic index than that of its (R)-enantiomer.

Closely related to alteration of chain length and/or chain branching is alteration of ring size. Compound (74) showed nanomolar-level activity as an inhibitor of 5-lipoxygenase (46). The size of the oxygen-containing ring as well as the position of the oxygen member with respect to the methoxy and aryl substituents was varied. The (seven-membered) oxepane ring derivative (79) and the (six-membered) tetrahydropyran ring derivative (78) showed two- to 10-fold enhanced potency over that of the tetrahydrofuran lead compound (74). The other analogs shown demonstrated much weaker enzyme inhibitory activity.

In a series of spiro-tetraoxacycloalkanes (80), with varying heterocyclic ring sizes, it was found that the compound where n = 1 demonstrated marked antimalarial activity against P. berghei and P. falciparum, and showed low toxicity (47). The analog in which
n = 4 showed strong activity against P. falciparum but it was unimpressive in the P. berghei assay.

In a series of arylsulfonamidophenethanolamines (81) (48), derivatives bearing the sulfonamido group meta to the ethanolamine side chain displayed properties of a β-adrenoceptor partial agonist, whereas 19 compounds bearing the sulfonamido group in the para position were β-antagonists.

Changing the positions of attachment of the two benzene rings linking the quinolinium moieties of the calcium-activated potassium channel blocker (82) reduced activity 10- to 60-fold in a rat superior cervical ganglion assay (49). Other structural variations studied included benzene ring A meta-substituted and benzene ring L meta-substituted; benzene ring A meta-substituted and benzene ring L para-substituted; and benzene ring A para-substituted and benzene ring L para-substituted. All of these variations were much less potent than (82).

The phenolic group of serotonin (83) was incorporated into a pyran ring (84) (50), thus also introducing an alkyl substituent at position 4 of the indole ring system.

This tricyclic analog (84) lacked serotonin-like affinity for 5-HT1 receptors, but it demonstrated high and selective affinity for 5-HT2 receptors. Like serotonin, it stimulated phosphatidyl inositol turnover in rat brain slices. The low affinity for 5-HT1 receptors was rationalized, in part, on the basis of steric interference between the dihydropyran ring and the aminoethyl side chain, which inhibits the tryptamine system from assuming the folded ergotlike conformation, as illustrated in (85), which probably approximates the conformation of serotonin at 5-HT1 receptors. The methyl ether of serotonin exhibits approximately the same affinity for 5-HT,, sites as does serotonin (51). The methyl ether also has marked affinity for 5-HT,, and 5-HT,, receptors, but it has diminished affinity (compared with serotonin) at 5-HT,, receptors. It was
suggested (50) that the high affinity for the 5-HT2 receptor exhibited by such compounds as (84) demonstrates that the C-5 hydroxyl group of serotonin can function as a hydrogen-bond acceptor at the receptor.

Replacement of the benzene ring of the potent indirect acting central noradrenergic stimulant methamphetamine (86) by a cyclohexane ring (compound 87) results in some loss of pressor effect, but the drug, like amphetamine, has been used as a nasal decongestant, and it has CNS-mediated anorexigenic effect (52, 53). It is said to have somewhat less central stimulant action than the corresponding aromatic ring derivatives (54a-d).

The benzene (88) and cyclohexane (89) congeners have almost identical effects in blocking bronchoconstriction produced by histamine, serotonin, or acetylcholine in the guinea pig in vivo (55). They also showed identical LD50 values in mice. The stereochemistry of these compounds was not addressed.

In a study of anticonvulsant agents, the (S)-benzene ring analog (90) was somewhat more potent in a mouse assay than was the (S)-cyclohexane analog (91) (56). There was only a slight difference in potency between (R)- and (S)-(90). The (R)-enantiomer of (91) was not reported.

5 ALTERATION OF STEREOCHEMISTRY AND DESIGN OF STEREOISOMERS AND GEOMETRIC ISOMERS

The earlier, almost universally accepted belief that if one enantiomer of a chiral molecule demonstrates pharmacological activity, the other enantiomer will be pharmacologically inert, is not valid. It must be anticipated that all stereoisomers of an organic molecule will exhibit pharmacological effects, frequently widely different and unpredictable. Many examples of qualitative and quantitative differences in metabolism of enantiomers are documented (57).

(±)-3-(3-Hydroxyphenyl)-N-n-propylpiperidine (3-PPP, 92) was originally described (58) as having highly selective activity at dopaminergic autoreceptors.

At high doses (R)-(92) selectively stimulated presynaptic dopaminergic receptor sites, whereas at lower doses it selectively stimulated postsynaptic receptor sites (59). In contrast, the (S)-enantiomer stimulated presynaptic dopamine receptors and at the same dose level, it blocked postsynaptic dopamine receptors. Thus, this enantiomer exhibits a bifunctional mode of dopaminergic attenuation: that of presynaptic agonism and postsynaptic antagonism. The observed pharmacological effects of the racemic modification are the sum total of the complex activities of the two enantiomers, and the pharmacology of racemic 3-PPP is not an accurate reflection of the pharmacological properties of the individual enantiomers. The contemporary literature strongly reflects the philosophy that pharmacological testing only of a racemic mixture is inadequate and may be misleading.

(R)-(-)-11-Hydroxy-10-methylaporphine (93) is a highly selective serotonergic 5-HT1A agonist (60).

Remarkably, the (S)-enantiomer (94) is a potent antagonist at this same subpopulation of serotonin receptors (guinea pig ileum preparation) (61). Both enantiomers bind strongly to 5-HT1A receptors from rat forebrain membrane. The phenomenon of enantiomers that possess opposite effects (agonist-antagonist) at the same receptor, once considered to be extremely rare, has recently been noted more often, probably because of the increasing recognition by medicinal chemists and pharmacologists that each member of an enantiomeric pair may possess its own unique and unpredictable pharmacology.

In addition to stereochemistry about a carbon center, other potentially chiral atoms offer possibilities for pharmacological significance. A gastroprokinetic compound (95) with serotonergic activity bears a chiral sulfoxide moiety (62). The enantiomers are equipotent, but the (S)-enantiomer demonstrates a greater intrinsic activity than that of the (R)-enantiomer.

Casy (63) cited pharmacological differences between stereoisomers of chiral sulfoxide moieties in cholinergic oxathiolane congeners (96-99) of muscarine.

cis- and trans-4-Aminocrotonic acids (100) and (101) were prepared (64) as congeners of γ-aminobutyric acid (GABA) (6).
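
Returning briefly to the enantiomer pairs discussed above that act as agonist and antagonist at the same receptor, the following minimal sketch shows one way such a racemate could misrepresent its components. It assumes simple competitive binding at a single site (the Gaddum occupancy relationship) and uses invented concentrations and dissociation constants; the function name and every number are hypothetical and are not taken from the studies cited.

```python
def agonist_occupancy(agonist_conc, agonist_kd, antagonist_conc=0.0, antagonist_kd=1.0):
    """Fraction of receptors occupied by the agonist when a competitive
    antagonist is also present (Gaddum equation, simple one-site binding)."""
    a = agonist_conc / agonist_kd
    b = antagonist_conc / antagonist_kd
    return a / (1.0 + a + b)


# Hypothetical values in arbitrary concentration units (assumptions, not data).
TOTAL_DOSE = 10.0
KD_AGONIST_ENANTIOMER = 1.0
KD_ANTAGONIST_ENANTIOMER = 1.0

# Pure agonist enantiomer given at the full dose.
pure_enantiomer = agonist_occupancy(TOTAL_DOSE, KD_AGONIST_ENANTIOMER)

# Racemate at the same total dose: half agonist, half antagonist enantiomer.
racemate = agonist_occupancy(
    TOTAL_DOSE / 2, KD_AGONIST_ENANTIOMER,
    TOTAL_DOSE / 2, KD_ANTAGONIST_ENANTIOMER,
)

print(f"agonist occupancy, pure enantiomer: {pure_enantiomer:.3f}")
print(f"agonist occupancy, racemate:        {racemate:.3f}")
```

Under these assumed values the agonist enantiomer occupies roughly 91% of the receptors when given alone but only about 45% when given as the racemate. Occupancy is not the same as response, so this is only a qualitative illustration of why testing a racemate alone may mislead.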
The folded Z-isomer (100) was inactive in assays for GABA agonism, whereas the extended E-isomer (101) was active. These data demonstrate biological differences of geometric isomers, which in turn involve a parameter discussed previously: imposition of a degree of structural rigidity on the molecule. A strategy analogous to this E/Z olefinic GABA congener design addressed cis- and trans-1,2-disubstituted cyclopropane derivatives (102) and (103), whose relative effects at GABA receptors paralleled those of the olefinic derivatives (65).

The E-isomer of the diethylstilbestrol structure (104) has 10 times the estrogenic potency of the Z-isomer; this effect has been rationalized from the conclusion that the E-geometric isomer is an open-chain analog of the natural estrogen estradiol (105) (66). In dienestrol (106), the geometric isomerism possible with olefinic moieties has been further
exploited to achieve a similar kind of open-chain analogy to the steroid ring system as in diethylstilbestrol, and a high level of estrogenic activity results.

Hexestrol (107), the saturated congener of diethylstilbestrol (104), is the meso-form of the molecule. It has the greatest estrogenic potency of the three possible stereoisomers; however, it is less potent than diethylstilbestrol (67).

A partial restriction of side-chain flexibility in retinoic acid (108) was achieved by incorporating portions of the side chain into a benzene ring and a cyclopropane ring (109) (68).

Introduction of the cyclopropane ring changes the corresponding trans-olefinic moiety of (108) to a cisoid disposition in (109), thus changing the overall steric disposition of the side chain. Moreover, the cyclopropane ring introduces chirality into the molecule. The (S,S)-enantiomer shown is a potent retinoid X-receptor ligand and it is inactive at the retinoic acid receptor, whereas the (R,R)-enantiomer is an extremely weak agonist at the retinoid X-receptor, although it has some effect at the retinoic acid receptor. Thus, the molecular modifications shown in (109) result in selectivity of action at these two receptors.

6 FRAGMENTS OF THE LEAD MOLECULE

Design of fragments of a lead molecule is based on the premise that some lead molecules, especially polycyclic natural products, may be much more structurally complex than is necessary for optimal pharmacologic effect. It is hypothesized that a pharmacophoric moiety may be buried within the complex structure of the lead compound and, if this pharmacophore can be clearly defined, it may be possible to "dissect" it out chemically. The result may be biologically active, simpler molecules that may themselves be used as leads in further analog design. A bond disconnection strategy may be employed in which bonds in the polycyclic structure are broken or removed to destroy one or more of the rings. The result may be a valuable drug that is more accessible (through chemical synthesis) than the original lead molecule. A possible disadvantage to this strategy of analog design is that the greater flexibility that is introduced into a rigid molecule may compromise or destroy the conformational integrity that may have existed in the pharmacophoric portion, at the expense of activity and/or potency. There may be a similar destruction of chiral centers, which may be undesirable. Morphine (110) typifies a lead molecule for which fragment analog design has been used.
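
The bond disconnection idea described above can be illustrated with a toy graph model. The sketch below treats a ring skeleton as a plain graph and counts independent rings as bonds minus atoms plus connected components (the cyclomatic number). The decalin-like atom numbering is hypothetical and chosen only for illustration; this is not a cheminformatics implementation and does not model any compound in this chapter.

```python
def ring_count(n_atoms, bonds):
    """Number of independent rings in a skeleton graph:
    bonds - atoms + connected components (cyclomatic number)."""
    parent = list(range(n_atoms))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n_atoms
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(bonds) - n_atoms + components


def disconnect(bonds, *broken):
    """Return the bond list with the given bonds removed (order-insensitive)."""
    broken_set = {frozenset(b) for b in broken}
    return [b for b in bonds if frozenset(b) not in broken_set]


# Hypothetical fused bicyclic skeleton: atoms 0-9, rings fused at bond 0-5.
RING_A = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
RING_B = [(5, 6), (6, 7), (7, 8), (8, 9), (9, 0)]
FUSED = RING_A + RING_B

one_cut = disconnect(FUSED, (2, 3))            # open one ring
two_cuts = disconnect(FUSED, (2, 3), (7, 8))   # open both rings

print(ring_count(10, FUSED), ring_count(10, one_cut), ring_count(10, two_cuts))
```

Each disconnection in this toy skeleton removes one ring while leaving all atoms connected, which mirrors the trade-off noted in the text: the simpler skeleton is more accessible but also more flexible.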
The analgesic μ-receptor pharmacophore of morphine has been defined (69) as comprising the basic nitrogen atom, the aromatic ring located three carbon atoms from the nitrogen, and a quaternary carbon adjacent to the aromatic ring, which provides a region of molecular bulk. A bond disconnection strategy involved disruption of the hydrofuran ring to give rise to morphinan derivatives [e.g., levorphanol (111)], whose pharmacologic effects closely parallel those of morphine (70). Further simplification of the morphine ring system led to benzomorphan derivatives, typified by metazocine (112), in which morphinelike analgesic activity is retained. Finally, 4-phenylpiperidine derivatives typified by meperidine (113) and the nonheterocyclic system methadone (114) present the putative analgesic pharmacophore with a seemingly minimal number of extraneous atoms. These simple compounds retain opioid analgesic activity. It must be noted, however, that the discovery of analgesic activity in 4-phenylpiperidine derivatives was not a result of a systematic structure-activity study of the morphine molecule, but was serendipitous (71).

Asperlicin (115), a potent cholecystokinin-A antagonist, was subjected to two different bond disconnection strategies, as indicated (72).

Path A leads to tryptophan derivatives (116), some of which are potent cholecystokinin antagonists (73). Some quinazolinone derivatives (117) of disconnection pathway B showed extremely high potency and excellent selectivity as cholecystokinin-B receptor subtype ligands (44). A combination of X-ray crystallography and computational chemistry was used in the decision-making process in the bond disconnection (44) and in the design of the specific target molecules.

The myoneural-blocking pharmacophore in d-tubocurarine (118) was speculated to include the two cationic heads (the quaternary ammonium group and the protonated tertiary amine); the cationic heads are separated by 10 atoms (nine carbons and one oxygen).

Based on these parameters, a simple molecule, decamethonium (119), in which two trimethylammonium heads are separated by 10 methylene groups to approximate the internitrogen distance in d-tubocurarine, was designed independently by two groups of investigators (74, 75). This synthetic fragment/analog of d-tubocurarine exhibits a high degree of potency and activity in production of flaccid paralysis of skeletal muscles, superficially like that of the lead compound. However, the myoneural blockade from d-tubocurarine is of the nondepolarizing type, whereas decamethonium produces a depolarizing skeletal muscle blockade. This fundamental mechanistic difference is probably attributable, at least in part, to the flexibility of the decamethonium molecule compared with that of d-tubocurarine. There is a considerable difference in the spectrum and severity of side effects and in the technique of employment of these two drugs in clinical practice. In all types of analog design, changes in chemical structure may result in unanticipated changes in
mechanism of action, even though the chemical nature of the pharmacophore may not be altered.

7 VARIATION IN INTERATOMIC DISTANCES

Alteration of distances between portions of the pharmacophore of a molecule (or even between other portions of the molecule) may produce profound qualitative and/or quantitative changes in pharmacological actions. In α,ω-bis-trimethylammonium polymethylene compounds (120-123), maximal activity for blockade of autonomic ganglia (nicotinic N, receptors) resides in those derivatives where n = 5 or 6 (compounds 120 and 121) (76, 77).

Ganglionic effects drop drastically when n = 4 or 7. These observations have been rationalized as being a reflection of attainment of optimal interquaternary distance in the penta- and hexamethylene congeners, for optimal interaction with ganglionic receptor subsites. Remarkably, as the number of methylene groups in (120) is greatly increased, a high level of ganglionic-blocking potency returns. The hexadecyl and octadecyl congeners (122) and (123) are approximately four times as potent at autonomic ganglia as the penta- and hexamethylene compounds. As was mentioned previously, polymethylene bis-quaternary systems, in which the cationic heads are separated by 10 methylene groups, have potent effects at myoneural junctions (nicotinic N, receptors) and have little ability to affect nerve activity at autonomic ganglia. Thus, extension of a bis-quaternary polyalkylene molecule from five or six methylenes to 10 produces a pharmacological change from ganglionic blockade to myoneural blockade, and further extension to 16-18 methylenes results in loss of myoneural effects and a return of ganglionic blocking action.

Hemicholinium (124) competitively inhibits the high affinity, sodium-dependent uptake of choline into the nerve terminal (the rate-determining step in acetylcholine synthesis in the nerve terminal), thus depleting stores of acetylcholine and producing slow onset, long-duration myoneural blockade (78, 79). In a series of congeners of hemicholinium, the central biphenyl portion of the molecule was changed to terphenyl (125) and to p-phenylene (126). Both changes resulted in profound loss of the myoneural blockade characteristic of hemicholinium (68). This result was ascribed to alteration of the proposed optimum interquaternary nitrogen distance of 14.4 Å in hemicholinium (124), to 18.4 Å in the terphenyl analog, and to 10.2 Å in the p-phenylene analog.

The central biphenyl spacer in hemicholinium was changed to a 2,7-disubstituted phenanthrene (127), trans,trans-4,4'-bicyclohexyl (128), and 2,2'-dimethylbiphenyl (129). In all three of these systems the 14.4-Å interquaternary distance found in hemicholinium was maintained; all of these congeners were qualitatively and quantitatively similar to hemicholinium in inhibition of neuromuscular transmission. Conformational analysis of the polyalkylene congeners (130) and (131) demonstrated that, when the flexible polyalkylene chain is maximally extended and is in a staggered conformation, the interquaternary distance in the hexamethylene congener (130) is approximately 14 Å, and in the heptamethylene congener (131) it is approximately 15 Å. Both compounds exhibited hemicholinium-like inhibition of neuromuscular transmission, although they were less potent than hemicholinium (80). This diminution of potency might be ascribed to the compromising of another structural parameter in the hemicholinium molecule: the rigidity of the central biphenyl spacer unit that maintains the internitrogen distance.

Replacement of the benzene ring linkers of (82) (see above) by alkyl linkers (structure 132) permitted retention of blocking activity on calcium-activated potassium channels (81). The most potent member of the series studied was that in which m = n = 3. In this compound the two respective internitrogen distances closely approximate those in the benzene ring-linked compound (82).

In a series of phenylalkylenetrimethylammonium derivatives (133-136), nicotinic agonism is maximal when n = 3 (compound 136).

It was concluded (82) that a moiety (here, a benzene ring) with high electron density three or four single bond lengths (~6 Å) from the cationic center is a requirement for nicotinic agonism in the series. These conclusions may be compromised by the fact that the alkylene series was not extended beyond the three-carbon spacer chain. Therefore, it is not known whether the four-carbon homolog would display greater or lesser potency than that of the three-carbon molecule. Peculiarly, the first two members of the series have only very weak nicotine-like activity in the presence of atropine.

A series of compounds, illustrated by (137), was evaluated for in vitro affinity for α1- and α2-adrenoceptors by radioligand-binding assays (83). All compounds showed good affinities for the α1 adrenoceptor, with Ki values in the low nanomolar range. The polymethylene chain spacer between furoylpiperazinylpyridazinone and arylpiperazine moieties was shown to influence the affinity and selectivity of these compounds. A gradual increase in affinity for the α1 adrenoceptor was observed, by lengthening the polymethylene chain, up to a maximum of seven carbon atoms.

The α,/α, ratio of adrenoceptor-binding affinities for the series of compounds did not parallel the α, adrenoceptor-binding affinities for the series, although all of the seven (C,-C,) congeners of (137) had somewhat more affinity for the α, receptor.

REFERENCES

1. I. Langmuir, J. Am. Chem. Soc., 41, 868 (1919).
2. C. W. Thornber, Chem. Soc. Rev., 8, 563 (1979).
3. G. A. Patani and E. G. LaVoie, Chem. Rev., 96, 3147 (1996).
4. A. Burger, Med. Chem. Res., 4, 89 (1994).
5. P. Floersheim, E. Pombo-Villar, and G. Shapiro, Chimia, 46, 323 (1992).
6. P. Krogsgaard-Larsen, H. Hjeds, E. Falk, F. S. Jørgensen, and L. Nielsen, Adv. Drug Res., 17, 38 (1988).
7. H. Hashiguchi and H. Takahashi, Mol. Pharmacol., 13, 362 (1977).
8. K. Anderson, A. Kuruvilla, N. Uretsky, and D. Miller, J. Med. Chem., 24, 683 (1981).
9. R. J. Baldessarini in A. G. Gilman, L. S. Goodman, T. W. Rall, and F. Murad, Eds., Goodman and Gilman's The Pharmacological Basis of Therapeutics, 7th ed., Macmillan, New York, 1985, pp. 393-397, 414.
10. S. I. Ankier in G. P. Ellis and G. B. West, Eds., Progress in Medicinal Chemistry, Vol. 23, Elsevier, Amsterdam, 1986, p. 121.
11. E. I. Isaacson in J. N. Delgado and W. A. Remers, Eds., Wilson and Gisvold's Textbook of Organic Medicinal and Pharmaceutical Chemistry, 10th ed., Lippincott-Raven, Philadelphia, 1998, p. 473.
12. L. Otsuki, J. Ishiko, M. Sakai, K. Shiniahara, and T. Momiyama, Pharmacometrics (Tokyo), 6, 973 (1972).
13. M. Protiva, M. Rajsner, V. Seidlova, E. Adlerova, and Z. Vejdelek, Experientia, 18, 326 (1962).
14. S. Yous, J. Andrieux, H. E. Howell, P. J. Morgan, P. Reynard, B. Pfeiffer, D. Lessieur, and B. Guardiola-Lemaitre, J. Med. Chem., 35, 1484 (1992).
15. S. M. Bromidge, F. Brown, F. Cassidy, M. S. G. Clark, S. Dabbs, M. S. Hadley, J. Hawkins, J. M. Loudon, C. B. Naylor, B. S. Orlek, and G. J. Riley, J. Med. Chem., 40, 4265 (1997).
16. P. Sauerberg, J. W. Kindtler, L. Nielsen, M. J. Sheardown, and T. Honoré, J. Med. Chem., 34, 687 (1991).
17. M. Hollmann, A. O'Shea-Greenfield, S. W. Rogers, and S. Heinemann, Nature, 342, 643 (1989).
18. F. H. Bellevue, M. L. Boahbedason, R. H. Wu, R. A. Casero Jr., D. Rattendi, C. J. Bacchi, and P. M. Woster, Bioorg. Med. Chem. Lett., 6, 2765 (1996).
19. P. M. Woster, Annu. Rep. Med. Chem., 36, 99 (2001).
20. E. Mutschler and G. Lambrecht in E. Ariens, W. Soudjin, and P. B. M. W. M. Timmermans, Eds., Stereochemistry and Biological Activity of Drugs, Blackwell, Oxford, UK, 1983, p. 65.
21. J. G. Cannon in E. Jucker, Ed., Progress in Drug Research, Vol. 29, Birkhauser Verlag, Basel, Switzerland, 1985, pp. 324-334.
22. C. Melchiorre, A. Chiarini, M. Gianella, D. Giardina, W. Quaglia, and V. Tumiatti in V. Claasen, Ed., Trends in Drug Research, Vol. 13, Elsevier, Amsterdam, 1990, pp. 37-48.
23. F. D. King, A. M. Brown, L. M. Gaster, A. J. Kaumann, A. D. Medhurst, S. G. Parker, A. A. Parsons, T. L. Patch, and P. Raval, J. Med. Chem., 36, 1918 (1993).
24. P. S. Portoghese, A. A. Mikhail, and H. J. Kupferberg, J. Med. Chem., 11, 219 (1968).
25. R. M. Moriarty, L. A. Enache, L. Zhao, R. Gilardi, M. V. Mattson, and O. Prakash, J. Med. Chem., 41, 468 (1998).
26. C. M. Bertha, B. J. Vilner, M. V. Mattson, W. D. Bowen, K. Becketts, H. Xu, R. B. Rothman, J. L. Flippen-Anderson, and K. C. Rice, J. Med. Chem., 38, 4776 (1995).
27. C. Y. Chiou, J. P. Long, J. G. Cannon, and P. D. Armstrong, J. Pharmacol. Exp. Ther., 166, 243 (1969).
28. J. G. Cannon and P. D. Armstrong, J. Med. Chem., 13, 1037 (1970).
29. J. G. Cannon, T. Lee, V. Sankaran, and J. P. Long, J. Med. Chem., 18, 1027 (1975).
30. P. W. Ehrhardt, R. J. Gorczynski, and W. G. Anderson, J. Med. Chem., 22, 907 (1975).
31. R. R. Ruffolo in G. Kunos, Ed., Adrenoceptors and Catecholamine Action, Part B, Wiley-Interscience, New York, 1983, pp. 10-11.
32. H. J. Komiskey, J. F. Bossart, D. D. Miller, and P. N. Patel, Proc. Natl. Acad. Sci. USA, 75, 2641 (1978).
33. E. E. Smissman and W. H. Gastrock, J. Med. Chem., 11, 860 (1968).
34. J. A. Monn, M. J. Valli, S. M. Massy, M. M. Hansen, T. J. Kress, J. P. Wepsiec, A. R. Harkness, J. L. Grutsch Jr., R. A. Wright, B. G. Johnson, S. L. Andis, A. Kingston, R. Tomlinson, R. Lewis, K. R. Griffey, J. P. Tizzano, and D. D. Schoepp, J. Med. Chem., 42, 1027 (1999).
35. J. F. McCarthy, J. G. Cannon, and J. P. Buckley, J. Pharm. Sci., 52, 1168 (1963).
36. J. F. McCarthy, J. G. Cannon, J. P. Buckley, and W. J. Kinnard, J. Med. Chem., 7, 72 (1964).
37. A. Burger and G. R. Bedford, J. Med. Chem., 6, 402 (1963).
38. M. V. Koch, J. G. Cannon, and A. M. Burkman, J. Med. Chem., 11, 977 (1968).
39. E. R. Atkinson, F. J. Bullock, F. E. Granchelli, S. Archer, F. J. Rosenberg, D. G. Teiger, and F. C. Nachod, J. Med. Chem., 18, 1000 (1975).
40. J. G. Cannon in ref. 21, pp. 309-310.
41. J. G. Cannon, F.-L. Hsu, J. P. Long, J. R. Flynn, B. Costall, and R. J. Naylor, J. Med. Chem., 21, 248 (1978).
42. J. Z. Ginos and F. C. Brown, J. Med. Chem., 21, 155 (1978).
43. J. Z. Ginos, G. C. Cotzias, and D. Doroski, J. Med. Chem., 21, 160 (1978).
44. M. J. Yu, J. R. McCowan, N. R. Mason, J. B. Deeter, and L. G. Mendelsohn, J. Med. Chem., 35, 2534 (1992).
45. K.-L. Yu, J. J. Bronson, H. Yang, A. Patick, M. Alam, V. Brankovan, R. Datema, M. J. M. Hitchcock, and J. C. Martin, J. Med. Chem., 35, 2958 (1992).
46. G. C. Crawley, R. I. Dowell, P. N. Edwards, S. J. Foster, R. M. McMillan, E. R. H. Walker, and D. Waterson, J. Med. Chem., 35, 2600 (1992).
47. H.-S. Kim, Y. Nagai, K. Ono, K. Begum, Y. Wataya, Y. Hamada, K. Tsuchiya, A. Masuyama, M. Nojima, and K. J. McCullough, J. Med. Chem., 44, 2357 (2001).
48. R. H. Uloth, J. R. Kirk, W. A. Gould, and A. A. Larsen, J. Med. Chem., 9, 88 (1966).
49. J. Campos-Rosa, D. Galanakis, A. Piergentili, K. Bhandari, C. R. Ganellin, P. M. Dunn, and D. H. Jenkinson, J. Med. Chem., 43, 420 (2000).
50. J. E. Macor, C. B. Fox, C. Johnson, B. K. Koe, L. A. Label, and S. H. Zorn, J. Med. Chem., 35, 3625 (1992).
51. R. A. Glennon in J. G. Cannon, Ed., Advances in CNS Drug-Receptor Interactions, JAI Press, Greenwich, CT, 1991, pp. 147-148.
52. B. B. Hoffman in J. G. Hardman and L. E. Limbird, Eds., Goodman and Gilman's The Pharmacological Basis of Therapeutics, 10th ed., McGraw-Hill, New York, 2001, p. 218.
53. R. L. Johnson in J. N. Delgado and W. A. Remers, Eds., Wilson and Gisvold's Textbook of Organic Medicinal and Pharmaceutical Chemistry, 10th ed., Lippincott-Raven, Philadelphia, 1998, p. 493.
54. (a) A. M. Lands, J. R. Lewis, and V. L. Nash, J. Pharmacol. Exp. Ther., 83, 253 (1945); (b) A. M. Lands, V. L. Nash, and B. L. Dertlinger, J. Pharmacol. Exp. Ther., 89, 382 (1947); (c) D. F. Marsh and D. A. Herring, J. Pharmacol. Exp. Ther., 97, 68 (1949); (d) E. J. Fellows, E. Macko, and R. A. McLean, J. Pharmacol. Exp. Ther., 100, 267 (1950).
55. M. D. Mashkovsky, L. N. Yakhoutov, M. E. Kaminka, and E. E. Mikhlina in E. Jucker, Ed., Progress in Drug Research, Vol. 27, Birkhauser Verlag, Basel, Switzerland, 1983, pp. 35-38.
56. P. Pevarello, A. Bonsignori, P. Dostert, F. Heidemperger, V. Pinciroli, M. Colombo, R. A. McArthur, P. Salvati, C. Post, R. G. Fariello, and M. Varasi, J. Med. Chem., 41, 579 (1998).
57. A. F. Casy, The Steric Factor in Medicinal Chemistry. Dissymmetric Probes of Pharmacological Receptors, Plenum, New York/London, 1993, pp. 52-61.
58. S. Hjorth, A. Carlsson, H. Wikstrom, P. Lindberg, D. Sanchez, U. Hacksell, L.-E. Arvidsson, O. Svensson, and J. L. G. Nilsson, Life Sci., 28, 1225 (1981).
59. H. Wikstrom, D. Sanchez, P. Lindberg, U. Hacksell, L.-E. Arvidsson, A. M. Johansson, S.-O. Thorberg, J. L. G. Nilsson, K. Svensson, S. Hjorth, D. Clark, and A. Carlsson, J. Med. Chem., 27, 1030 (1984).
60. J. G. Cannon, P. Mohan, J. Bojarski, J. P. Long, R. K. Bhatnagar, P. A. Leonard, J. R. Flynn, and T. K. Chatterjee, J. Med. Chem., 31, 313 (1988).
61. J. G. Cannon, S. T. Moe, and J. P. Long, Chirality, 3, 19 (1991).
62. B. T. Butler, G. Silvey, D. M. Houston, D. R. Borcherding, V. L. Vaughn, A. T. McPhail, D. M. Radzik, H. Wynberg, W. Ten Hoeve, E. Van Echten, N. K. Ahmed, and M. D. Linnik, Chirality, 4, 155 (1992).
63. See ref. 57, pp. 244-247; see also M. Pigini, L. Brasili, M. Gianella, and F. Gualtieri, Eur. J. Med. Chem., 16, 415 (1981).
64. G. A. Johnston, D. R. Curtis, P. M. Beart, C. J. A. Game, R. M. McColloch, and B. Twitchin, J. Neurochem., 24, 157 (1975).
65. R. D. Allan, D. R. Curtis, P. M. Headley, G. A. Johnson, D. Lodge, and B. Twitchen, J. Neurochem., 34, 652 (1980).
66. D. S. Fullerton in R. F. Doerge, Ed., Wilson and Gisvold's Textbook of Organic Medicinal and Pharmaceutical Chemistry, 8th ed., Lippincott, Philadelphia, 1982, p. 670.
67. R. W. Brueggemeier, D. D. Miller, and D. T. Witiak in W. O. Foye, T. L. Lemke, and D. A. Williams, Eds., Principles of Medicinal Chemistry, 4th ed., Williams & Wilkins, Media, PA, 1995, p. 474.
68. V. Vuligonda, S. M. Thacher, and R. A. S. Chandraratna, J. Med. Chem., 44, 2298 (2001).
69. T. Nogrady, Medicinal Chemistry, 2nd ed., Oxford University Press, New York, 1988, p. 457.
70. J. H. Jaffe and W. R. Martin in ref. 9, p. 513.
71. J. V. Aldrich in M. E. Wolff, Ed., Burger's Medicinal Chemistry and Drug Discovery, 5th ed., Vol. 3, Wiley-Interscience, New York, 1996, p. 357.
72. M. J. Yu, K. J. Thrasher, J. R. McCowan, N. R. Mason, and L. G. Mendelsohn, J. Med. Chem., 34, 1505 (1991).
73. F. W. Hahne, R. T. Jensen, G. F. Lemp, and J. D. Gardner, Proc. Natl. Acad. Sci. USA, 78, 6304 (1981).
74. R. B. Barlow and H. R. Ing, Nature, 161, 718 (1948).
75. W. D. M. Paton and E. J. Zaimis, Nature, 161, 718 (1948).
76. D. J. Triggle, Neurotransmitter-Receptor Interactions, Academic Press, New York, 1971, p. 360.
77. V. Trcka in D. A. Kharkevich, Ed., Handbook of Experimental Pharmacology, Vol. 53, Springer-Verlag, New York, 1980, p. 138.
78. J. G. Cannon, T. M.-L. Lee, A. M. Nyanda, B. Bhattacharyya, and J. P. Long, Drug Des. Deliv., 1, 209 (1987).
79. J. G. Cannon, Med. Res. Rev., 14, 505 (1994).
80. J. G. Cannon, T. M.-L. Lee, Y.-a. Chang, A. M. Nyanda, B. Bhattacharyya, J. R. Flynn, T. Chatterjee, R. K. Bhatnagar, and J. P. Long, Pharm. Res., 5, 359 (1988).
81. J.-Q. Chen, D. Galanakis, C. R. Ganellin, P. M. Dunn, and D. H. Jenkinson, J. Med. Chem., 43, 3478 (2000).
82. W. C. Holland in E. J. Ariens, Ed., Proceedings of the Third International Pharmacological Meeting, Vol. 7, Pergamon, Oxford, UK, 1966, pp. 295-303; K. C. Wong and J. P. Long, J. Pharmacol. Exp. Ther., 137, 70 (1962).
83. R. Barbaro, L. Betti, F. Corelli, G. Giannacinni, L. Maccari, F. Manetti, G. Strappaghetti, and S. Corsano, J. Med. Chem., 44, 2118 (2001).
CHAPTER SEVENTEEN
Approaches to the Rational Design of Enzyme Inhibitors
MICHAEL J. MCLEISH
GEORGE L. KENYON
University of Michigan
Ann Arbor, Michigan
Contents
1 Introduction
1.1 Enzyme Inhibitors in Medicine
1.2 Enzyme Inhibitors in Basic Research
2 Rational Design of Noncovalently Binding Enzyme Inhibitors
2.1 Forces Involved in Forming the Enzyme-Inhibitor Complex
2.1.1 Electrostatic Forces
2.1.2 van der Waals Forces
2.1.3 Hydrophobic Interactions
2.1.4 Hydrogen Bonds
2.1.5 Cation-π Bonding
2.2 Steady-State Enzyme Kinetics
2.2.1 The Michaelis-Menten Equation
2.2.2 Treatment of Kinetic Data
2.3 Rapid, Reversible Inhibitors
2.3.1 Types of Rapid, Reversible Inhibitors
2.3.1.1 Competitive Inhibitors
2.3.1.2 Uncompetitive Inhibitors
2.3.1.3 Noncompetitive Inhibitors
2.3.2 Dixon Plots
2.3.3 IC50 Values
2.3.4 Examples of Rapid Reversible Inhibitors
2.4 Slow-, Tight-, and Slow-Tight-Binding Inhibitors
2.4.1 Slow-Binding Inhibitors
2.5 Inhibitors Classified on the Basis of Structure/Mechanism
2.5.1 Ground-State Analogs
2.5.2 Multisubstrate Analogs
2.5.3 Transition-State Analogs
3 Rational Design of Covalently Binding Enzyme Inhibitors
3.1 Evaluation of the Mechanism of Inactivation of Covalently Binding Enzyme Inhibitors
3.1.1 Criteria for the Study of Affinity Labels
3.1.2 Criteria for the Study of Mechanism-Based Inactivators
3.2 Affinity Labels
3.3 Mechanism-Based Inhibitors
3.4 Pseudoirreversible Inhibitors
4 Conclusions

Burger's Medicinal Chemistry and Drug Discovery
Sixth Edition, Volume 1: Drug Discovery
Edited by Donald J. Abraham
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
1 INTRODUCTION

Many of the top 100 drugs sold worldwide are enzyme inhibitors. In recent years, enzyme inhibitors not only have provided an increasing number of potent therapeutic agents for the treatment of diseases, but also have significantly advanced the understanding of enzymatic transformations. The aim of this chapter is to present current approaches to so-called rational inhibitor design, which uses knowledge of enzymic mechanisms and structures in the design process. Rational inhibitor design is intended to complement laborious and resource-consuming screening processes, which consist of testing large numbers of synthetic chemicals or natural products for inhibitory activity against a chosen target enzyme.

1.1 Enzyme Inhibitors in Medicine

A human cell contains thousands of enzymes, each of which can, theoretically, be selectively inhibited. These enzymes constitute the various metabolic pathways that, in concert, provide the requirements for the viability of the cell. A selective inhibitor may block either a single enzyme or a group of enzymes, leading to the disruption of one or more metabolic pathways. This will result in either a decrease in the concentration of enzymatic products or an increase in the concentration of enzymatic substrates. The effectiveness of an enzyme inhibitor as a therapeutic agent will depend on (1) the potency of the inhibitor, (2) its specificity toward its target enzyme, (3) the choice of metabolic pathway targeted for disruption, and (4) the inhibitor or a derivative possessing appropriate pharmacokinetic characteristics. Higher potency will mean less drug is required to obtain a physiological response, whereas high specificity means that the inhibitor will react only with its target enzyme and not with other sites in the body. Taken together, low dosage and high specificity combine to reduce both the toxicity caused by inhibition of other vital enzymes and the problems arising from the formation of toxic decomposition products. Further, high specificity will generally avoid depletion of the inhibitor concentrations in the host by nonspecific pathways. The areas of potency and specificity will both be addressed in this chapter. Clearly, the choice of target enzyme is also of prime importance for chemotherapy, although that is beyond the scope of this review. However, there are a number of texts available that provide a good introduction to this subject (1-4). Good bioavailability of the drug is also crucial for the drug to reach its site of action in the body in effective therapeutic concentrations. For example, highly polar or charged compounds, such as phosphorylated compounds, frequently cannot readily cross cell membranes and are therefore generally less useful as drugs. Physical approaches to facilitate the transport of this class of compounds into the cell include the use of liposomes or nanoparticles (5-7). Chemical approaches may also be employed. These include the use of prodrugs, in which functional groups on the inhibitor are modified in such a manner that they are able to be taken up by the cell and, later, metabolically converted to the active drug. Prodrugs are discussed in more detail in Volume 2, Chapter 14.

As indicated earlier, a wide variety of enzyme inhibitors have found use in the clinic. Tables 17.1-17.3 show a number of these compounds and, although they provide by no means an exhaustive list, they do give an indication of the range of human disease states that can be ameliorated with the use of enzyme inhibitors.

The human body, even though its defenses are constantly on guard, is still susceptible to invasion by foreign pathogens. Since the development of the sulfa drugs (sulfonamides), enzyme inhibitors have played a vital role in controlling these infectious agents. Table 17.1 provides a list of enzyme inhibitors that have been used in the treatment of the various diseases caused by these agents. All these compounds needed to satisfy the usual requirements for specificity and low toxicity.

Table 17.1 Examples of Enzyme Inhibitors Used in the Treatment of Bacterial, Fungal, Viral, and Parasitic Diseases

Clinical Use   Enzyme Inhibited                          Inhibitor
Antibacterial  Dihydropteroate synthetase                Sulphonamides
Antibacterial  Dihydrofolate reductase                   Trimethoprim, methotrexate
Antibacterial  Alanine racemase                          D-Cycloserine
Antibacterial  Transpeptidase                            Penicillins, cephalosporins
Antifungal     Fungal sterol 14α-demethylase             Clotrimazole, ketoconazole
Antifungal     Fungal squalene epoxidase                 Terbinafine, naftifine
Antiviral      Thymidine kinase and thymidylate kinase   Idoxuridine
Antiviral      DNA, RNA polymerases                      Cytosine arabinoside (Ara-C)
Antiviral      Viral DNA polymerase                      Acyclovir, vidarabine
Antiviral      HIV reverse transcriptase                 Dideoxyinosine, zidovudine
Antiviral      HIV protease                              Saquinavir
Antiviral      Influenza virus neuraminidase             Zanamivir, oseltamivir
Antiprotozoal  Pyruvate dehydrogenase                    Organoarsenical agents
Antiprotozoal  Ornithine decarboxylase                   α-Difluoromethylornithine

This can be achieved in a variety of ways. For instance, it is possible to inhibit an essential pathway in the pathogen that does not exist in the host. D-Cycloserine (1) (Fig. 17.1), for example, inhibits alanine racemase, an enzyme involved in bacterial cell wall biosynthesis and not found in humans (8, 9). D-Cycloserine is active against a broad spectrum of both gram-positive and gram-negative bacteria (10), but plays its major role in the treatment of tuberculosis (11). Conversely, even if both host and pathogen contain the same enzymes, it may be possible to exploit subtle structural differences between the isozymes to obtain a highly specific inhibitor that preferentially binds to the invader's version. Trimethoprim (2) shows this selective toxicity. An inhibitor of dihydrofolate reductase, trimethoprim is a potent antibacterial agent because the bacterial enzyme is inhibited at a concentration several thousand times lower than that required for inhibition of the mammalian isozyme (12). Acyclovir (3a), an antiviral drug used for the treatment of herpes infections (13, 14), also fits into this category. It binds very tightly to the Herpes simplex DNA polymerase with an estimated half-life of about 40 days. Acyclovir is a prodrug because it requires transformation by a viral thymidine kinase and cellular phosphotransferases to the corresponding triphosphate (3b) to serve in vivo as an inhibitor of the viral DNA polymerase (15).
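The two quantitative statements in this passage (a several-thousand-fold difference in inhibitory concentration, and a half-life of about 40 days) can be put on a numerical footing with a short calculation. The following is a minimal sketch, not a method taken from the chapter: it assumes that the quoted half-life refers to simple first-order dissociation of the enzyme-inhibitor complex, and the function names and the two concentrations passed to `selectivity_ratio` are hypothetical values chosen only for illustration.

```python
import math

SECONDS_PER_DAY = 86_400


def dissociation_rate_constant(half_life_days: float) -> float:
    """First-order rate constant (s^-1) for a complex with the given half-life.

    Assumes simple first-order decay of the enzyme-inhibitor complex,
    so that k_off = ln(2) / t_half.
    """
    return math.log(2) / (half_life_days * SECONDS_PER_DAY)


def selectivity_ratio(conc_host: float, conc_pathogen: float) -> float:
    """Ratio of the inhibitor concentrations needed to inhibit the host
    and the pathogen enzyme (both in the same units)."""
    return conc_host / conc_pathogen


# Half-life of about 40 days, as quoted in the text.
k_off = dissociation_rate_constant(40)
print(f"k_off = {k_off:.2e} s^-1")

# Hypothetical concentrations (mol/L), for illustration only.
ratio = selectivity_ratio(conc_host=3.0e-4, conc_pathogen=1.0e-7)
print(f"selectivity ratio = {ratio:.0f}")
```

Under the first-order assumption, a 40-day half-life corresponds to a dissociation rate constant of roughly 2 × 10^-7 s^-1, which gives a sense of how slowly such a tightly bound complex falls apart.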
Table 17.2 Examples of Enzyme Inhibitors Used in the Treatment of Cancer

Type of Cancer                                          Enzyme Inhibited                  Inhibitor
Benign prostatic hyperplasia                            Steroid 5α-reductase              Finasteride
Estrogen-mediated breast cancer                         Aromatase                         Aminoglutethimide, fadrozole
Leukemia, osteosarcoma, head, neck, and breast cancer   Dihydrofolate reductase           Methotrexate
Colorectal cancer                                       Thymidylate synthase              5-Fluorouracil
Leukemia                                                Glutamine-PRPP amidotransferase   6-Mercaptopurine, azathioprine
Small-cell lung cancer, non-Hodgkin's lymphoma          Topoisomerase II                  Etoposide
Hairy-cell leukemia                                     Adenosine deaminase               Pentostatin

Table 17.3 Examples of Enzyme Inhibitors Used in Various Human Disease States

Clinical Use        Enzyme Inhibited                                        Inhibitor
Epilepsy            GABA transaminase                                       γ-Vinyl GABA
Epilepsy            Carbonic anhydrase                                      Sulthiame
Epilepsy            Succinic semialdehyde dehydrogenase                     Sodium valproate
Antidepressant      Monoamine oxidase (MAO)                                 Tranylcypromine, phenelzine
Antihypertensive    Angiotensin converting enzyme                           Captopril, enalaprilat
Cardiac disorders   Na+,K+-ATPase                                           Cardiac glycosides
Gout                Xanthine oxidase                                        Allopurinol
Ulcer               H+,K+-ATPase                                            Omeprazole
Hyperlipidemia      HMG-CoA reductase                                       Atorvastatin, simvastatin
Anti-inflammatory   Prostaglandin synthase, cyclooxygenase (COX) I and II   Aspirin, naproxen, ibuprofen
Arthritis           Cyclooxygenase (COX) II                                 Celecoxib
Glaucoma            Acetylcholinesterase                                    Neostigmine
Glaucoma            Carbonic anhydrase II                                   Acetazolamide, dichlorphenamide
Although their inhibitors are not specifically therapeutic agents in themselves, the β-lactamases are another important target for drug design. These are bacterial enzymes and, as with the alanine racemases, are not found in humans. Inhibitors of β-lactamases include clavulanic acid (4) (16-20) and sulbactam (penicillanic acid sulfone) (5) (18, 21-24). These two compounds act to prevent the bacterial degradation of penicillins and cephalosporins by β-lactamases, thereby extending their lifetime and effectiveness. Accordingly, both clavulanic acid (4) and sulbactam (5) have reached the market as drugs that act synergistically with these commonly prescribed antibacterial agents.

Even though it has proved possible to selectively inhibit the enzymes of a number of pathogens, the enzymes of cancer cells have proved to be a far more elusive target. Indeed, the majority of the currently employed antitumor agents can be described as antiproliferative agents. These take advantage of the fact that many, but not all, tumor cells grow and divide more rapidly than normal cells. Lymphomas, for example, proliferate more rapidly than solid tumors, whereas, conversely, acute leukemia cells divide more slowly than the surrounding bone marrow cells. Most of the enzyme inhibitors used as these antiproliferative agents (Table 17.2) can also be described as antimetabolites (i.e., they inhibit a metabolic pathway), often those involved in DNA biosynthesis, which are important for cell survival or replication. 5-Fluorouracil (6), the prodrug form of an inactivator of thymidylate synthase (25), and methotrexate (7), an inhibitor of dihydrofolate reductase (26, 27), both fit into this category. Unfortunately, rapidly dividing normal cells, such as hair follicles, the cells lining the gastrointestinal tract, and the bone marrow cells involved in the immune system are also significantly affected. The resultant hair loss, nausea, and susceptibility to infection mean that this type of chemotherapy is seldom employed as a first-line defense against cancer.

The inhibition of enzymes involved in metabolic pathways is not restricted to anticancer agents. A variety of diseases have been correlated with either the dysfunction of an enzyme or an imbalance of metabolites. A cross section of the disease states treated with enzyme inhibitors is shown in Table 17.3. Practically, these may be treated by the inhibition of an individual enzyme or by using enzyme inhibitors to regulate the metabolite concentration in the body. For example, an imbalance of the two neurotransmitters, glutamate and γ-aminobutyric acid, is responsible for the convulsions observed in epileptic seizure. The latter is metabolized by γ-aminobutyric acid aminotransferase (GABA-T) and, consequently, inhibitors of this enzyme offered themselves as potential antiepileptic candidates. This led to the development of the GABA-T inhibitor, vigabatrin (8) (28), which clinically results in an increase of the brain concentration of γ-aminobutyric acid and cessation of epileptic convulsions. As with the anticancer agents, blockade of a metabolic pathway may also have therapeutic benefits. The statins, a group of serum cholesterol-lowering drugs, are inhibitors of hydroxymethylglutaryl-CoA (HMG-CoA) reductase (29). HMG-CoA reductase catalyzes the irreversible conversion of HMG-CoA to mevalonic acid, the rate-determining step in cholesterol biosynthesis (30-32). Inhibitors such as simvastatin (9) have been found to be effective in the treatment of hyperlipidemia and familial hypercholesteremia (33, 34) and have become some of the world's best-selling drugs.

Figure 17.1. Examples of enzyme inhibitors used clinically. (3a) R = H; (3b) R = PPP.

Finally, enzyme inhibitors can also be used to induce an animal model of a genetic disease. Inactivation of γ-cystathionase by propargylglycine, for example, produces an experimental model of the disease state known as cystathioninuria (35). Deficiency of this enzyme leads to the accumulation of cystathionine in the urine and has sometimes been associated with mental retardation (36).

1.2 Enzyme Inhibitors in Basic Research

In basic research enzyme inhibitors have found a multitude of uses. They serve as useful tools for the elucidation of structure and function of enzymes, as probes for chemical and kinetic processes, and in the detection of short-lived reaction intermediates (37). Product inhibition patterns provide information about an enzyme's kinetic mechanism and the order of substrate binding (38). Covalently binding enzyme inhibitors have been used to identify active-site amino acid residues that could potentially be involved in substrate binding and catalysis of the enzyme (39, 40). Reversible enzyme inhibitors are routinely used to facilitate enzyme purification by using the inhibitor as a ligand for affinity chromatography (41, 42) or as eluants in affinity-elution chromatography (43). Immobilized enzyme inhibitors can also be used to identify their intracellular targets (44), whereas irreversible inhibitors can be used to localize and quantify enzymes in vivo (45).

In Table 17.4 we have provided the classification of the various types of enzyme inhibitors that we employ in this chapter. The classification may appear somewhat arbitrary, in that some inhibitors may fit into more than one category. This can arise because these categories are attempting to bring together some nonrelated properties such as structure, mechanism of action, and kinetic behavior. Thus, what we have classed as a reversible inhibitor may, simply because it has a slow dissociation rate, be described elsewhere in the literature as being irreversible. In each instance we will discuss approaches to the design of that type of inhibitor, as well as indicating how it may be evaluated. The discussion will be accompanied by references to recent, representative examples from the literature. Where appropriate, these examples will be of inhibitors of therapeutic interest.

Table 17.4 Classification of Enzyme Inhibitors Employed in This Chapter

Noncovalent Inhibitors                               Covalent Inhibitors
Rapid reversible inhibitors (ground-state analogs)   Chemical modifiers
Tight, slow, slow-tight binding inhibitors           Affinity labels
Multisubstrate analogs                               Mechanism-based inhibitors
Transition-state analogs                             Pseudoirreversible inhibitors

It should be noted that we will concentrate on inhibitors directed at the active site of the enzyme. While recognizing that there are inhibitors that bind to regions other than the active site, such as allosteric effectors, these are not the focus of this chapter and will not be included. There are many reviews of enzyme inhibitors available in the literature (37, 46-48) and the reader is referred to them for more detailed analysis.

2 RATIONAL DESIGN OF NONCOVALENTLY BINDING ENZYME INHIBITORS

As their name indicates, this class of inhibitors binds to the enzyme's active site without forming a covalent bond. Therefore the affinity and specificity of the inhibitor for the active site will depend on a combination of the electrostatic and dispersive forces, and hydrophobic and hydrogen-bonding interactions. Traditionally, noncovalently binding enzyme inhibitors were analogs of substrates, products, or reaction intermediates. More recently, an explosion in the use of combinatorial chemistry and rapid screening techniques has seen the development of large numbers of enzyme inhibitors that bear little or no resemblance to the substrate or products, yet still bind selectively to their target enzyme. Computer-aided drug design, in the broadest sense, encompasses both structure-based drug design and quantitative structure-activity relationship (QSAR) methods. A complement to the rapid screening techniques, computer-aided methods provide a more focused approach to the
design and discovery of both substrate and nonsubstrate analog inhibitors.

In structure-based design, the structure of a drug target interacting with small molecules is used to guide drug discovery. Consequently, either the three-dimensional enzyme structure or, at a minimum, the pharmacophore structure must be known. A pharmacophore represents the nature of the chemical groups of a given ligand and their relative orientation important for inhibitor binding. Today, structure-based design, used in conjunction with docking techniques, combinatorial chemistry, and rapid screening, not only leads more quickly to novel enzyme inhibitors but also greatly reduces the number of compounds that must be synthesized. More information on these approaches may be found in Chapter 10 and some recent monographs (49-52).

Traditionally, an increase in inhibitory or biological activity was achieved by synthesizing an analog of the substrate and then making gradual empirical changes in the structure by adding or removing functional groups. QSAR methods provide a means of making this empirical testing more focused. In this technique there is no need to know the structure of the active site. Instead, computer algorithms are employed to correlate the biological activity of a series of inhibitors with their chemical structure, thereby allowing better predictions as to how to change the structure to obtain a more potent inhibitor. This topic is discussed further in Chapter 1, and detailed reviews are also available (53-56).

Table 17.4 shows the classification of noncovalent inhibitors we use in this chapter. Based on their kinetics it is possible to distinguish among rapid reversible, tight-binding, slow-binding, slow-tight-binding, irreversible, and pseudoirreversible inhibitors. Conversely, inhibitors classified on the basis of structure, such as ground-state analogs, multisubstrate inhibitors, and transition-state analogs, which mimic the structures of substrates and products, reaction intermediates, and transition states, may fall into any of the kinetic categories. However, before introducing these categories, it is important to have an understanding of the forces involved in the binding of substrates and inhibitors to an enzyme's active site.

2.1 Forces Involved in Forming the Enzyme-Inhibitor Complex

To understand the design concepts of the various types of noncovalently binding enzyme inhibitors, a basic knowledge of the binding forces between an enzyme's active site and its inhibitors is required. The forces involved in a substrate or an inhibitor binding to an enzyme's active site are, as with a drug binding to a receptor, the same forces that are experienced by all interacting organic molecules. These include ionic (electrostatic) interactions, ion-dipole and dipole-dipole interactions, hydrogen bonding, hydrophobic interactions, and van der Waals interactions. A brief overview of the forces involved follows. More comprehensive treatments can be found in Chapter 4 and elsewhere (57-60).

The binding of an inhibitor is dependent on a variety of interactions, and it is the sum of these interactions that will determine the degree of affinity of an inhibitor for the particular enzyme. The reversible binding of an inhibitor to an enzyme's active site can be described as shown in Equation 17.1.

$$E + I \rightleftharpoons E \cdot I \qquad (17.1)$$

There is an equilibrium between the free enzyme (E), inhibitor (I), and the enzyme-inhibitor complex (E·I). The affinity of an inhibitor for the enzyme is measured by the inhibition constant Ki, which is the dissociation constant of the enzyme-inhibitor complex, at equilibrium (Equation 17.2).

$$K_i = \frac{[E][I]}{[E \cdot I]} \qquad (17.2)$$

The lower the Ki value, the better the inhibitor, given that the equilibrium lies more in favor of enzyme-inhibitor complex formation. The affinity of an inhibitor for an enzyme may be related to the standard free energy (ΔG°) of a system by Equation 17.3.

$$\Delta G^{\circ} = RT \ln K_i \qquad (17.3)$$
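The relationships just introduced lend themselves to a small numerical illustration. The sketch below is not taken from the chapter: it assumes the sign convention ΔG° = RT ln Ki (free energy of complex formation, with Ki expressed in mol/L relative to a 1 M standard state), a temperature of 310 K as an approximation to physiological conditions, and inhibitor in large excess over enzyme with no substrate present. All function names and input concentrations are hypothetical.

```python
import math

R = 8.314e-3  # universal gas constant, kJ mol^-1 K^-1
T = 310.0     # assumed temperature, K (roughly physiological)


def inhibition_constant(free_enzyme: float, free_inhibitor: float,
                        complex_conc: float) -> float:
    """Ki = [E][I] / [E.I], with equilibrium concentrations in mol/L."""
    return free_enzyme * free_inhibitor / complex_conc


def fraction_bound(free_inhibitor: float, ki: float) -> float:
    """Fraction of enzyme present as E.I, which follows from the
    definition of Ki: [E.I] / ([E] + [E.I]) = [I] / (Ki + [I])."""
    return free_inhibitor / (ki + free_inhibitor)


def binding_free_energy(ki: float) -> float:
    """Standard free energy of complex formation in kJ/mol,
    assuming the convention dG = RT ln(Ki)."""
    return R * T * math.log(ki)


def ki_fold_change(delta_delta_g: float) -> float:
    """Factor by which Ki changes when the binding free energy
    changes by delta_delta_g (kJ/mol)."""
    return math.exp(delta_delta_g / (R * T))


# Hypothetical equilibrium concentrations (mol/L).
ki = inhibition_constant(free_enzyme=1e-9, free_inhibitor=1e-8,
                         complex_conc=1e-9)
print(f"Ki = {ki:.1e} M")
print(f"fraction bound at [I] = Ki: {fraction_bound(ki, ki):.2f}")
print(f"dG = {binding_free_energy(ki):.1f} kJ/mol")
print(f"Ki factor for a -5.9 kJ/mol change: {ki_fold_change(-5.9):.2f}")
```

With these assumptions, half of the enzyme is occupied when the free inhibitor concentration equals Ki, and a change of about 5.9 kJ/mol in binding free energy corresponds to roughly a tenfold change in Ki at 310 K.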
where R is the universal gas constant and T the temperature in kelvins. The more negative the value of ΔG°, the more favorable the interaction at equilibrium, and the smaller the Ki value. It should be noted that, from Equation 17.3, at physiological temperature relatively small changes in free energy, only 2-3 kcal/mol, will have a significant effect on Ki.

The standard free energy (ΔG°) can also be expressed in terms of enthalpic (ΔH°) and entropic (ΔS°) components (Equation 17.4).

$$\Delta G^\circ = \Delta H^\circ - T\,\Delta S^\circ \qquad (17.4)$$

Equation 17.4 states that the free energy of a system is lowered (i.e., the reaction is made more favorable) by either a decrease in enthalpy or an increase in entropy. This is also an important concept because there are both enthalpic and entropic components to the forces that contribute to the strength of the enzyme-inhibitor interaction.

When discussing the forces involved in the noncovalent binding of a substrate/inhibitor to an enzyme, or drug to a receptor, it must be recognized that these interactions will be carried out in an aqueous medium. The physical properties of water mean that noncovalent interactions in aqueous solution will be significantly different from those interactions observed in either an organic medium or in the gas phase. A water molecule has electronic asymmetry; the strongly electronegative oxygen atom withdraws electron density from the hydrogen atoms. This creates partial positive charges on the hydrogens and a partial negative charge on the oxygen. As a result a water molecule possesses a permanent dipole moment, facilitating strong interactions with other water molecules as well as with any charged or polar species.

Water is both a donor and acceptor of hydrogen bonds. Consequently, in bulk solvent, water molecules are extensively hydrogen bonded to each other. These are relatively weak bonds (~5 kcal/mol) and, at physiological temperature, are rapidly broken and reformed. However, the hydrogen-bonding network affects many of the properties of water. For example, water has a higher melting point, boiling point, and heat of vaporization than those of comparable hydrides such as H2S and NH3. The heat capacity of water indicates that it is highly structured and its surface tension (73 dyne cm^-1 at 20°C) is considerably higher than that of most liquids (20-40 dyne cm^-1). The dielectric constant of water (80) is also considerably higher than that of most liquids, which are generally less than 30. Ethanol, for example, has a dielectric constant of 24, whereas those of benzene and hexane are 2.3 and 1.9, respectively. All told, water is a unique solvent, and one that has a major influence on binding interactions between an enzyme and an inhibitor.

Hydrogen bonds are readily formed between water and biologically important atoms such as the hydrogen bond acceptors N and O and, to a lesser extent, S. The conjugate acids NH and OH may act as hydrogen bond donors. Molecules containing these atoms have the capacity for many hydrogen-bonding interactions with water and, as a result, are usually soluble in water. However, solute-solute hydrogen bonding interactions are less favorable because their formation will require the disruption of favorable solute-water hydrogen bonds. Thus, what may be strong hydrogen bonds in the gas phase, or in organic media, are often considerably weaker in aqueous media.

Water's high dielectric constant makes it extremely effective in solvating, dissociating, and dissolving most salts. Because of its permanent dipole, water is readily able to interact with ionic species, with the result that ionic solute-solute interactions are less favored. The situation is analogous to that observed for hydrogen bonding and again results in a weakening of the normally strong interactions between ions that occur in the gas phase or nonpolar media. This is sometimes described as a "leveling effect."

Small amounts of many nonpolar substances can also dissolve in water. However, these substances do not interact very well with water and prefer to interact with each other. The force driving this interaction, known as the hydrophobic force, is not so much an attraction between hydrophobic molecules as an entropic effect arising from the
displacement of water. Indeed, there are no hydrophobic forces in the gas phase or in nonpolar solvents. However, collectively, hydrophobic forces are thought to transcend other types of forces, particularly in the folding of proteins, in all biological systems.

2.1.1 Electrostatic Forces. Although we recognize that, in essence, all forces between atoms and molecules are electrostatic, here we use the term to describe ion-ion, ion-dipole, and dipole-dipole interactions. At physiological pH, the side chains of basic residues such as lysine and arginine and, to a lesser extent, the imidazole ring of histidine will be protonated, whereas the acidic groups on the side chains of aspartic and glutamic acid residues will be deprotonated. In addition, the N-terminal amino groups and C-terminal carboxylates will be ionized. Therefore, in addition to atoms with permanent and induced dipoles, an enzyme potentially will have several charged groups available for binding to charged or polarized groups on a substrate or inhibitor. As described by Equation 17.5, the electrostatic force (F) between the charged atoms (q1 and q2) will depend on the distance between the charged groups (r) and the dielectric constant of the surrounding medium (D).

$$F = \frac{q_1 q_2}{D\,r^2} \qquad (17.5)$$

The strength of an ion-ion interaction is inversely related to the square of distance between the ions, whereas ion-dipole and dipole-dipole interactions have 1/r^4 and 1/r^6 relationships, respectively. Because the strength of the interaction decreases more slowly with distance, ion-pair interactions can be thought of as long-range interactions. Conversely, interactions involving dipoles are effective over only a short range, although, because they are much more prevalent, dipole interactions may be more significant to the overall binding process. Clearly, the dependency of the strength of interaction on the distance between atoms is an important consideration when designing potential enzyme inhibitors.

Equation 17.5 also leads to the fact that electrostatic interactions are less favorable in polar solvents. As discussed above, because of its high net permanent dipole moment, water is very polar and has a large dielectric constant. The high polarity of water greatly diminishes the attraction or repulsion forces between any two charged groups, giving rise to the leveling effect of water. It is somewhat difficult to predict the exact strength of a charge-charge interaction between an enzyme and an inhibitor. For example, the formation of a salt bridge (charge-charge) interaction between an enzyme (Enz) and an inhibitor (I) may be described by Equation 17.6.

$$\mathrm{Enz{-}NH_3^{+}\,(H_2O)_n + I{-}CO_2^{-}\,(H_2O)_m} \rightleftharpoons \ldots \qquad (17.6)$$

Both the charged species are initially solvated by water, and to form the salt bridge both ions must be desolvated. This comes at some enthalpic cost, but the freeing of water molecules leads to a concomitant, favorable increase in entropy. The strength of the ion pair will depend on the stability of the salt bridge vs. that of the individual solvated ions. If the salt bridge is buried in a relatively hydrophobic active site, it is less solvated and will be more favored than the same interaction in a solvent-exposed active site.

2.1.2 van der Waals Forces. Also called nonpolar interactions or London dispersion forces, these are the universal attractive interactions that occur between atoms. As two molecules closely approach each other there is an interpenetration of their electron clouds. As a consequence, temporary local fluctuations in the electron density occur, giving rise to a temporary dipole in each molecule, even though the molecules may, in themselves, have no net dipole moment. Thus there will be an attractive force between the two molecules, with the magnitude of the force depending on the polarizability of the particular atoms involved and the distance between them. Electronegative oxygen has, for example, a much lower polarizability than that of a nonpolar methylene group. Accordingly, dispersion forces are considerably stronger between nonpolar compounds than between nonpolar com-
pounds and water. The optimal distance between the atoms is the sum of each of their van der Waals radii, so these forces come into play only when there is good complementarity between enzyme and inhibitor. Although van der Waals forces are quite weak, usually around 0.5-1.0 kcal/mol for an individual atom-atom interaction, they are additive and can make an important contribution to inhibitor binding.

2.1.3 Hydrophobic Interactions. Hydrophobic interactions may be described as entropy-based forces. When a nonpolar compound is dissolved in water, the strong water-water interactions around the solute lead to an effective "ordering" of the structure of the solvent. This is entropically unfavorable; that is, there is a negative entropy of dissolution. When a nonpolar inhibitor binds to a nonpolar region of an enzyme, all the ordered water molecules become less ordered as they associate with bulk solvent, leading to an increase in entropy. According to Equation 17.4 any increase in entropy will lead to a decrease in free energy and, through Equation 17.3, stabilization of the enzyme-inhibitor complex. It has been calculated that a single methylene-methylene interaction releases about 0.7 kcal/mol of free energy. Even though this figure is not high, given that enzymes and inhibitors usually have large regions of hydrophobic surface, this type of bonding may also play a significant role in inhibitor binding.

2.1.4 Hydrogen Bonds. A hydrogen bond occurs when a proton is shared between two electronegative atoms (i.e., -X-H···Y). Electron density is pulled from the hydrogen by X, giving the hydrogen a partial positive charge that is strongly attracted to the nonbonded electrons of Y. The bond is usually asymmetric, with one of the heteroatoms, the hydrogen bond donor, having a normal covalent bond distance to the proton. The other heteroatom, the hydrogen bond acceptor, is usually at a distance somewhat shorter than the van der Waals contact distance and, for optimal hydrogen bonding, the atoms should be arranged linearly. A hydrogen bond is a special type of dipole-dipole interaction and, as we have seen, although these forces can be quite significant in nonpolar solvents, water greatly diminishes their magnitude. The energy of the amide-amide N-H···O hydrogen bond is about 5 kcal/mol, and is typical for hydrogen bonds (60).

It should be remembered that, for a hydrogen bond to form between an enzyme and an inhibitor, any hydrogen bonds between the inhibitor and water, as well as those between the enzyme and water, must be broken (Equation 17.7).

Overall, the total number of hydrogen bonds remains constant and, provided that the hydrogen bonds between the inhibitor and enzyme are not significantly more favorable than those between water and the inhibitor or those between water and the enzyme, the net change in enthalpy is usually insignificant. On the other hand, formation of the enzyme-inhibitor complex usually leads to an overall increase of entropy because the inhibitor remains bound to the enzyme and the formerly bound water molecules are released.

2.1.5 Cation-π Bonding. Recently it has become apparent that there is another important noncovalent binding force that may be exploited when designing enzyme inhibitors. Cations, from simple ions such as Li+ to more complex organic molecules such as acetylcholine, are strongly attracted to the electron-rich (π) face of benzene and other aromatic compounds (61, 62). Cation-π bonds, as well as other amino-aromatic interactions, are common in structures in the protein data bank (63), and it has been estimated that more than 25% of tryptophan residues are involved in interactions of this type (64). The finding that the cationic group of acetylcholine was bound primarily by aromatic residues, most especially by a tryptophan residue, not by the ex-
2 Rational Design of Noncovalently Binding Enzyme Inhibitors
pected carboxylate anion, provided evidence that cation-π interactions may play an important role in ligand binding (65, 66). Model systems suggest that, energetically, the cation-π interaction can compete with full aqueous solvation in binding cations (61), and there is now significant effort being expended in studying the contribution of these interactions to molecular recognition (62, 66).

In summary, the Ki provides an indication of the relative stability of the enzyme-inhibitor complex compared to the stability of the enzyme and inhibitor free in solution. Moreover, it is clear that entropy, enthalpy, and water will all have a major impact on the binding of an inhibitor to an enzyme.

2.2 Steady-State Enzyme Kinetics

Just as an appreciation of the forces involved is essential to comprehending the binding of an inhibitor to an enzyme, so is an understanding of the kinetic analysis of an enzyme-catalyzed reaction essential to any kinetic evaluation of an inhibitor. In this section we provide a brief introduction to the study of enzyme kinetics, particularly steady-state kinetics. The reader is nevertheless advised to refer to other sources for more in-depth reviews of the kinetic equations and mathematical derivations involved (38, 60, 67-71).

2.2.1 The Michaelis-Menten Equation. In the simplest case, an enzyme-catalyzed reaction involves the conversion of a single substrate to a single product, as shown in Equation 17.8.

$$E + S \rightleftharpoons E\cdot S \rightleftharpoons E\cdot P \rightleftharpoons E + P \qquad (17.8)$$

The free enzyme (E) binds the substrate (S) to form a noncovalent enzyme-substrate complex (E·S). This is assumed to be a rapid, reversible process, not involving any chemical changes, and with the affinity of the substrate for the enzyme's active site being determined by the binding forces discussed above. A chemical transformation of substrate to product (P), initially in complex with enzyme (E·P), then takes place. Finally, the product (P) is released into the medium with concomitant regeneration of free enzyme (E).

As can be seen from the following discussion, it is not difficult to carry out a kinetic analysis of a single-substrate reaction such as that described in Equation 17.8. However, as more substrates are added the task becomes more complex. Fortunately, kinetic analysis of enzymatic reactions involving two or more substrates can be made easier by varying the concentration of only one substrate at a time. By keeping all but one of the substrates at fixed, saturating concentrations, the reaction rate will depend only on the concentration of the varied substrate. This permits the use of the kinetic analysis employed for enzyme-catalyzed, single-substrate reactions even for complex multisubstrate reactions. In a further simplification, the dissociation of the E·P complex is assumed not to be rate limiting, and the reversion of product to substrate is assumed to be negligible. The latter assumption is valid under what are known as initial velocity conditions, that is, when less than about 5% of substrate has been consumed. Under these conditions, the concentration of P is low, and Equation 17.8 simplifies to Equation 17.9.

$$E + S \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} E\cdot S \overset{k_2}{\longrightarrow} E + P \qquad (17.9)$$

Generally, kinetic analyses are carried out by studying the reaction under steady-state conditions, that is, when the concentration of the enzyme is well below that of the substrate. Under those circumstances, following a brief preequilibrium period, the concentrations of the various enzyme-bound species, E·S and E·P in Equation 17.8, become effectively constant and the rate of conversion of substrate to product will greatly exceed the change in concentration of any enzyme species. This is an approximation but, provided the substrate concentration does not greatly change (e.g., under initial velocity conditions), it is a very useful approximation. Given steady-state conditions, the Michaelis-Menten equation (Equation 17.10) is a quantitative description of the reaction described by Equation 17.9.

$$v = \frac{V_{max}[S]}{K_M + [S]} \qquad (17.10)$$
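The steady-state approximation described above can be checked numerically. The following is a minimal sketch, assuming the two-step scheme of Equation 17.9 and purely hypothetical rate constants and concentrations (none of these values come from the text). It integrates the rate equations with a simple Euler step and compares the simulated concentration of E·S with the steady-state value [E]total[S]/(KM + [S]), where KM = (k-1 + k2)/k1.

```python
def simulate_es(k1, k_minus1, k2, e_total, s0, dt=1e-5, t_end=0.1):
    """Euler integration of E + S <-> E.S -> E + P.

    Returns the concentrations of E.S and S at time t_end.
    """
    es, s = 0.0, s0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        e_free = e_total - es
        bind = k1 * e_free * s        # E + S -> E.S
        release = k_minus1 * es       # E.S -> E + S
        turnover = k2 * es            # E.S -> E + P
        es += dt * (bind - release - turnover)
        s += dt * (release - bind)
    return es, s


# Hypothetical values chosen only for illustration.
K1 = 10.0        # uM^-1 s^-1
K_MINUS1 = 50.0  # s^-1
K2 = 10.0        # s^-1
E_TOTAL = 0.01   # uM, far below the substrate concentration
S0 = 10.0        # uM

es, s = simulate_es(K1, K_MINUS1, K2, E_TOTAL, S0)
km = (K_MINUS1 + K2) / K1
es_steady = E_TOTAL * s / (km + s)   # steady-state prediction for [E.S]
v_sim = K2 * es                      # simulated rate of product formation
v_mm = K2 * E_TOTAL * s / (km + s)   # Michaelis-Menten rate at the same [S]

print(f"[E.S] simulated = {es:.6f} uM, steady state = {es_steady:.6f} uM")
print(f"v simulated = {v_sim:.6f} uM/s, Michaelis-Menten = {v_mm:.6f} uM/s")
print(f"fraction of substrate consumed = {(S0 - s) / S0:.4%}")
```

With enzyme at a thousandth of the substrate concentration, the simulated E·S level settles within a few milliseconds and then tracks the steady-state value while only a small fraction of the substrate is consumed, which is the regime the initial velocity conditions are meant to guarantee.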
Approaches to the Rational Design of Enzyme Inhibitors
This implies that the initial velocity (v) is directly proportional to the enzyme concentration [E], and that v follows saturation kinetics with respect to the substrate concentration [S]. This is shown graphically in Figure 17.2 and explained as follows: at very low substrate concentrations v increases in a linear fashion, so that v = Vmax[S]/KM. As the substrate concentration increases, the observed increase in v is less than the increase in [S]. This trend continues until, at high (saturating) substrate concentrations, v becomes effectively independent of [S] and tends toward the limiting value Vmax.

Figure 17.2. Plot showing dependency of the initial velocity (v) on substrate concentration [S] for an enzyme-catalyzed reaction obeying Michaelis-Menten (saturation) kinetics.

Vmax is the maximal velocity that can be achieved at a specific enzyme concentration. In the simple Michaelis-Menten mechanism described by Equation 17.9, there is only one E·S complex and all binding steps are rapid. In this instance, Vmax is the product of the enzyme concentration [E] and k2 (also known as kcat), which is the first-order rate constant for the chemical conversion of the E·S complex to free enzyme and product. The catalytic constant kcat is often referred to as the turnover number because it represents the maximum number of substrate molecules converted to products per active site per unit time. In a more complicated reaction, kcat is a function of all the first-order rate constants and, effectively, sets a lower limit on all the chemical rate constants.

The Michaelis-Menten constant KM is a combination of rate constants and is independent of enzyme concentration under steady-state conditions. It is equal to the substrate concentration at which half the maximum velocity of the enzyme-catalyzed reaction is reached; that is, when [S] = KM, then v = ½Vmax. For the reaction illustrated in Equation 17.9, KM is described by Equation 17.11.

$$K_M = \frac{k_{-1} + k_2}{k_1} \qquad (17.11)$$

If, for a given reaction, k-1 ≫ k2, then Equation 17.11 simplifies to KM = Ks, where Ks is the dissociation constant for the enzyme-substrate complex. It is important to remember that the Michaelis-Menten equation holds true not only for the mechanism as stated above, but for many different mechanisms that are not included in this treatment. In summary, KM can be described as an apparent dissociation constant for all enzyme-bound species and, in all cases, it is the substrate concentration at which the enzyme operates at half-maximal velocity.

Another parameter often referred to when discussing Michaelis-Menten kinetics is kcat/KM. This is an apparent second-order rate constant that relates the reaction rate to the free (not total) enzyme concentration. As described above, at very low substrate concentrations when the enzyme is predominantly unbound, the velocity (v) is equal to [E][S]kcat/KM. The value of kcat/KM sets a lower limit on the rate constant for the association of enzyme and substrate. It is sometimes referred to as the specificity constant because it determines the specificity of the enzyme for competing substrates.

Again, for more detailed treatment of this subject the reader should refer to more specialized texts (38, 60, 67-69).

2.2.2 Treatment of Kinetic Data. Analysis of Michaelis-Menten kinetics is greatly facilitated by a linear representation of the data. Converting the Michaelis-Menten Equation 17.10 into Equation 17.12 leads to the popular Lineweaver-Burk plot.

$$\frac{1}{v} = \frac{K_M}{V_{max}}\cdot\frac{1}{[S]} + \frac{1}{V_{max}} \qquad (17.12)$$
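As an illustration of the double-reciprocal transformation, the sketch below generates noise-free Michaelis-Menten velocities and recovers Vmax and KM from a straight-line fit of 1/v against 1/[S]. The parameter values, the substrate concentrations, and the helper names `mm_rate` and `fit_line` are hypothetical choices made for this example only; real data carry experimental error, and the fit would then weight the low-concentration points most heavily.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten initial velocity at substrate concentration s."""
    return vmax * s / (km + s)


def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical "true" parameters (arbitrary units).
VMAX_TRUE = 100.0
KM_TRUE = 5.0

substrate = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
rates = [mm_rate(s, VMAX_TRUE, KM_TRUE) for s in substrate]

# Double-reciprocal transform: slope = KM/Vmax, y-intercept = 1/Vmax.
slope, intercept = fit_line([1.0 / s for s in substrate],
                            [1.0 / v for v in rates])
vmax_fit = 1.0 / intercept
km_fit = slope * vmax_fit

print(f"Vmax from fit = {vmax_fit:.4f}, KM from fit = {km_fit:.4f}")
print(f"v at [S] = KM: {mm_rate(KM_TRUE, VMAX_TRUE, KM_TRUE):.4f}")
```

Because the synthetic data are exact, the fit returns the input parameters; the same helper could be reused on measured velocities, with the caveat about error weighting noted above.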
Figure 17.3. The Lineweaver-Burk plot.

If 1/v is plotted against 1/[S] (Fig. 17.3), the y-intercept gives a value of 1/Vmax and the x-intercept gives a value of -1/KM. The slope of the line is equal to KM/Vmax. Although very popular, the Lineweaver-Burk plot suffers from the disadvantage that it emphasizes points at lower concentrations and compresses data points obtained at high concentrations (67). As a result it is not recommended for obtaining accurate kinetic constants.

A preferable, alternative form of the Michaelis-Menten equation is that of the Eadie-Hofstee plot (Equation 17.13).

$$v = V_{max} - K_M\,\frac{v}{[S]} \qquad (17.13)$$

Figure 17.4. The Eadie-Hofstee plot.

As shown in Fig. 17.4, plotting v against v/[S] results in the y-intercept providing a value of Vmax, whereas the x-intercept provides Vmax/KM, and the slope of the line is equal to -KM.

Another linear representation of the Michaelis-Menten equation is the Hanes-Woolf plot (Equation 17.14).

$$\frac{[S]}{v} = \frac{[S]}{V_{max}} + \frac{K_M}{V_{max}} \qquad (17.14)$$

Thus a plot of [S]/v vs. [S] is linear, with a slope of 1/Vmax (Fig. 17.5). The y-intercept gives KM/Vmax and the x-intercept gives -KM.

Figure 17.5. The Hanes-Woolf plot.

Finally, it is possible to directly plot pairs of v, [S] data in such a way as to directly determine KM and Vmax values. Of the linear graphical methods, the direct linear plot of Eisenthal and Cornish-Bowden (72), shown in Fig. 17.6, is often considered to provide the best estimates of KM and Vmax values. In this method pairs of v and [S] values are obtained in the usual manner. A v value is plotted on the y-axis and a corresponding negative value of [S] is plotted on the x-axis. A straight line is then drawn, passing through the points on the two axes and extending beyond the "point of intersection." This is repeated for each set of v and [S] values. Thus, there are n lines for n pairs of values. A horizontal line drawn from the point of intersection to the y-axis provides the Vmax value, whereas a vertical line from the point of intersection to the x-axis provides the KM value.

Each of these linear plots has its own merits, particularly for plotting inhibition data
(38), and its own drawbacks (67). However, the rapid advances in personal computing make it relatively easy to fit kinetic data to the Michaelis-Menten equation (or other appropriate hyperbolic functions) by use of a variety of commercial graphical or spreadsheet packages. One simple package, HYPER, which is readily available on the Internet (http://www.ibiblio.org/pub/academic/biology/molbio/ibmpc/hyperl02.zip), simultaneously furnishes Michaelis-Menten parameters obtained using hyperbolic regression analysis, as well as those obtained using three of the plots described here. As such, it provides a rapid contrast of these graphical methods but, unfortunately, is not suitable for the study of inhibition kinetics. In addition, the recent monograph by Copeland (71) provides a list of useful computer software and Internet sites for the study of enzymes.

Figure 17.6. Eisenthal-Cornish-Bowden direct linear plot of enzyme kinetic data fitting the Michaelis-Menten equation.

2.3 Rapid, Reversible Inhibitors

This class of inhibitors acts by binding to the target enzyme's active site in a rapid, reversible, and noncovalent fashion. The net result is that the active site is blocked and the substrate is prevented from binding. Accordingly, in designing inhibitors of this type, optimization of the noncovalent binding forces between the inhibitor and the active site of the enzyme is of paramount importance.

2.3.1 Types of Rapid, Reversible Inhibitors. Binding of these inhibitors follows simple Michaelis-Menten kinetics and, depending on their preference of binding to the free enzyme and/or the enzyme-substrate complex, competitive, uncompetitive, and noncompetitive inhibition patterns can be distinguished. For the purposes of this discussion it will be assumed that the initial equilibrium of free and bound substrate is established significantly faster than the rate of the chemical transformation of substrate to product, that is, k1, k-1 ≫ k2 (Equation 17.9). As discussed in section 2.2.1, this reduces KM to the dissociation constant Ks of the E·S complex.

2.3.1.1 Competitive Inhibitors. A competitive inhibitor often has structural features similar to those of the substrate whose reaction it inhibits. This means that a competitive inhibitor and the enzyme's substrate are in direct competition for the same binding site on the enzyme. Consequently, binding of the substrate and binding of the inhibitor are mutually exclusive. A kinetic scheme for competitive inhibition is shown in Equation 17.15.

The enzyme-bound inhibitor may either lack an appropriate functional group for further reaction, or may be bound in the wrong position with respect to the catalytic residues or to other substrates. In any event, the enzyme-inhibitor complex E·I is unreactive (it is sometimes referred to as a dead-end complex) and the inhibitor must dissociate and substrate bind before reaction can take place. Solving this kinetic scheme for simple Michaelis-Menten kinetics leads to Equation 17.16.

$$v = \frac{V_{max}[S]}{K_M\left(1 + \dfrac{[I]}{K_i}\right) + [S]} \qquad (17.16)$$

Here, Ki, sometimes called the inhibition constant, is the equilibrium constant for the
dissociation of the enzyme-inhibitor complex, and is described by Equation 17.17.

$$K_i = \frac{[E][I]}{[E\cdot I]} \qquad (17.17)$$

Competitive inhibitors do not change the value of Vmax, which is reached when sufficiently high concentrations of the substrate are present so as to completely displace the inhibitor. However, the affinity of the substrate for the enzyme appears to be decreased in the presence of a competitive inhibitor. This happens because the free enzyme E is not only in equilibrium with the enzyme-substrate complex E·S, but also with the enzyme-inhibitor complex E·I. Competitive inhibitors increase the apparent KM of the substrate by a factor of (1 + [I]/Ki). The evaluation of the kinetics is again greatly facilitated by the conversion of Equation 17.16 into a linear form using Lineweaver-Burk, Eadie-Hofstee, or Hanes-Woolf plots, as shown in Fig. 17.7.

Figure 17.7. (a) Lineweaver-Burk, (b) Eadie-Hofstee, and (c) Hanes-Woolf plots exhibiting competitive inhibition patterns. The dashed line indicates the reaction in the absence of inhibitor, whereas the solid lines represent enzymatic reactions in the presence of increasing concentrations of inhibitor.

2.3.1.2 Uncompetitive Inhibitors. Uncompetitive inhibitors do not bind to the free enzyme. They bind only to the enzyme-substrate complex to yield an inactive E·S·I complex (Equation 17.18).

$$\begin{aligned} E + S &\rightleftharpoons E\cdot S \longrightarrow E + P \\ E\cdot S + I &\overset{K_i}{\rightleftharpoons} E\cdot S\cdot I \end{aligned} \qquad (17.18)$$

Uncompetitive inhibition is rarely observed in single-substrate reactions but is frequently observed in multisubstrate reactions. An uncompetitive inhibitor can provide information about the order of binding of the different substrates. In a bisubstrate reaction, for example, a given inhibitor may be competitive with respect to one of the two substrates and uncompetitive with respect to the other. The linear plots for classical uncompetitive inhibition patterns are described by Equation 17.19 and are illustrated in Fig. 17.8.

$$v = \frac{V_{max}[S]}{K_M + [S]\left(1 + \dfrac{[I]}{K_i}\right)} \qquad (17.19)$$

In contrast to a competitive inhibitor, which raises the apparent KM, an uncompetitive inhibitor decreases the apparent KM for the substrate by a factor of (1 + [I]/Ki), because the formation of E·S·I will use up some of the E·S, thereby shifting the equilibrium further in favor of E·S formation. However, uncompetitive inhibitors also decrease Vmax by the same factor because
some of the enzyme remains in the E·S·I form, even at infinite substrate concentration.

Figure 17.8. (a) Lineweaver-Burk, (b) Eadie-Hofstee, and (c) Hanes-Woolf plots exhibiting uncompetitive inhibition patterns. The dashed line indicates the reaction in the absence of inhibitor, whereas the solid lines represent enzymatic reactions in the presence of increasing concentrations of inhibitor.

2.3.1.3 Noncompetitive Inhibitors. Classical noncompetitive inhibitors have no effect on substrate binding and vice versa, given that they bind randomly and reversibly to different sites on the enzyme. They also bind with the same affinity to the free enzyme and to the enzyme-substrate complex. Both the enzyme-inhibitor complex E·I and the enzyme-substrate-inhibitor complex E·S·I are catalytically inactive. The equilibria are outlined in Equation 17.20.

Simple Michaelis-Menten kinetics of noncompetitive inhibitors are described in Equation 17.21.

$$v = \frac{V_{max}[S]}{\left(1 + \dfrac{[I]}{K_i}\right)\left(K_M + [S]\right)} \qquad (17.21)$$

From Equation 17.21 it is clear that noncompetitive inhibitors have an effect only on Vmax, decreasing it by a factor of (1 + [I]/Ki), consequently giving the impression of reducing the total amount of enzyme present. As with an uncompetitive inhibitor, a portion of the enzyme will always be bound in the nonproductive enzyme-substrate-inhibitor complex E·S·I, causing a decrease in maximum velocity, even at infinite substrate concentrations. However, because noncompetitive inhibitors do not affect substrate binding, the KM value of the substrate remains unchanged. Linear plots for noncompetitive inhibition are shown in Fig. 17.9.

Again, this type of inhibition is rarely seen in single-substrate reactions. It should also be noted that, frequently, the affinities of the noncompetitive inhibitor for the free enzyme and for the enzyme-substrate complex are different. These nonideally behaving noncompetitive inhibitors are called mixed-type inhibitors, and they alter not only Vmax but also KM for the substrate. Further discussion of inhibitors of this type may be found in Segel (38).

Sometimes steady-state kinetics are insufficient to analyze the mechanism of inactivation for a given inhibitor. For example, irreversible enzyme inhibitors that bind so tightly to the enzyme that their dissociation rate (koff) is effectively zero also exhibit noncompetitive inhibition patterns. They act by destroying a portion of the enzyme through irreversible binding, thereby lowering the overall enzyme concentration and decreasing Vmax. The apparent KM remains unaffected because irre-
versible inhibitors do not influence the dissociation constant of the enzyme-substrate complex. A simple experiment to distinguish between a reversible noncompetitive inhibitor and an irreversible inhibitor is shown in Fig. 17.10, and a comprehensive review describing the kinetic evaluation of irreversibly binding enzyme inhibitors is available (73). Allosteric effectors may also show noncompetitive kinetic patterns by rendering the enzyme in the E·S·I complex less active than that in the E·S complex. Again, additional analyses are often required in these less well defined situations. Such analyses may include more in-depth steady-state kinetics, as well as pre-steady-state kinetics, and testing for irreversible inhibition. Irreversible covalently binding enzyme inhibitors are discussed extensively later in this chapter.

Figure 17.9. (a) Lineweaver-Burk, (b) Eadie-Hofstee, and (c) Hanes-Woolf plots exhibiting noncompetitive inhibition patterns. The dashed line indicates the reaction in the absence of inhibitor, whereas the solid lines represent enzymatic reactions in the presence of increasing concentrations of inhibitor.

Figure 17.10. Plot showing dependency of Vmax on the total enzyme concentration, [E]total. An irreversible inhibitor will titrate a fraction of the enzyme, [E]inact. [Curve labels: + noncompetitive inhibitor; + irreversible inhibitor.]

2.3.2 Dixon Plots. Another linear method for plotting inhibition data, the Dixon plot, is shown in Fig. 17.11 (74). In this method the initial velocity is measured as a function of inhibitor concentration at two or more fixed substrate concentrations. By plotting 1/v against [I] for each substrate concentration, the different types of inhibition can easily be distinguished. Further, in cases of competitive or noncompetitive inhibition, the value of Ki may be determined from the x-axis value at which the lines intersect. Overall, the Dixon plot is probably the simplest and most rapid graphical method for obtaining a Ki value.

2.3.3 IC50 Values. The potencies of enzyme inhibitors evaluated using rapid screening techniques are often reported in terms of IC50 values rather than Ki values. An IC50 value is the inhibitor concentration that is required to halve the activity of the enzyme, that is, that concentration that leads to 50% enzyme inactivation. It is important to recognize that an IC50 value is not a constant, except in the case of noncompetitive inhibition, and is dependent on the substrate concentration used in the experiment. IC50 values are com-
hibition. It should also be noted that the IC50 value can be no less than half the concentration of the enzyme, a factor that becomes important if the inhibitor is very potent or if high concentrations of enzyme are employed.
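One way to see where this lower bound comes from is the following short derivation. It is a sketch that assumes the simplest case, a 1:1 binding equilibrium described by Equation 17.17 with no competition from substrate and with activity proportional to the fraction of enzyme not bound to inhibitor; the subscripts "total" and "free" are introduced here for illustration only. At 50% inhibition, half of the enzyme is bound, so

$$[E\cdot I] = [E]_{free} = \tfrac{1}{2}[E]_{total}
\quad\Longrightarrow\quad
K_i = \frac{[E]_{free}\,[I]_{free}}{[E\cdot I]} = [I]_{free}$$

$$IC_{50} = [I]_{total} = [I]_{free} + [E\cdot I] = K_i + \tfrac{1}{2}[E]_{total} \;\geq\; \tfrac{1}{2}[E]_{total}$$

Under these assumptions the measured IC50 approaches Ki only when the enzyme concentration is small compared with Ki, and approaches half the enzyme concentration when the inhibitor is very potent.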
For a competitive inhibitor, a Ki value may
be obtained using the relationship described
by Equation 17.22.
$$IC_{50} = K_i\left(1 + \frac{[S]}{K_M}\right) \qquad (17.22)$$
Provided that a reasonable substrate concentration (≤0.1 KM) is employed for the experiment, the IC50 value may be a reasonable approximation of the true Ki. Equation 17.22 indicates that, at substrate concentrations greater than about 0.1-fold of the KM value, the IC50 value will exceed the Ki value, so that the potency of the inhibitor is underestimated. The discrepancy becomes quite significant at high substrate concentrations.
The dependency of the IC,, value on the
substrate concentration for uncompetitive in-
hibitors is given in Equation 17.23.
IC,, = ~ ~+ (s) 1 (17.23)
In this instance it is at high concentrations

of the substrate that the Ki value is compara-
ble to the IC,, value, and a significant under-
estimation will occur at lower substrate
concentrations.
From these two equations it is clear that,
for preliminary screening when the type of in-.
Figure 17.11. Dixon plots for (a) competitive, (b) hibition is unknown, substrate concentrations
uncompetitive, and (c) noncompetitive inhibitors. close to the KM value should be used. This
The solid lines represent enzymatic reactions in the minimizes the deviation of the IC,, value from
presence of increasing concentrations of substrate. the Ki value to, in the cases of competitive and
The dashed line represents the reaction at infinite uncompetitive inhibitors, a factor of 2. If nec-
substrate concentration.
essary, a Dixon plot can be used to provide a
quick indication of the Ki and the type of inhi-
monly determined by keeping the concentra- bition (38, 7 4 ) . It should be noted that the re-
tion of the substrate and the enzyme constant lationship between IC,, and Ki requires the
and incrementally varying the concentration initial velocity to be linearly dependent on the
of the inhibitor. This simple experimental ap- concentration of inhibitor. In the cases of
proach makes it relatively easy to screen large mixed-competitive and irreversible inhibitors,
numbers of potential inhibitors. Industrial the dependency of the inhibitor concentration
high-throughput screens often employ half- and the initial velocity is nonlinear. There-
log increments, and the value of IC,, provides fore, in those cases, the use of the IC,, value is
a ready means of comparing the extent of in- limited.
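
As a minimal sketch of how Equations 17.22 and 17.23 are used in practice, the Python functions below convert a measured IC50 into an estimate of Ki for a competitive or an uncompetitive inhibitor. The function and parameter names and the numerical values are illustrative and do not come from the text; any concentration unit may be used provided that IC50, [S], and KM share it.

```python
def ki_from_ic50_competitive(ic50, s, km):
    """Rearranged Equation 17.22: Ki = IC50 / (1 + [S]/KM)."""
    return ic50 / (1.0 + s / km)


def ki_from_ic50_uncompetitive(ic50, s, km):
    """Rearranged Equation 17.23: Ki = IC50 / (1 + KM/[S])."""
    return ic50 / (1.0 + km / s)


def ic50_to_ki_ratio(s, km, competitive=True):
    """Factor by which IC50 exceeds Ki at substrate concentration s."""
    return 1.0 + (s / km if competitive else km / s)


if __name__ == "__main__":
    km = 10.0  # hypothetical KM, in uM
    for s in (0.1 * km, km, 10.0 * km):
        print(
            f"[S] = {s:6.1f} uM  "
            f"competitive IC50/Ki = {ic50_to_ki_ratio(s, km, True):5.2f}  "
            f"uncompetitive IC50/Ki = {ic50_to_ki_ratio(s, km, False):5.2f}"
        )
```

At [S] = KM both ratios equal 2, which is why a substrate concentration near KM is a sensible default when the type of inhibition is not yet known.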
2.3.4 Examples of Rapid Reversible Inhibitors. Competitive inhibitors are often similar
in structure to one of the substrates of the reaction they are inhibiting. Inhibitors of this
type are sometimes called substrate analogs and their binding affinity (Ki) usually
approximates that of the substrate. One of the first reactions inhibited by a substrate
analog was that catalyzed by succinate dehydrogenase (Equation 17.24).

\text{succinate } ({}^{-}\mathrm{O_2C{-}CH_2{-}CH_2{-}CO_2^{-}}) \xrightarrow{\text{succinate dehydrogenase}} \text{fumarate} \qquad (17.24)

This reaction is competitively inhibited by malonate (-OOCCH2COO-), which, like
succinate, has two carboxylate groups. It is therefore able to bind to the enzyme's active
site but, with only one carbon atom between the carboxylates, further reaction is impossible.
Substrate analogs are rarely useful as enzyme inhibitors, given that large concentrations
are required for inhibition, and their inhibition is readily overcome by any buildup of
substrate. However, they are often useful probes for determining enzyme specificity and even
mechanism. Phenylethanolamine N-methyltransferase (PNMT) catalyzes the terminal
step in epinephrine (adrenaline) biosynthesis, the conversion of norepinephrine to
epinephrine (Equation 17.25), with concomitant conversion of S-adenosyl-L-methionine (SAM,
AdoMet) to S-adenosyl-L-homocysteine (SAH, AdoHcy).

\text{norepinephrine} + \text{SAM} \xrightarrow{\text{PNMT}} \text{epinephrine} + \text{SAH} \qquad (17.25)

S-Adenosyl-L-homocysteine (10) (Fig. 17.12), the product of the reaction, and
2-(2,5-dichlorophenyl)cyclopropylamine (11) are analogs of S-adenosyl-L-methionine and
norepinephrine, respectively. Using these inhibitors it was possible to ascertain the binding
order of the two substrates (75). Kinetic analyses showed that SAH was a competitive
inhibitor of SAM and a noncompetitive inhibitor of norepinephrine, whereas (11) was a
competitive inhibitor of norepinephrine and an uncompetitive inhibitor of SAM. This indicates
that the binding of substrates is ordered, with SAM binding first. If norepinephrine bound
first, it would be expected that SAH would be an uncompetitive inhibitor and (11) would be
noncompetitive with respect to SAM. If a random binding mechanism were in operation, it
would be expected that both inhibitors would be competitive with either substrate. More
detail on similar uses of reversible inhibitors may be found elsewhere (76).

Figure 17.12. Inhibitors of phenylethanolamine N-methyltransferase.

2.4 Slow-, Tight-, and Slow-Tight-Binding Inhibitors

Not all reversible inhibitors have an instantaneous effect on the rate of an enzymatic
reaction. Some inhibitors, known as slow-binding enzyme inhibitors, can take a considerable
time to establish the equilibrium between the free enzyme and inhibitor, and the
enzyme-inhibitor complex. This time period may be on the scale of seconds, minutes, or even
longer. The enzyme-inhibitor complexes have slow off (dissociation) rates, but the on
(association) rates may be either slow or fast. Hence, the term slow binding does not
necessarily indicate a slow binding of inhibitor to enzyme but rather the fact that reaching
equilibrium is a slow process. Other inhibitors, known as tight-binding inhibitors, bind
their target enzyme with such high affinity that the population of free inhibitor molecules
is significantly depleted when the enzyme-inhibitor complex is formed. Often, tight-binding
inhibitors also have a slow onset of action, and are termed slow-tight-binding inhibitors.
What these three types of inhibitors have in common is that, generally, the major
assumptions of Michaelis-Menten kinetics do not hold true.

As with rapid reversible inhibition, for slow-binding inhibition to take place a
significantly larger concentration of inhibitor than enzyme is required. However, reaching
equilibrium slowly is incompatible with the assumption of Michaelis-Menten kinetics that
inhibitors bind much more quickly than the enzyme turns over. Unlike rapid reversible
and slow-binding inhibitors, both tight-binding and slow-tight-binding inhibitors are
effective at concentrations comparable to that of the enzyme. At that point, the inhibitor
concentration is no longer independent of the enzyme concentration, as assumed for
Michaelis-Menten kinetics. A summary of the properties of reversible enzyme inhibitors is
shown in Table 17.5. Although we give a brief overview of these types of inhibitors, excellent
and more in-depth descriptions of slow-, tight-, and slow-tight-binding inhibitors have
appeared elsewhere (71, 77-80).

2.4.1 Slow-Binding Inhibitors. Two different mechanisms have been suggested to
rationalize the slow-binding behavior of competitive inhibitors (71, 78, 80). In the one-step
mechanism A, the direct binding process of the inhibitor to the enzyme is slow (Equation
17.26); that is, the magnitude of k,[I] is small relative to k,[S] and k,, the rate constants
for the conversion of substrate to product.
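
To make the time scale of one-step (mechanism A) binding concrete, the following sketch computes the observed rate constant and the half-time for approach to equilibrium, using the standard pseudo-first-order expression k_obs = k_off + k_on[I]/(1 + [S]/KM) for a competitive inhibitor present in excess over enzyme. This expression, the names k_on and k_off, and the numerical values are assumptions introduced for illustration; they are not taken from Equation 17.26.

```python
import math


def k_obs_one_step(k_on, k_off, inhibitor, s=0.0, km=1.0):
    """Observed first-order rate constant (s^-1) for E + I <-> E.I.

    Assumes [I] >> [E] and a competitive inhibitor; with s = 0 this reduces
    to k_obs = k_off + k_on * [I].
    """
    return k_off + k_on * inhibitor / (1.0 + s / km)


def half_time(k_obs):
    """Time (s) needed to get halfway to the final equilibrium."""
    return math.log(2.0) / k_obs


if __name__ == "__main__":
    # Hypothetical inhibitor: fast on rate, Ki = 1 pM.
    k_on = 1.0e8       # M^-1 s^-1 (assumed)
    ki = 1.0e-12       # M (assumed)
    k_off = ki * k_on  # s^-1, since Ki = k_off / k_on for one-step binding
    k = k_obs_one_step(k_on, k_off, inhibitor=ki)
    print(f"k_obs = {k:.1e} s^-1, half-time = {half_time(k) / 60:.0f} min")
```

With these assumed values equilibrium is approached with a half-time of roughly an hour even though the on rate is fast, which illustrates why "slow binding" refers to slow attainment of equilibrium rather than necessarily to slow association.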
Table 17.5 Classes of Reversible Inhibitors

Inhibitor Class       Ratio of Inhibitor to Enzyme    Rate at Which Equilibrium
                      Necessary for Inhibition        (E + I ⇌ E·I) Is Attained
Rapid, reversible     I >> E                          Fast
Tight binding         I ≈ E                           Fast
Slow binding          I >> E                          Slow
Slow-tight binding    I ≈ E                           Slow
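
The practical consequence of the I ≈ E condition in Table 17.5 can be sketched numerically. The code below compares the fractional activity predicted when inhibitor depletion is ignored with that given by the quadratic (Morrison-type) expression commonly used for tight-binding inhibitors. That expression is not given in this section and is brought in here as an assumption, as are all names and values; ki_app denotes an apparent Ki at the substrate concentration used.

```python
import math


def fractional_activity_classical(i_total, ki_app):
    """v_i / v_0 when free inhibitor is taken to equal total inhibitor."""
    return 1.0 / (1.0 + i_total / ki_app)


def fractional_activity_tight_binding(e_total, i_total, ki_app):
    """v_i / v_0 allowing for depletion of inhibitor by the E.I complex."""
    b = e_total + i_total + ki_app
    # Concentration of E.I from the quadratic mass-balance solution.
    ei = (b - math.sqrt(b * b - 4.0 * e_total * i_total)) / 2.0
    return 1.0 - ei / e_total


if __name__ == "__main__":
    ki_app = 1.0  # nM (hypothetical)
    for e_total in (0.001, 10.0):  # nM (hypothetical)
        i_half = ki_app + e_total / 2.0  # inhibitor giving 50% activity
        print(
            f"[E]t = {e_total} nM: classical = "
            f"{fractional_activity_classical(i_half, ki_app):.3f}, "
            f"tight-binding = "
            f"{fractional_activity_tight_binding(e_total, i_half, ki_app):.3f}"
        )
```

When the enzyme concentration is far below Ki the two treatments agree. When it exceeds Ki, half-inhibition under this model requires a total inhibitor concentration of ki_app + [E]t/2, consistent with the earlier remark that an IC50 value cannot be less than half the enzyme concentration.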
The slow on rate (k,) has been attributed to the inhibitor encountering some barrier
to binding at the active site. The inhibitor has to overcome this barrier by correct
alignment. Once aligned properly, it binds so tightly that it is released very slowly from
the enzyme, making the overall equilibrium process extremely slow. The equilibrium
dissociation constant for the E·I complex, Ki, derived from Equation 17.26, is given by
Equation 17.27.

This is the same equilibrium as that for a rapid reversible inhibitor (Equation 17.17).
From Equation 17.27, it should be noted that, if Ki is very small (as with a tight-binding
inhibitor) and [I] is varied in the region of Ki, even if the on rate (k,) is diffusion
controlled, both k,[I] and k-, will be very small. Thus, the onset of inhibition for a
tight-binding inhibitor can appear to be slow, even though k, is in the range expected for
rapid reversible inhibitors (78). It is possible to carry out kinetic analyses of
tight-binding inhibitors. This can be done either by including a preincubation step, to allow
sufficient time for the enzyme and inhibitor to reach equilibrium, or by carrying out the
reaction at very high concentrations of both substrate and inhibitor. More detailed
discussion of these methods, with appropriate references, can be found in a recent volume by
Copeland (71).

If the slow-binding inhibitor described by Equation 17.26 also binds very tightly, it is
referred to as a slow-tight-binding inhibitor. For inhibitors of this type, Ki is given by
Equation 17.28, where [E,] represents the total enzyme concentration (in all forms) present
in solution.

For slow-tight-binding inhibitors, k-, is very small and formation of the E·I complex
is essentially irreversible. Use of Equation 17.28 ensures that depletion of free enzyme
and free inhibitor by formation of the E·I complex is taken into account.

In mechanism B, the more common mechanism for slow-binding inhibition (80), the
initial equilibrium between the enzyme, inhibitor, and the E·I complex is fast. However,
there is a subsequent slow rearrangement to form the final, more stable enzyme-inhibitor
complex (E·I*) (Equation 17.29).

Here the dissociation constant for the initial E·I complex is still k-,/k,, but there is
also a dissociation constant for the formation of the E·I* complex. The second dissociation
constant is given by Equation 17.30.

To observe the slow onset of inhibition and the E·I complex, Ki* must be smaller than Ki
and k-, smaller than k,. However, if k-, is considerably smaller than k,, then the
formation of the E·I* complex will be effectively irreversible (i.e., the inhibitor is of the
slow-tight-binding variety). Under those circumstances it will again be necessary to take
depletion of free enzyme and free inhibitor into account when determining Ki and Ki* (78).

The slow rearrangement step has been correlated with conformational changes of the
enzyme following initial binding of the inhibitor. It is possible that the enzyme in its
transition state conformation may be better equipped to
accommodate the inhibitor. A slow change to reach this optimal conformation will lead to
tighter binding of the inhibitor and even slower release from the enzyme. An alternative
suggestion is that the slow-binding process is linked to a requisite displacement of water
molecules from the active site (81). Initially the inhibitor binds loosely to the enzyme,
but upon release of water molecules the gain in entropy leads to a more stable E·I* complex.

One way of quickly identifying a potential slow-binding inhibitor is to examine the
progress of the reaction at increasing concentrations of inhibitor. Under initial velocity
conditions (Section 2.2.1), an enzyme-catalyzed reaction will exhibit a linear increase in
the amount of product formed over time. A reaction progress plot for a reaction carried
out in the presence of a rapid reversible inhibitor will also be linear. However, a
slow-binding inhibitor will initially show a linear relationship, although this will change
as the inhibitor binds, resulting in a biphasic plot. Typical biphasic progress curves for a
reaction in the presence of increasing concentrations of a slow-binding inhibitor are shown
in Fig. 17.13. The initial burst of the reaction, the linear section of the graphs, can be
described by competitive Michaelis-Menten kinetics. The higher the concentration of the
inhibitor, the shorter the initial linear section of each curve and the slower the
subsequent final steady-state rate, as observed in the asymptotes in Fig. 17.13. If the
inhibitor concentration is small, the substrate might be too depleted to permit observation
of steady-state rates.

Figure 17.13. Reaction progress curves in the presence of increasing concentrations of a
slow-binding inhibitor. (Horizontal axis: time; curves run from [I] = 0 to increasing [I].)

A good comparison of rapid reversible and slow-binding inhibition can be found in a recent
study on the inhibition of arginase, an enzyme that catalyzes the hydrolysis of L-arginine to
yield L-ornithine and urea (Equation 17.31).

\text{L-arginine} + \mathrm{H_2O} \xrightarrow{\text{arginase}} \text{L-ornithine} + \text{urea} \qquad (17.31)

Arginase competes with nitric oxide synthase (NOS) for arginine and, in doing so,
helps regulate NOS. As a consequence, inhibitors of arginase may have therapeutic use in
treating NO-dependent smooth muscle disorders, including erectile dysfunction (82). A
series of arginine analogs were prepared and tested as inhibitors of arginase (83). Three
examples are shown in Fig. 17.14. One of these, Nω-hydroxy-L-arginine (12), is a competitive
inhibitor of arginase at both pH 7.5 and pH 9.5 with Ki values of 2 and 1.6 µM, respectively.
The two boronic acid derivatives, 2(S)-amino-6-boronohexanoic acid (13) and
S-(2-boronoethyl)-L-cysteine (14), were also competitive inhibitors at pH 7.5 with Ki values
of 0.25 and 0.31 µM, respectively. However, at pH 9.5, the boronic acid derivatives both
became slow-binding inhibitors, apparently binding by mechanism B and with lowered Ki values
of 8.5 and 30 nM, respectively. It was suggested that, at low pH, the trigonal form of the
boronic acid derivative predominates, and that
this species binds with one hydroxyl, coordinating to one of the two requisite manganese
ions. At pH 9.5 the tetrahedral species is the major form and this initially binds also with
one hydroxyl coordinated to a manganese ion. Then, in a second, slower step, a water
molecule that bridges the two active-site manganese ions is displaced by a second hydroxyl
group on the boronic acid (83). Support for this mechanism is provided by crystal
structures, showing both (15) and (16) are bound in the active site of arginase as
tetrahedral species at alkaline pH (82). Of course, compound (12) is unable to form the
tetrahedral species and is a competitive inhibitor at all times.

Figure 17.14. (a) Competitive and (b) slow-binding inhibitors of arginase.

Leucine aminopeptidase (LAP) is a metalloenzyme that has been inhibited in a slow-binding
manner. This exopeptidase catalyzes the hydrolysis of N-terminal amino acids, particularly
those with a leucine at the N-terminus, although it does have a broad specificity
(Equation 17.32).
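
The biphasic progress curves described for slow-binding inhibitors can be reproduced with the integrated rate equation commonly used to fit such data, P(t) = v_s t + (v_0 - v_s)(1 - exp(-k_obs t))/k_obs, where v_0 is the initial velocity, v_s the final steady-state velocity, and k_obs the rate constant for the transition between them. The equation is not stated in this section; it and the parameter values below are assumptions chosen for illustration.

```python
import math


def product_formed(t, v0, vs, k_obs):
    """Product at time t for a slow-binding progress curve.

    v0: initial velocity, vs: final steady-state velocity,
    k_obs: first-order rate constant for the v0 -> vs transition.
    """
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for small x.
    return vs * t + (v0 - vs) * (-math.expm1(-k_obs * t)) / k_obs


if __name__ == "__main__":
    v0 = 1.0  # arbitrary units per second (assumed)
    # Hypothetical (vs, k_obs) pairs: more inhibitor lowers vs and raises k_obs.
    for label, vs, k_obs in (("low [I]", 0.5, 0.01), ("high [I]", 0.1, 0.05)):
        curve = [product_formed(t, v0, vs, k_obs) for t in (0, 30, 60, 120, 240)]
        print(label, [round(p, 1) for p in curve])
```

The slope of each simulated curve starts at v0 and decays to vs, so a higher inhibitor concentration (larger k_obs, smaller vs) gives a shorter initial linear phase and a flatter final phase.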
[Equation 17.32: hydrolysis of an N-terminal amino acid from a peptide, catalyzed by
leucine aminopeptidase; structures not reproduced.]

Figure 17.15. Slow-tight-binding inhibitors of leucine aminopeptidase.

Bestatin (17) (Fig. 17.15) and amastatin (18) have been identified as slow-tight-binding
inhibitors of LAP from porcine kidney, with Ki values in the low nanomolar range (84). Later,
bestatin was shown to be a slow-binding inhibitor of LAP employing mechanism B, with a Ki
value of 0.11 µM and a Ki* value of 1.3 nM. Values of 1.5 X lo-' s-' and 2 X lo-, s-'
were obtained for k, and k-, (Equation 17.29), respectively (85). It was assumed that the
inhibition of bovine lens leucine aminopeptidase (blLAP) by amastatin would also proceed by
mechanism B. This prediction was supported by an X-ray crystallography study of the
amastatin-blLAP complex (86), which suggested that (18) (and, by analogy, 17) initially
binds to a Zn2+ atom in a groove in the active site. The slow step in binding was seen as a
subsequent coordination to a second Zn2+ atom located deeper in the active site (86).

It is difficult to find clear-cut examples of slow-binding inhibition occurring by
mechanism A. However, the inhibition of Factor Xa by a peptidyl-α-ketothiazole was found to
be unusual because it appeared that the formation of E·I was partially rate limiting. Factor
Xa is a trypsinlike protease found in the blood coagulation pathway, which cleaves
prothrombin forming thrombin that, in turn, promotes blood clotting (Equation 17.33).
Inhibitors of Factor Xa activity offer potential as anticoagulants and several irreversible
inhibitors of Factor Xa have been developed. One of the few tight-binding reversible
inhibitors of Factor Xa is BnSO2-D-Arg-Gly-Arg-ketothiazole (19).

The inhibitor could be displaced from Factor Xa by substrates and, based on steady-state
assumptions, the dissociation constant for (19) was found to be 14 pM (87). However,
the reaction progress curves indicated a slow-binding process, probably by mechanism B.
Stopped-flow fluorescence studies, combined with kinetic analysis, showed that the
isomerization step (E·I → E·I*) is unusually fast and that the formation of E·I is, at
least, partially rate limiting.

In some instances the type of inhibition has been found to be isozyme specific. For
example, inducibly expressed isozymes (iNOS) and constitutively expressed isozymes (cNOS) of
nitric oxide synthase (NOS) all catalyze the conversion of L-arginine to L-citrulline and
nitric oxide (Equation 17.34).
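
For mechanism B, the overall dissociation constant is commonly written Ki* = Ki·k_rev/(k_fwd + k_rev), where k_fwd and k_rev are the forward and reverse rate constants of the slow isomerization step. Assuming that relation (the symbols k_fwd and k_rev are placeholder names, not necessarily the notation of Equation 17.30), the sketch below shows how the two constants quoted above for bestatin, Ki = 0.11 µM and Ki* = 1.3 nM, fix the ratio k_fwd/k_rev.

```python
def overall_ki_star(ki, k_fwd, k_rev):
    """Overall dissociation constant for E + I <-> E.I <-> E.I* (two-step model)."""
    return ki * k_rev / (k_fwd + k_rev)


def isomerization_ratio(ki, ki_star):
    """k_fwd / k_rev implied by Ki and Ki* under the same two-step model."""
    return ki / ki_star - 1.0


if __name__ == "__main__":
    ki_nm = 110.0     # bestatin Ki from the text, 0.11 uM expressed in nM
    ki_star_nm = 1.3  # bestatin Ki* from the text, in nM
    ratio = isomerization_ratio(ki_nm, ki_star_nm)
    print(f"k_fwd / k_rev = {ratio:.0f}")
    # Round trip: any pair of rate constants with this ratio reproduces Ki*.
    print(f"Ki* = {overall_ki_star(ki_nm, ratio, 1.0):.2f} nM")
```

A ratio of roughly 80 means that the isomerized complex is strongly favored at equilibrium, in line with the requirement that the reverse isomerization be slower than the forward step.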
[Equation 17.33: reaction scheme showing Factor Xa cleaving a peptide substrate into two
fragments; structures not reproduced.]
\text{L-arginine} \xrightarrow{\text{nitric oxide synthase}} \text{L-citrulline} + {}^{\bullet}\mathrm{NO} \qquad (17.34)

The inhibition of human iNOS by N-(3-(aminomethyl)benzyl)acetamidine (20) (Fig.
17.16) was found to proceed by mechanism B, with an overall Kd of <7 nM. Conversely,
inhibition of constitutive isoforms of the human enzyme was found to be rapidly reversible,
with Ki values in the micromolar range (88). This is in contrast to results obtained for the
arginine analog, L-N^G-nitroarginine (21), which was found to be a rapid reversible
inhibitor of mouse macrophage iNOS, with a Ki of 4.4 µM, and a slow-binding inhibitor of brain
cNOS with a Kd (assuming mechanism A) of 15 nM (89).

Many more examples of these types of inhibitors can be found in the review by Morrison
and Walsh (78).

Figure 17.16. Inhibitors of nitric oxide synthase.
Figure 17.17. Pyrophosphate analogs used to inhibit DNA polymerase. (Structures shown:
pyrophosphate (PPi), phosphonoformate (22), and phosphonoacetate (23).)
2.5 Inhibitors Classified on the Basis of Structure/Mechanism

As with any reaction, an enzyme-catalyzed reaction must proceed from the ground state
through a transition state before products are formed. In addition, there are often some
high-energy intermediates along the pathway. Knowledge and understanding of an enzyme's
mechanism permits the identification of the high-energy intermediates and the prediction of
the structures of the transition states. Armed with that knowledge, it is possible to design
enzyme inhibitors based on the structures of the various intermediates along the reaction
pathway. Inhibitors designed in this manner are occasionally referred to as mechanism-based
inhibitors. However, for purposes of this chapter, we will reserve that term for the
covalently binding inhibitors described in Section 3.

2.5.1 Ground-State Analogs. The ground state of an enzymatic reaction consists of the
substrates and the products. Compounds that mimic the substrate of an enzymatic reaction
have been examined earlier (Section 2.3) and are not discussed again here. There are many
examples of enzymatic reactions that are inhibited by some or all of the reaction products.
Both epinephrine and S-adenosyl-L-homocysteine, for example, are inhibitors of
phenylethanolamine N-methyltransferase (Equation 17.25). In much the same way as described
earlier for substrate analogs, product analogs can also be used to obtain information about
the binding mechanism of enzymes (90).

Phosphonoformate (22) (Fig. 17.17) is an antiviral agent that is used clinically in the
treatment of herpes simplex virus (HSV) and human cytomegalovirus (HCMV) (91). It acts
as a product analog, blocking the pyrophosphate-binding site, in the reaction catalyzed
by DNA polymerase (Equation 17.35). It is also effective, using the same mechanism,
against HIV reverse transcriptase (91).

[Equation 17.35: DNA polymerase-catalyzed addition of dGTP to the primer terminus, with
release of pyrophosphate; structures not reproduced.]

DNA polymerase catalyzes the transfer of a complementary deoxynucleoside monophosphate
moiety from its triphosphate (dNTP) to the 3' hydroxyl of the primer terminus, with
subsequent release of pyrophosphate (PPi, Equation 17.35). Initially, phosphonoformate (22)
and phosphonoacetate (23) were identified as inhibitors of HSV DNA synthesis (92). Detailed
kinetic studies (93), using DNA polymerase induced by avian herpes viruses, showed that
phosphonoacetate (23) was a noncompetitive inhibitor of the four dNTPs. At low levels of
dNTPs it was a noncompetitive inhibitor of the substrate DNA, becoming uncompetitive
at saturating dNTP levels. It was also found that (23) was a competitive inhibitor of
pyrophosphate, with a Ki value in the low micromolar range, in the dNTP-PPi exchange
reaction catalyzed by a turkey virus DNA polymerase (93). The inhibition patterns were
identical to those observed using pyrophosphate as an inhibitor. Therefore it was
concluded that (23) acted as an analog of pyrophosphate and competed for the same binding
site (93). Later, both (22) and (23) were confirmed as acting as pyrophosphate (i.e.,
product) analog inhibitors of isolated HSV DNA polymerase (94).
2.5.2 Multisubstrate Analogs. A large number of enzymatic reactions involve the
simultaneous binding of two or more substrates at the active site. The bound substrates must
be in close proximity to each other and positioned in such a way as to facilitate covalent
bond formation or the transfer of a functional group from one substrate to another.
Multisubstrate analog inhibitors mimic the simultaneous binding of two or more substrates at
the active site of the enzyme. The advantage of this, for a bireactant system, is shown in
Equation 17.36.

There are two ways the two substrates, A and B, may bind to the enzyme to form an
E·A·B complex. First, and most likely, they bind individually (in either a random or an
ordered fashion) with dissociation constants of KA and KB. Second, the substrates may
come together, positioned in such a way as to facilitate their subsequent reaction with a
dissociation constant of KBi. This reactive complex A·B then binds to the enzyme with a
dissociation constant of KMS. In general, the formation of A·B is entropically unfavorable.
However, a bisubstrate analog, designed to mimic A·B, can often be prepared by covalently
connecting the corresponding substrates or substrate analogs with a suitable linker group.
Linking the two groups effectively overcomes the unfavorable entropic barrier. It has been
calculated that an ideal bisubstrate analog inhibitor can bind up to 10' times more tightly
than the product of the substrate-binding constants (i.e., 1/KBi may be as high as lo-' M).
This figure is based on entropic considerations and also assumes a perfect fit of the
bisubstrate analog inhibitor to the two binding sites on the enzyme (57).

Where does this high affinity come from? A multisubstrate analog inhibitor will bind
more tightly than a substrate analog inhibitor because it has (1) the entropic advantage of
reduced molecularity and (2) an additive binding contribution from each of the substrates it
mimics. For example, when two single-substrate analog inhibitors bind separately, but
next to each other, two sets of translational and rotational entropies are lost. However,
when a bisubstrate analog inhibitor binds it loses only a single set of translational and
rotational entropies (57, 60). Further, let us assume that the bisubstrate analog binds to
the same two sites as two single-substrate analog inhibitors. In that case there will be a
gain in entropy from the release of water molecules from each substrate-binding site, as
well as the favorable enthalpic contributions from the formation of hydrogen bonds, buried
salt bridges, and so forth in each site. These favorable free-energy contributions will be
the same for a bisubstrate analog as for the two individual inhibitors binding
simultaneously. On the other hand, compared to the binding of a single-substrate analog, the
multisubstrate analog inhibitor gains favorable binding enthalpies and entropies from the
additional binding site(s), while still losing only one set of translational and rotational
entropies. Thus the binding of a multisubstrate analog should be very tight, without needing
any assistance from transition-state complementarity.

Inhibitors that combine two substrates are termed bisubstrate analogs, whereas those
combining three substrates are termed trisubstrate analogs and so on, with the former
being the most common. The design of a bisubstrate analog inhibitor ordinarily requires the
development of two single-substrate analog inhibitors of reasonable affinity. The two
single-substrate inhibitors are then connected by an appropriate linker, and the optimal
length of the linker is determined experimentally. Under normal circumstances, the Ki value
for a bisubstrate analog inhibitor can be expected to approximate the product of the Ki
values of the two substrate analogs. A guide to a reasonably achievable Ki for a bisubstrate
analog also may be obtained from the product of the KM values of the individual substrates.
For example, if two substrates of an enzymatic reaction have binding constants in the
millimolar range, a bisubstrate analog would be expected
to have a Ki value in the micromolar range. Note also that, if the enzyme binds substrates
in a random manner, then a multisubstrate inhibitor should exhibit competitive inhibition
patterns with each substrate it mimics because the binding of the inhibitor should be
mutually exclusive with that substrate. If the enzyme employs an ordered mechanism, then
the inhibitor should be competitive with the first substrate to bind and uncompetitive with
other substrates.

The multisubstrate analog approach to enzyme inhibition has the additional advantage
in that it provides a high degree of specificity. The combination of two or more substrates
will usually produce a unique structure, unlikely to bind to other enzymes that may utilize
any one of the substrates. This approach has even been used to design isozyme-specific
inhibitors (95). It should also be noted that the distinction between a transition-state
analog (Section 2.5.3) and a multisubstrate analog inhibitor is often quite arbitrary. In
fact many inhibitors described as transition-state analogs are often actually analogs of
high-energy reaction intermediates that, in turn, may have structures somewhat akin to those
of multisubstrate analog inhibitors. However, multisubstrate analog inhibitors are intended
to mimic the combined substrates in their ground-state forms and do not require any
contribution from transition-state stabilization. Several general reviews on multisubstrate
analog inhibitors have appeared (96-98), and multisubstrate analogs also receive
some discussion in reviews on transition-state analogs (99-101).

Glycinamide ribonucleotide transformylase (GAR TFase) catalyzes the transfer of a
formyl group from N10-formyltetrahydrofolate to glycinamide ribonucleotide (Equation
17.37). This is a crucial step in de novo purine biosynthesis, which is essential for cell
division, and GAR TFase has become a target enzyme for the development of antineoplastic
agents.

N^{10}\text{-formyltetrahydrofolate} + \text{glycinamide ribonucleotide} \xrightarrow{\text{GAR TFase}} \text{tetrahydrofolate} + N\text{-formylglycinamide ribonucleotide} \qquad (17.37)

Inglese et al. (102) were able to synthesize the bisubstrate inhibitor
β-thioGAR-dideazafolate (β-TGDDF) (24) (Fig. 17.18). This compound combines nearly all the
features of both substrates, linked by a stable thioether bridge, and was found to inhibit
GAR TFase with a Ki value of 250 pM (102). β-TGDDF acted as a slow, tight-binding inhibitor
(Section 2.4) and the Ki value was about three times lower than the product of the KM values
of the substrates. More recently, the crystal structure of the complex between BW1476U89
(25) and GAR TFase was obtained (103). BW1476U89 is another multisubstrate analog
and has a Ki value of about 100 pM (104). The structure confirms that the inhibitor binds in
those sites identified previously as substrate-
binding sites, and provides a starting point for development of even more potent
transition-state analogs.

Figure 17.18. Bisubstrate analog inhibitors of GAR TFase.

The condensation of carbamyl phosphate and L-aspartate, catalyzed by aspartate
transcarbamoylase (ATCase), produces N-carbamyl-L-aspartate (Equation 17.38). This is one
of the early steps in de novo pyrimidine biosynthesis, also a requirement for cell division,
making ATCase also a target for potential anticancer agents.

\text{carbamyl phosphate} + \text{L-aspartate} \xrightarrow{\text{aspartate transcarbamylase}} N\text{-carbamyl-L-aspartate} + \mathrm{P_i} \qquad (17.38)

N-Phosphonoacetyl-L-aspartate (PALA) (26) (Fig. 17.19) was initially designed as a
transition-state analog inhibitor of ATCase (105). It was found to have a Ki value of 27 nM,
a value that is considerably lower than the KM values of 27 µM and 17 mM for carbamyl
phosphate and L-aspartate, respectively (105). PALA was found to inhibit cell growth in vivo
(106) and, eventually, underwent clinical trials as an anticancer agent (107).

Figure 17.19. Putative transition state, substrate, and inhibitors of aspartate
transcarbamylase.

PALA provides an example of the difficulties in distinguishing between a multisubstrate
analog and a transition-state analog. As shown in Fig. 17.19, in effect PALA (26) combines
two fragments, an analog of carbamyl phosphate (27) and succinate (28). The tight
binding of PALA also suggested it was a potential transition-state analog. However,
succinate has a Ki value of 90 fl, and the product of the Ki values of succinate and carbamyl
phosphate is 24 nM, which is almost identical to the Ki value of PALA (105). As shown in Fig.
17.19, the transition-state structure (29) for the ATCase-catalyzed reaction is tetrahedral.
The pyrophosphate analog (30) was expected to provide a much better mimic of the
transition state, yet its Ki value of 0.24 µM was tenfold higher than that of PALA (108). It
is not clear why there is this discrepancy, but a recent X-ray structure of the ATCase-PALA
complex identified several groups that are positioned to bind to a tetrahedral transition
state (109). Two of these, the side chain of Gln137 and the backbone carbonyl of Pro266,
were positioned to interact with the amino group of the putative transition state (29).
However, these groups would not be expected to interact so well with the analogous oxygen
atom of the pyrophosphate transition-state analog (30), perhaps leading to its
weaker-than-expected binding.

The statins are a group of cholesterol-lowering agents that have become some of the
largest selling drugs in the world. They lower serum cholesterol levels by competitively
inhibiting 3-hydroxy-3-methylglutaryl-coenzyme A (HMG-CoA) reductase, a key enzyme
in cholesterol biosynthesis (Equation 17.39).

\text{HMG-CoA} + 2\,\text{NADPH} + 2\,\mathrm{H^+} \xrightarrow{\text{HMG-CoA reductase}} \text{mevalonic acid} + \text{CoASH} + 2\,\mathrm{NADP^+} \qquad (17.39)

Several statin inhibitors of HMG-CoA reductase are shown in Fig. 17.20. They consist
of rigid, hydrophobic groups connected to an HMG-like group that, in inhibitors such as
mevastatin (compactin) (31), simvastatin (9) (Fig. 17.1), and the dichlorophenol derivative
(32), is present in the form of a lactone. In vivo, the lactone is converted to the free
acid, as shown in Fig. 17.20 for mevastatin (33). More recently developed statins, such as
fluvastatin (34) and atorvastatin (35), are prepared as the free acids. These inhibitors have
Ki values in the low nanomolar range (110), significantly lower than the KM value of the
substrate HMG-CoA, which is in the micromolar range (110, 111). Given that these inhibitors
did not appear to be transition-state analogs, Nakamura and Abeles (112, 113) conducted a
number of experiments to determine the basis of the enhanced affinity of, in particular, (31)
and (32).

Both mevastatin (31) and (32) were found to bind to the hydroxymethylglutarate portion
of the active site, but not the NADPH region, whereas only (31) bound to the coenzyme A portion. D,L-Mevalonate and D,L-3,5-dihydroxyvalerate, used as analogs of the upper portion of the statins, were both poor inhibitors, with Ki values in the millimolar range; however, analogs of the hydrophobic decalin region of mevastatin showed no inhibitory effect (112). Given that the Ki value for mevastatin is almost eight orders of magnitude lower than that of D,L-3,5-dihydroxyvalerate, it is clear that the hydrophobic lower portion (and its covalent link) must play a significant role in the binding of (31) and, by implication, the binding of all the statins. Presumably, the upper portion of the inhibitor is necessary for specificity and the hydrophobic region for binding affinity. The hydrophobic region must be relatively nonspecific because a variety of hydrophobic groups (Fig. 17.20) are accepted. In some cases (e.g., mevastatin), the hydrophobic group overlaps the CoASH site and in others, such as the dichlorophenol group of (32), it does not (112). The structure of the statin is analogous to that of a bisubstrate inhibitor, in that there is linked binding to two distinct binding sites on the enzyme, leading to greatly enhanced inhibition of the enzyme. For mevastatin, the entropic advantage provided by linking the mevalonate and decalin portions together is estimated to be approximately 5 × 10^4 M (113). This is quite a reasonable enhancement, given that the theoretical maximum is 10^8 M (57), and it has been suggested that such a "hydrophobic anchor" is responsible for the enhanced binding of some inhibitors of alcohol dehydrogenase and adenosine deaminase (113).

Figure 17.20. Statin inhibitors of HMG-CoA reductase.

Although this explanation appeared quite reasonable, it was thrown into doubt when X-ray structures of HMG-CoA reductase complexed with both substrates and products were obtained (114, 115). These structures showed that, if the statins bound so that their HMG-like groups occupied the HMG-binding pocket of the active site, the bulky hydrophobic groups of the statins would clash with the residues lining the narrow pocket into which part of the coenzyme A bound (115). However, recently, Istvan and Deisenhofer have obtained X-ray structures of HMG-CoA reductase bound to six individual statins, including (9), (31), (34), and (35) (116). This study showed that the substrate-binding pocket rearranges to accommodate the statins, that the statins do bind to the HMG-binding region, that a shallow hydrophobic groove now accommodates the hydrophobic groups, and that none of the NADP(H)-binding pocket is occupied (116). In toto, the structural studies supported all interpretations made some 15 years earlier based on kinetic studies, and provided definitive evidence for a hydrophobic anchor enhancing the binding of the mevalonate portion of the statins.

The evolution of the angiotensin converting enzyme (ACE) inhibitors is an illuminating story in the development of enzyme inhibitors as therapeutic agents. As shown in Equation 17.40, ACE catalyzes the conversion of angiotensin I to angiotensin II.

$$
\begin{aligned}
&\text{H}_3\text{N}^+\text{-Asp-Arg-Val-Tyr-Ile-His-Pro-Phe-His-Leu-COO}^- \quad (\text{angiotensin I}) \\
&\qquad \xrightarrow{\text{angiotensin converting enzyme (ACE)}} \\
&\text{H}_3\text{N}^+\text{-Asp-Arg-Val-Tyr-Ile-His-Pro-Phe-COO}^- \quad (\text{angiotensin II}) \; + \; \text{H}_3\text{N}^+\text{-His-Leu-COO}^-
\end{aligned}
\tag{17.40}
$$

Angiotensin II, itself a potent hypertensive agent, also stimulates the release of a second hypertensive agent, aldosterone. In addition, ACE catalyzes the cleavage of the nonapeptide vasodilating agent, bradykinin (not shown). Therefore an ACE inhibitor was seen to have the potential to limit three hypertensive actions. This premise was validated by in vivo results with teprotide, a peptide inhibitor of ACE, which had been isolated from a South American pit viper (117).

At that time the structure of ACE was unknown, although it had been identified as a zinc metalloprotease. It was surmised that its mechanism and active site might resemble those of another metalloprotease, carboxypeptidase A, whose X-ray structure was known. (R)-2-Benzylsuccinic acid (36) (Fig. 17.21) had been identified as a potent inhibitor of carboxypeptidase A, and it was suggested that (36) resembled the collected products of the hydrolysis reaction (Fig. 17.21). In other words, (36) was a biproduct analog and, not unexpectedly, it was found to bind with an affinity resembling the combined affinities of the two products (118). Carboxypeptidase A appeared to have three main interactions with (36). Two substrate-binding sites bound the phenyl group and one carboxylate, and the Zn2+ ion, usually coordinated to the carbonyl of the amide bond being cleaved, was now bound to the second carboxylate. Combining those suggestions with studies with viper venom peptides, indicating that a C-terminal proline was effective in inhibiting ACE, a number of carboxyalkanoylproline derivatives were tested as ACE inhibitors (119). Of these, the succinyl-L-proline derivative (37) was found to be the most effective, with an IC50 value of 330 µM. Given that one carboxylate bound to the Zn2+ ion, a better zinc ligand, a thiol group, was substituted for this carboxylate, resulting in (38) with the IC50 value now reduced to 0.2 µM. Finally, after taking into account the differences between the active sites of ACE and carboxypeptidase A, captopril (39) was prepared. Captopril was found to be a competitive inhibitor of ACE, with a Ki value of 1.7 nM, and was
the first ACE inhibitor to be marketed. It was not long before attempts were made to make captopril more productlike, with the resultant development of enalaprilat (40). Enalaprilat was found to be a slow-tight-binding inhibitor (Section 2.4) of ACE with a Ki value below 1 nM (120), but it was poorly absorbed orally. However, the ethyl ester, enalapril (41), acted as a prodrug, had good oral activity, and was marketed. Note that it is also possible that enalaprilat acts as a transition-state analog (Section 2.5.3), thereby accounting for its behavior as a slow-tight-binding inhibitor (121). Following enalapril, many more ACE inhibitors have been developed, mainly aimed at increasing oral bioavailability, removing side effects, or improving metabolism. These include ramipril (42), the ester prodrug of ramiprilat (43), with 10 times better bioavailability than that of enalapril.

Figure 17.21. (a) Biproduct analog inhibitor of carboxypeptidase A and (b) several ACE inhibitors.
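The potency gains in the captopril story can be restated as free-energy differences. The short Python sketch below is an illustration only and is not taken from the source: it converts a fold change in IC50 (or Ki) into ΔΔG = RT ln(ratio) and a dissociation constant into ΔG = RT ln Ki. The function names, the temperature of 25 °C, and the assumption that the two IC50 values for (37) and (38) were measured under identical assay conditions (so that their ratio approximates the ratio of Ki values) are all assumptions made for this example.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def ddg_from_ratio(fold_change, temp_k=298.15):
    """Free-energy difference (kcal/mol) equivalent to a fold change in Ki or IC50."""
    return R_KCAL * temp_k * math.log(fold_change)


def dg_from_ki(ki_molar, temp_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in M."""
    return R_KCAL * temp_k * math.log(ki_molar)


# Carboxylate (37) -> thiol (38): IC50 falls from 330 to 0.2 (same units),
# so only the ratio matters here.
thiol_gain = ddg_from_ratio(330 / 0.2)

# Captopril (39): Ki = 1.7 nM.
captopril_dg = dg_from_ki(1.7e-9)

print(f"thiol substitution gain: {thiol_gain:.1f} kcal/mol")
print(f"captopril binding free energy: {captopril_dg:.1f} kcal/mol")
```

On these assumptions, replacing the zinc-binding carboxylate with a thiol is worth roughly 4.4 kcal/mol, and the 1.7 nM Ki of captopril corresponds to a binding free energy of about -12 kcal/mol.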
Ramiprilat was also shown to be a slow-tight-binding inhibitor of ACE, operating by mechanism B, with Ki* (Equation 17.30) of 7 pM (122). A more detailed discussion of the development of the ACE inhibitors is available (121).

2.5.3 Transition-State Analogs. As a chemical reaction proceeds from substrates to products, it will pass through one or more transition states. The energy barrier imposed by the highest energy transition state controls the overall rate of the reaction. Enzymes bring about rate enhancements of 10^10 to 10^15 (123) by lowering this energy barrier. They do this by having a greater affinity for the structure of the transition state than for the structures of either substrates or products. Although an enzyme may have good affinity for its substrate, as evidenced by a low dissociation constant (KS, Equation 17.41) for the Michaelis (E·S) complex, the enzyme can further stabilize the inherently unstable transition state, for example, by forming extra electrostatic or hydrogen bonds, by providing more effective hydrophobic interactions, or by using structural rearrangements to exclude solvent, thereby strengthening existing electrostatic contacts.

Simple transition-state theory states that the rate of an enzyme-catalyzed reaction is related to the rate of a noncatalyzed reaction by the same factor that relates the affinity of an enzyme for the transition state to the affinity of an enzyme for a substrate (Equation 17.41) (99). Therefore, the magnitude of enzymatic catalysis (the ratio of the enzymatic to the nonenzymatic rate constant) is related to the enhanced binding of the transition state to the enzyme (KS/KT). Compounds that can take advantage of this enhanced binding to the transition state can prove to be potent and selective enzyme inhibitors. Such compounds, referred to as transition-state analogs, can theoretically have ratios of the binding constants of inhibitor to substrate (Ki/KS) on the order of 10^-8 to 10^-14. In addition, transition-state analogs may have the further advantage of reduced molecularity, as outlined earlier (Section 2.5.2) for multisubstrate analog inhibitors. Several reviews on the theory and general aspects of transition-state analog inhibitors are available and are recommended for a more complete understanding of this topic (37, 96, 99, 100, 124, 125).

The design of a good transition-state mimic is quite challenging. It requires, at the least, sufficient knowledge of the mechanism of the target enzyme to predict transition-state structure(s). This is why transition-state analogs are sometimes (but not in this review) referred to as mechanism-based inhibitors. A detailed knowledge of the true energy profile, including details such as the existence of distinct chemical steps, high-energy intermediates, and their associated transition states, is also useful (126). Further, by definition, the transition state is unstable, often highly charged, and possesses partially broken/formed covalent bonds. Designing a stable compound that will closely mimic a transition state is impossible. However, the Hammond postulate states that the transition state between a reactant and a high-energy reaction intermediate will resemble the intermediate rather than the reactant. It is possible to design and synthesize an analog of a high-energy intermediate. Indeed, the majority of so-called transition-state analogs are actually analogs of high-energy reaction intermediates. Although a clear distinction exists, the design process is, for all practical purposes, the same.

It should also be noted that an enzyme is designed to initially recognize the features of its substrates. Often substrate binding brings about a conformational change in the enzyme that will then maximize the attractive forces between the enzyme and transition state. The transition-state analog may not possess those features of the substrate that facilitate rapid binding, even though its affinity for the enzyme is extremely high. Although some transition-state analogs bind rapidly to enzymes,
others bind slowly and show the properties of the slow-binding inhibitors described earlier in Section 2.4.

Figure 17.22. (a) Thermolysin-catalyzed hydrolysis of peptide analogs showing putative transition state, (b) phosphonamidate peptide analog, and (c) fluoroalkene peptide analog.

Slow binding, tight binding, or structural similarity to the assumed transition-state structure are not, in themselves, sufficient criteria to establish that an inhibitor is a true transition-state analog (127). Methotrexate (7), for example, is an extremely high-affinity (Ki = 58 pM), slow-binding inhibitor of dihydrofolate reductase (128). On the surface, it would appear that methotrexate could be classified as a transition-state analog. However, crystallographic studies have shown that methotrexate binds with its pterin ring in the opposite orientation to that of the substrate, dihydrofolate (129, 130). To distinguish between a high-affinity, ground-state analog and a putative transition-state analog requires a careful appraisal. There is a fundamental difference between the entropy change of a unimolecular enzymatic reaction and that of a multimolecular solution reaction (131). In addition, the appropriate rate constant for the nonenzymatic reaction is often either not available or hard to obtain (132, 133). These factors combine to make it difficult to evaluate quantitatively the correlation between the enhanced rates of enzymatic reactions and the tight binding of transition-state analogs.

In an attempt to develop stringent criteria for the distinction between transition-state and ground-state analogs, Bartlett and Marlowe (127) overcame some of these inherent difficulties by comparing the binding affinities of a series of substrate analogs with those of the corresponding transition-state analogs. One consequence of Equation 17.41 is that, if there is a change in structure of a substrate that alters kcat/KM without altering the nonenzymatic rate of reaction, then an analogous structural change in the transition-state mimic should bring about a similar change in Ki. Put simply, there should be a linear relationship between the values of Ki for the transition-state analog and kcat/KM for the corresponding substrate. Bartlett and Marlowe (127) designed a series of dipeptide analog substrates (44) (Fig. 17.22) for thermolysin in which the structural variation was remote from the reactive center and therefore unlikely to affect the nonenzymatic reaction rate. The reaction catalyzed by thermolysin is proposed to proceed by the tetrahedral transition state (45), and with their long P-O bonds, the phosphonamidate compounds (46, Table 17.6) were expected to act as transition-state analog inhibitors. It was found that the Ki values for the putative transition-state analog inhibitors correlated linearly with the KM/kcat values of the corresponding substrates, although no correlation was found between Ki and KM (Table 17.6). The fact that substrate binding (KM) was relatively unaffected by a change at a remote site was not unexpected, but the observation that the binding of the phosphonamidate inhibitors was greatly affected suggests that these inhibitors were, indeed, transition-state rather than ground-state analogs. Conversely, the Ki values for a series of fluoroalkene isosteres of the same substrates (47) (Fig. 17.22) correlated strongly with KM but not KM/kcat (Table 17.6), indicating that the latter inhibitors were ground-state analogs (134). This approach has also been used to confirm that phosphonic acid peptides were transition-state analog inhibitors of pepsin (135).

Table 17.6 Correlation of Ki Values for Inhibitors of Thermolysin with KM and KM/kcat Values for the Corresponding Substrates(a)
Column groups: Inhibitor Data; Corresponding Substrate Data
Phosphonamidate inhibitors (46): R = D-Ala, NH2, Gly, L-Phe, L-Ala, L-Leu
Fluoroalkene inhibitors (47): R = Gly, L-Ala, L-Leu, L-Phe
(a) Data are from Ref. 127.

One of the most popular targets for design of transition-state analogs is adenosine deaminase. Inhibitors of this enzyme have been used as immunosuppressants and are also potential antitumor agents, whereas lack of adenosine deaminase results in severe combined immunodeficiency disease (SCID).

Adenosine deaminase (ADA), which catalyzes the conversion of adenosine to inosine (Equation 17.42), is an extremely proficient enzyme, providing a rate enhancement of more than 12 orders of magnitude (123). The enzyme-catalyzed reaction is thought to pass through an unstable hydrated intermediate (48) (Fig. 17.23), with a KT (Equation 17.41) in the region of 10^-17 M (123). Clearly, even a crude analog of (48) would have the potential to be an extremely powerful inhibitor of ADA. The structures of several inhibitors of ADA are shown in Fig. 17.23. Of these, the antibiotics coformycin (49) and (R)-deoxycoformycin (pentostatin, 50) were found to be potent ADA inhibitors, with Ki values of 1 × 10^-11 M (136) and 2.5 × 10^-12 M (137), respectively. The KM for adenosine is around 30 µM (138, 139), whereas the Ki of the product, inosine, is [...] M. Thus, the two antibiotics show at least 10^6-fold greater affinity for ADA than for
the substrate, suggesting that they are acting as transition-state analogs. By contrast, (S)-deoxycoformycin (51) and 8-ketodeoxycoformycin (52), with Ki values of 33 and 40 µM, respectively, are clearly ground-state analogs. The differences in binding affinities of (R)- and (S)-deoxycoformycin translate to a difference in binding energy of almost 10 kcal/mol and provide an estimate of the energy that can be applied to substrate distortion in formation of the transition-state complex (139).

$$\text{adenosine} + \mathrm{H_2O} \xrightarrow{\text{adenosine deaminase}} \text{inosine} + \mathrm{NH_3} \tag{17.42}$$

Figure 17.23. Reaction intermediate and inhibitors of the reaction catalyzed by adenosine deaminase.

Purine ribonucleoside (53) was initially thought to bind to ADA as a ground-state analog, with an apparent Ki of 3 µM (138, 140). However, it was observed that the structures of both the ligand and the enzyme were perturbed when purine riboside bound to ADA. 13C-NMR spectroscopy showed that the ADA-bound purine riboside was sp3 hybridized at C-6 (141). The NMR and UV spectra suggested that it was the hydrated form of purine riboside (54) that was binding to ADA and, using the unfavorable equilibrium constant for hydration in solution (1.1 × 10^-7 M, Fig. 17.23), a true Ki value of 3 × 10^-13 M could be calculated for (54) (142). Given the low concentration of the free hydrate in solution, and the
rapid onset of inhibition, it appears that purine riboside (53) itself initially binds and is then rapidly converted to (54) in the active site (141). This result, along with the high affinities of (R)-coformycin and (R)-deoxycoformycin, argues that the reaction proceeds by a stereospecific, direct attack of water rather than the double-displacement mechanism that also had been proposed (141). More recently, an X-ray study on adenosine deaminase, which had been crystallized in the presence of purine ribonucleoside, confirmed that it was the hydrated species of purine ribonucleoside that was present in the active site (143). Further, a triad of a zinc atom, a histidine residue, and an aspartic acid residue ensured that the binding was stereospecific, with the 6R isomer (55) being favored.

The adenosine deaminase story, in many ways, provides a perfect example of the general principles of enzymatic catalysis and the utility of enzyme inhibitors. ADA is an extremely efficient catalyst, producing a rate enhancement of 12 orders of magnitude. 6R-Hydroxy-1,6-dihydropurine riboside (55) has an affinity for ADA about 8 orders of magnitude greater than that for substrates or products; that is, it expresses a substantial fraction of the free energy of binding that separates the transition state from the ground state in an enzymatic reaction. Evidence of the extraordinary ability of an enzyme to discriminate between stereoisomers is provided by the 10^7-fold difference in binding affinities of the 8R-OH (50) and 8S-OH (51) stereoisomers of 2'-deoxycoformycin. Inhibitors were used to differentiate among several potential reaction mechanisms for ADA and, finally, an ADA inhibitor (pentostatin) has proved to be of therapeutic benefit.

Inhibitors of pyrimidine and purine biosynthesis are used as antineoplastic agents. As a consequence, dihydroorotase, which catalyzes the third step of de novo pyrimidine biosynthesis, the conversion of carbamyl aspartate to dihydroorotate (Equation 17.43), is a target for therapeutic intervention.

$$\text{carbamyl aspartate} \; \underset{\text{dihydroorotase}}{\rightleftharpoons} \; \text{dihydroorotate} + \mathrm{H_2O} \tag{17.43}$$

Figure 17.24. (a) Putative transition state for the dihydroorotase reaction, and (b) boronic acid transition-state analogs.

The reaction is thought to proceed through the tetrahedral-activated complex (56) (Fig. 17.24), which is a highly charged, unstable sp3 carbon species (144, 145). At around neutral pH, compound (57), a boron-containing analog of carbamyl aspartate, rearranges to the stable, tetrahedral boronic acid derivative (58). The affinity of (58) for dihydroorotase (Ki = 5 µM) was found to be 10-fold greater than that of the carbamyl aspartate
(KM = 50 µM), indicating that (58) was probably acting as a transition-state analog (145). Tetrahedral boronic acid structures are stable, unlike the analogous sp3 carbon species, and boronic acid derivatives of substrate peptides have proved to be quite potent inhibitors of a variety of proteases, particularly serine proteases (146).

Chorismate mutase catalyzes the conversion of chorismate to prephenate (Equation 17.44). This reaction is unusual, in that it is the only pericyclic [3,3] sigmatropic rearrangement (Claisen rearrangement) that is catalyzed by an enzyme.

$$\text{chorismate} \xrightarrow{\text{chorismate mutase}} \text{prephenate} \tag{17.44}$$

Although chorismate mutase does provide a rate enhancement of 2 × 10^6 (147), this unimolecular reaction readily occurs without enzyme, under mild conditions. The reaction was expected to pass through a chairlike transition state (59) (Fig. 17.25), but early molecular orbital calculations indicated that the boatlike transition state (60) was not out of the question (147). In an attempt to define the transition-state structure, several compounds, each designed to mimic a putative transition state, were synthesized and tested as chorismate mutase inhibitors (147). The enzyme was found to be inhibited by the exo-carboxy nonane (61), with an apparent Ki value of 3.9 × 10^-4 M. Conversely, the endo-carboxy nonane (62) did not inhibit the enzyme. The apparent Ki value of the adamantane derivative (63), in which the chair-chair conformation is fixed, was about the same as that of (61). Taken together, this implied that the reaction proceeded through a chairlike transition state (147).

Figure 17.25. (a) Putative transition states for the chorismate-prephenate rearrangement and (b) structures of potential transition-state analogs.

This approach was later refined by Bartlett and Johnson, who suggested that IC50/KM ratios of 7 for compound (61) and 12 for compound (63) indicated that these inhibitors were not particularly good transition-state analogs (148). In an attempt to improve potency, and to further define the stereochemistry of the transition state, they synthesized several compounds including the exo- and endo-carboxy unsaturated oxabicyclic ethers, (64) and (65), respectively (148). The exo-compound (64) was not significantly better than its saturated carbocyclic analog (61), but the endo-derivative (65) bound chorismate mutase some 100-fold more tightly than did chorismic acid under the same conditions, with a Ki value of 120 nM (148, 149). Later, monoclonal antibodies elicited against (65) were found to be effective catalysts for the conversion of chorismate to prephenate, with rate enhancements of 200-fold in one case (150) and 10,000-fold in another (151). In both instances it was suggested that the rate enhancement was attributable to increased binding of the transition state by the antibody (150, 151).

X-ray structures are now available of the complexes of (65) with two chorismate mutase enzymes (152, 153), as well as with the less-efficient catalytic antibody (154). Although each active site was found to employ a different constellation of interactions with (65), the dissociation constants for the binding of (65) to the three proteins were strikingly similar, ranging from 0.6 to 3 µM (153, 154). However, the micromolar affinity of (65) for both enzyme and antibody is considerably weaker than might be expected for a good mimic of the transition state, and the antibody is not a particularly efficient catalyst. Wiest and Houk (155) have calculated that the bond lengths for the breaking and forming bonds in the transition state are considerably longer than those for (65), and the neutral inhibitor does not mimic the charge separation that builds up in the transition state. Although the two enzyme active sites have evolved to complement the larger, polarized transition state, the antibody has no residues positioned to stabilize the polar transition state. Further, the active site is smaller and makes more van der Waals contacts with the inhibitor, again features likely to impede catalysis. These features provide evidence for the innate difficulties associated with designing both good transition-state analogs and efficient catalytic antibodies.

3 RATIONAL DESIGN OF COVALENTLY BINDING ENZYME INHIBITORS

For the purposes of this chapter we have divided covalently binding enzyme inhibitors into categories according to Table 17.4. Pseudoirreversible inhibitors are discussed separately and the others are, in order of increasing specificity, chemical modifiers, affinity labels, and mechanism-based inhibitors. The targets for these inhibitors are the chemically reactive groups found within the enzyme's active site. These groups, in the majority of cases, are nucleophiles such as the -OH groups of serine, threonine, and tyrosine, the -SH group of cysteine, and the -COOH groups of aspartic and glutamic acid residues. Other nucleophilic groups include the ε-amino group of lysine and the imidazole ring of histidine. In some cases the -NH2 and -COOH groups of the enzyme's N- and C-termini, respectively, are also active-site nucleophiles, whereas enzymic cofactors may also provide targets for covalently binding inhibitors. Arginine is the only common amino acid that has an electrophilic side chain and it also can be modified with suitable nucleophilic agents. Kyte has recently provided an excellent overview of the general area of active-site modification and labeling (156).

The first group of covalently binding enzyme inhibitors, the chemical modifiers, are small organic molecules, generally electrophiles, that are used to modify the enzyme's side chains in such a way as to produce a stable covalent bond. These are often used to study enzyme inactivation and to identify residues potentially involved in binding and catalysis. Some of the commonly used reagents are
Table 17.7 Commonly Used Reagents for Chemical Modification

| Residue Targeted | Reagent | Other Residues Labeled |
| Lysine | Acetic anhydride; isothiocyanates; trinitrobenzenesulfonate (TNBS); cyanate | These reagents can also react with the N-terminal amino group |
| Histidine | Diethylpyrocarbonate (DEPC) | DEPC should be used at neutral pH to minimize reaction with lysines, cysteines, and tyrosines |
| Cysteine | Iodoacetamide, iodoacetate; p-hydroxymercuribenzoate; methyl methanethiosulfonate; Ellman's reagent (DTNB); N-ethylmaleimide | Iodoacetamide has the potential to modify histidines and lysines |
| Arginine | Phenylglyoxal; butanedione | Phenylglyoxal can react with lysine. Butanedione should be used in the dark to prevent reaction with tryptophans, histidines, and tyrosines |
| Tyrosine | Tetranitromethane; chloramine T | Chloramine T also modifies histidines and methionines |
| Tryptophan | N-Bromosuccinimide; 2-hydroxy-5-nitrobenzyl bromide | |
| Serine | Diisopropylfluorophosphate; halomethyl ketones | |
| Aspartic acid, glutamic acid | Carbodiimides; trimethyl oxonium fluoroborate; isoxazolium salts | |
listed in Table 17.7. These compounds are chemically reactive and may lead to the modification of both catalytic and nonessential residues. As a consequence, experimental design (such as choice of reagent and reaction conditions, use of substrate protection, etc.) is of utmost importance in carrying out and interpreting chemical modification studies. Although inhibitors of this type are not the prime focus of this chapter (and are not discussed further), it should be noted that most of the kinetic equations that apply to affinity labels also apply to chemical modifiers, and there are a number of texts available that cover this topic (40, 157, 158).

Although the organic modifiers are usually not specific for a given enzyme, the second group, the affinity labels, have a degree of specificity built in. Sometimes described as active-site directed, irreversible inhibitors, affinity labels are usually substrate or product analogs that contain an additional chemically reactive moiety. They first bind to the enzyme's active site in a noncovalent fashion, like rapid reversible inhibitors. However, upon formation of the enzyme-inhibitor complex (E·I), they react by various mechanisms with one or more amino acid residues in close proximity in the enzyme's active site. This results in covalent bond formation between the enzyme and the inhibitor (E-I) (Equation 17.45).

$$\mathrm{E} + \mathrm{I} \rightleftharpoons \mathrm{E \cdot I} \longrightarrow \mathrm{E\text{-}I} \tag{17.45}$$

Usually the inhibitor contains an electrophilic moiety that labels amino acids containing nucleophilic groups. However, in some cases, a nucleophilic species may be formed, which can react either with arginine or with any tightly bound organic or inorganic low molecular weight cofactors possessing electrophilic sites. Unlike the mechanism-based inhibitors described below, affinity labels do not require activation by catalysis at the enzyme's active site. Most often, the covalent bond formation occurs by an SN2 alkylation-type mechanism, Schiff base formation, or acylation (156, 159).

Affinity labels, some of which have become successful therapeutic agents, are often used to identify catalytically important residues. In some cases, by examining the pH dependency of the rate of inactivation, it is possible to determine the pKa of the labeled residue. Again, there are a number of excellent reviews on this topic (160-163), including a complete volume in the Methods in Enzymology series (159).

Recently, Pratt (164) and Krantz (165) have suggested that any inactivator that utilizes an enzyme's mechanism, in the broadest sense, should be described as a mechanism-based inhibitor. Although this is not unreasonable, we have, for the purposes of this chapter, adopted the more narrow view of Silverman (166). In this view, mechanism-based inhibitors (also called suicide substrates, Trojan horse inactivators, enzyme-induced inactivators, kcat inhibitors, and latent inactivators) are described as unreactive compounds, the structure of which usually resembles that of a substrate or product of the target enzyme, and that undergo a catalytic transformation by the enzyme to species that, before release from the active site, inactivate the enzyme. Thus, these compounds usually contain a latent, reactive functional group that gets activated during the normal catalysis of the enzyme. Upon formation of the initial reversible enzyme-inhibitor complex E·I, the enzyme starts its normal catalytic cycle, leading in a usually rate-determining step to the formation of a highly reactive species, E·I' (Equation 17.46).

$$\mathrm{E} + \mathrm{I} \rightleftharpoons \mathrm{E \cdot I} \longrightarrow \mathrm{E \cdot I'} \begin{cases} \longrightarrow \mathrm{E\text{-}I''} \\ \longrightarrow \mathrm{E} + \mathrm{P} \end{cases} \tag{17.46}$$

The reactive species can either react with one of the enzyme active-site amino acid residues, to form a covalent bond between the enzyme and the inhibitor (E-I''), or be released into the medium to form product (P) and free active enzyme (E). In some instances the reaction may occur between the reactive species and the enzyme's cofactor, again resulting in inactivation of the enzyme.

It should also be noted that the activation of a mechanism-based inhibitor by its target enzyme is, formally, an example of metabolic activation. However, there is a clear distinction between the activation of a mechanism-based inhibitor described above and the metabolic activation of a prodrug. In the latter case, an inactive precursor is metabolized in the body (either chemically or enzymatically) to metabolites that possess the desired activity. For example, acyclovir (3a) must be metabolically converted to the triphosphate (3b) and released into the medium before it will inhibit viral DNA polymerase. Further discussion on prodrugs may be found in volume 2, chapter 14.

3.1 Evaluation of the Mechanism of Inactivation of Covalently Binding Enzyme Inhibitors

The inherent complexity of the inactivation mechanisms of covalently binding enzyme inhibitors makes it necessary to evaluate their proposed modes of action carefully. An overview of the criteria for the study of irreversible inhibitors is provided below.

3.1.1 Criteria for the Study of Affinity Labels. The evaluation of affinity labels is based on the fulfillment of the following criteria:

1. Irreversible inactivation. Inactivation by affinity labels leads to irreversible covalent bond formation between the enzyme and the inhibitor. Unlike the complex between an enzyme and a rapid, reversible inhibitor, the covalent enzyme-inhibitor complex is no longer in equilibrium with free enzyme and inhibitor. Therefore, exhaustive dialysis or gel filtration of the covalent enzyme-inhibitor complex cannot lead to the recovery of free, active enzyme. However, such experiments do not allow distinction among tight-binding, noncovalent inhibitors, affinity labels, and mechanism-based inactivators.
2. Time- and concentration-dependent inactivation showing saturation kinetics. The
scheme described by Equation 17.45 is analogous to that described for a simple enzyme-catalyzed reaction (Equation 17.9). This scheme can be described by Equation 17.47, which is analogous to the Michaelis-Menten equation (Equation 17.10).

According to Equation 17.47, an affinity label should exhibit time- and concentration-dependent inactivation. At low inhibitor concentrations the rate of inactivation is proportional to the inhibitor concentration, whereas at high inhibitor concentrations saturation occurs and no further increase in the rate of inactivation is observed. A typical pseudo first-order plot of log enzyme activity vs. time is illustrated in Fig. 17.26. In some cases nonlinear plots may be obtained, particularly for mechanism-based inhibitors (166, 167).

Figure 17.26. Pseudo first-order inactivation kinetics of an active-site directed irreversible inhibitor.

3. Saturation kinetics and determination of KI and kinact. To distinguish the rate and binding constants of rapid reversible inhibitors (k and Ki, respectively) from the rate and binding constants of irreversible inhibitors, the terms kinact and KI have been used. To determine kinact and KI, saturation kinetics must be obeyed. Saturation is reached when all of the free enzyme is converted to the reversible enzyme-inhibitor complex E·I. At that point, the rate of inactivation is independent of k1/k-1 (Equation 17.45), assuming that the rate of formation of the initial reversible enzyme-inhibitor complex (k1) is significantly greater than the rate of formation of the covalent enzyme-inhibitor complex (k2). Consequently, a higher concentration of inhibitor will not lead to an increased rate of inactivation. The KI value represents the concentration of inhibitor leading to the half-maximum rate of inactivation (in analogy to a KM value), and kinact is the maximum rate of inactivation at the point of saturation (in analogy to kcat). To determine the KI and kinact values, the enzyme is incubated at various subsaturating concentrations of the inhibitor, from which the half-life of inactivation at each inhibitor concentration is deduced. Using Kitz and Wilson plots (168), the half-life of inactivation at each inhibitor concentration is plotted against 1/[I]. A typical plot is illustrated in Fig. 17.27. The y-intercept represents the half-life of inhibition at infinite inhibitor concentration, with kinact equal to 0.693/t1/2. KI can be determined from the x-intercept, which is equal to -1/KI. If no saturation occurs with a tested inhibitor, the curve will intercept at the origin of the graph, implying that kinact is much faster than the formation of the initial reversible enzyme-mechanism-based inhibitor complex. If this is observed, one might use a lower temperature or a different pH to lower kinact. It should also be noted that, in general, if the affinity label is reacting with an ionizable group that is involved in catalysis, then the pH dependency of kinact/KI should mirror that of kcat/KM.

Figure 17.27. Kitz and Wilson plot.
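The Kitz and Wilson analysis described in criterion 3 can be sketched numerically. The example below is only an illustration: it assumes the usual hyperbolic dependence of the observed inactivation rate on inhibitor concentration, k_obs = kinact[I]/(KI + [I]), which is the Michaelis-Menten-like form the text attributes to Equation 17.47 but which is not reproduced here. The function names, the synthetic rate constants, and the units are all hypothetical.

```python
import math

def kitz_wilson(inhibitor_conc, half_lives):
    """Fit t1/2 = intercept + slope * (1/[I]) by ordinary least squares.

    Returns (k_inact, K_I) using the relations quoted in the text:
    y-intercept = 0.693 / k_inact and x-intercept = -1 / K_I.
    """
    x = [1.0 / c for c in inhibitor_conc]   # reciprocal inhibitor concentration
    y = list(half_lives)                     # half-life of inactivation
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if intercept <= 0:
        # A line through the origin means no saturation was observed,
        # so k_inact and K_I cannot be separated from these data.
        raise ValueError("no saturation: y-intercept is not positive")
    k_inact = math.log(2) / intercept        # ln 2 is the 0.693 of the text
    K_I = slope / intercept                  # because x-intercept = -intercept/slope
    return k_inact, K_I

def half_life(conc, k_inact, K_I):
    """Half-life under the assumed hyperbolic rate law (illustrative only)."""
    k_obs = k_inact * conc / (K_I + conc)
    return math.log(2) / k_obs

# Synthetic, noise-free data with hypothetical constants
# (k_inact in min^-1, K_I and [I] in micromolar).
true_k_inact, true_K_I = 0.10, 50.0
concs = [10.0, 20.0, 50.0, 100.0, 200.0]
t_half = [half_life(c, true_k_inact, true_K_I) for c in concs]

k_fit, K_fit = kitz_wilson(concs, t_half)
print(f"k_inact = {k_fit:.4f} per min, K_I = {K_fit:.2f} uM")
```

With real data the half-lives carry experimental error, and a reciprocal plot weights the lowest inhibitor concentrations most heavily, so a direct nonlinear fit of k_obs against [I] may be preferable; the linear form is used here only because it mirrors the plot described in the text.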
4. A binding stoichiometry of 1:1 of inhibitor to the enzyme's active site. In general, complete inactivation of an enzyme requires the binding of one mole of inhibitor per mole of enzyme active site. Exceptions can be certain multimeric enzymes that are inactivated by binding of only one-half mole of inhibitor per mole of enzyme subunit, a phenomenon called half-site reactivity. The stoichiometry of binding is usually determined by incubating an excess of radiolabeled inhibitor with the enzyme to ensure complete irreversible inactivation, followed by either exhaustive dialysis or gel filtration. The binding stoichiometry of the obtained enzyme-inhibitor complex in the absence of free inhibitor is then examined for its radiolabel and protein content. More recently, developments in high-resolution mass spectrometry have allowed the determination of binding stoichiometry without the need for radiolabeled inhibitor.
5. Substrate protection. Ligands of the enzyme, either substrates or reversible inhibitors, should greatly decrease the rate of modification by the affinity label. Both affinity labels and mechanism-based inhibitors should be active-site directed, thereby competing with the substrate for the same binding site on the enzyme. This can be tested by incubating the enzyme with increasing amounts of substrate at constant inhibitor concentrations. As the substrate concentration is increased, the rate of inactivation will become slower because, under initial velocity conditions, a portion of the enzyme is protected as the E·S complex. A typical plot of the log of enzyme concentration vs. time at different substrate concentrations is shown in Fig. 17.28.

Figure 17.28. Using a substrate to protect an enzyme from inactivation by an active site-directed irreversible inhibitor (log enzyme activity vs. time at increasing [S] and constant [I]).

6. Verification of covalent bond formation. In many cases it can be difficult to differentiate between a covalently binding enzyme inhibitor and a very tight but noncovalently binding inhibitor. Although strongly denaturing conditions may not lead to the release of tight, noncovalently bound inhibitors, the covalent linkage between an enzyme and its inhibitor can sometimes be quite labile to nucleophiles and extremes of pH. A frequently used method to determine the covalently modified amino acid residue of an enzyme's active site is peptide mapping. Enzyme-inhibitor complexes, usually prepared from radioactively labeled inhibitor, are treated under mildly denaturing conditions with an appropriate protease. Subsequently, the peptide fragments obtained are usually resolved by high-pressure liquid chromatography and isolated. Analysis of the labeled peptides can be accomplished by Edman degradation and/or mass spectrometry. A good description and example of this method can be found elsewhere (169). Alternatively, electrospray ionization mass spectrometry has been used as a tool to determine the accurate mass of the proteins and enzyme-inhibitor complexes. In a study by Knight et al. (170), this method was successfully used to distinguish between covalent and noncovalent complexes because the latter did not survive the experimental conditions.
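The substrate protection test of criterion 5 can be illustrated with a short calculation. The sketch below assumes a simple competitive model, in which substrate and inhibitor bind the same site and the observed inactivation rate becomes k_obs = kinact[I] / (KI(1 + [S]/KM) + [I]). This expression is a standard textbook form and is an assumption here, not an equation taken from this chapter; all names and numerical values are hypothetical.

```python
import math

def k_obs(i_conc, s_conc, k_inact, K_I, K_M):
    """Observed pseudo first-order inactivation rate with competing substrate.

    Assumes purely competitive binding: substrate raises the apparent K_I
    by the factor (1 + [S]/K_M) and leaves k_inact unchanged.
    """
    return k_inact * i_conc / (K_I * (1.0 + s_conc / K_M) + i_conc)

def remaining_activity(t, rate):
    """Fraction of enzyme activity left after time t for first-order decay."""
    return math.exp(-rate * t)

# Hypothetical constants: k_inact in min^-1, concentrations in micromolar.
k_inact, K_I, K_M = 0.10, 50.0, 25.0
i_conc = 50.0                                   # constant [I]
substrate_levels = [0.0, 25.0, 100.0, 400.0]    # increasing [S]

rates = [k_obs(i_conc, s, k_inact, K_I, K_M) for s in substrate_levels]
for s, r in zip(substrate_levels, rates):
    left = remaining_activity(10.0, r)
    print(f"[S] = {s:6.1f} uM  k_obs = {r:.4f} per min  "
          f"activity left after 10 min = {left:.2f}")
```

Under this model the slopes of the log activity vs. time lines become progressively shallower as [S] increases, which is the pattern the text describes for Fig. 17.28. A ligand that binds elsewhere on the enzyme would not be expected to follow this dependence.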
3.1.2 Criteria for the Study of Mechanism-Based Inactivators. In addition to the requirements described above for an affinity label, a mechanism-based inhibitor should also demonstrate the following:

1. Occurrence of a catalytic step. The major difference between the mechanism of inactivation of mechanism-based inactivators vs. that of any other type of inhibitor is the obligate involvement of a catalytic step, that is, step 2 in Equation 17.46. Initially, the mechanism-based inhibitor binds reversibly to form the E·I complex. The enzyme then starts its normal catalytic cycle, resulting in the conversion of the inhibitor into a reactive species (I'). If the reactive species is electrophilic, it may react with an active-site nucleophile, much like an affinity label. If the reactive species is nucleophilic, it may react with an electrophilic species on the enzyme, probably an oxidized cofactor. Finally, a radical species may be generated that has the potential to react with an enzyme radical, or generate one by hydrogen atom abstraction. The experiments necessary to provide evidence for a catalytic step are obviously strongly dependent on the individual catalytic mechanism involved. The experiments may include spectrophotometric detection of oxidized or reduced cofactor, observing C-H bond cleavage by monitoring the release of tritium, or the detection of some component of cleaved inhibitor (such as fluoride ion as in some examples shown below).
2. No release of the activated species before enzyme inactivation. For a mechanism-based inactivator to retain its high specificity during inactivation, release of the reactive species from the active site must not be part of the normal mechanism of inactivation. A time-dependent increase in the rate of inactivation points to the release of an activated species before inactivation. This increase in the rate of inactivation is brought about by the accumulation of free reactive species in solution. Inhibitors generated in this manner have been termed metabolically activated affinity labels (171). In these cases, as with affinity labels, nonspecific covalent modification of residues other than those located in the active site cannot be excluded. A second test for a metabolically activated affinity label is to add an additional aliquot of fresh enzyme to the incubation buffer. The fresh enzyme should be inactivated at a higher rate than that of the first equivalent of enzyme because there is more reactive species present in solution. By contrast, the mechanism-based inhibitor should show no difference in rate until the concentration of inhibitor is depleted. It should also be noted that the observation of such rate increases necessitates that the reactive species is relatively stable and is not immediately quenched by the incubation buffer.
Additional tests such as the addition of nucleophilic scavengers (e.g., thiols such as dithiothreitol or β-mercaptoethanol) can provide further evidence for the presence of a free, reactive electrophilic species. The scavengers should quench all of the free reactive species, thereby protecting the enzyme from inhibition. Unfortunately, this method cannot exclude the possibility that a nucleophilic thiol may even attack the bound reactive species at the active site of the enzyme (which would also give rise to protection from inactivation). However, the use of a bulky thiol, such as reduced glutathione, should limit that possibility. An alternative scenario occurs wherein the released reactive species returns and reacts faster with an active-site nucleophile than with the added thiol. Clearly this is a complex problem and, consequently, it is advisable to use several different tests to avoid misleading conclusions.
3. Partition ratio. The partition ratio is the ratio of product release to enzyme inactivation and is a measure of the efficiency of the mechanism-based inhibitor. Formally, it refers to the ratio k3/k4 (Equation 17.46). The most efficient inactivators will have a partition ratio of zero. In those cases, theoretically, every enzymatically processed inhibitor molecule will result in the inactivation of a molecule of enzyme. Even though the partition ratio is independent of
the initial concentration of inhibitor, it will depend on factors such as the rate of diffusion of the reactive species from the active site, its reactivity, and the proximity of the target for covalent bond formation. A number of different methods have been used to determine the partition ratio. For example, if, under the experimental conditions, the rate of inactivation is relatively fast compared to the chemical stability of the enzyme or the inhibitor, the partition ratio can be determined kinetically by titration of the enzyme activity. The titration measures the number of inhibitor molecules required to completely inactivate the enzyme. In an experiment of this type, increasing amounts of inhibitor are added to a known, fixed amount of enzyme, and the reaction is allowed to go to completion. After gel filtration or dialysis, a plot of the amount of inhibitor per enzyme active site and the remaining enzyme activity is drawn (Fig. 17.29). The intercept with the x-axis represents the minimum number of equivalents of inhibitor necessary to inactivate the enzyme completely (turnover number). A turnover number of 6, such as that shown in Fig. 17.29, indicates that on average 5 equivalents of inactivator are converted to product and only every sixth equivalent of inhibitor leads to irreversible covalent bond formation (i.e., the partition ratio equals the turnover number minus 1). Unfortunately, there are a number of factors associated with this method that may lead to misleading results (166). Another method for determining partition ratios is equilibrium dialysis of the enzyme with radiolabeled inactivator, followed by determination of the amount of radiolabeled metabolites produced per radiolabeled enzyme. Perhaps the simplest method is for cases where the rate of product formation (i.e., kcat = k3 in Equation 17.46) can easily be measured. In this instance, both kcat and kinact are measured directly, with kcat/kinact being the partition ratio (166, 167).
A more detailed discussion of the requirements for mechanism-based inhibition can be found in a recent review by Silverman (166).

Figure 17.29. Determination of the partition ratio.

3.2 Affinity Labels

Affinity labels are potentially good drugs, although the presence of a reactive functional group can make them somewhat nonselective and prone toward reaction with other proteins and metabolites. If the affinity label is highly selective toward its target enzyme and has a great affinity for the enzyme's active site, this drawback can be overcome kinetically. Once the inhibitor is bound, the unimolecular reaction between the inhibitor and an amino acid residue in close proximity is entropically quite favorable compared to a bimolecular reaction between two free molecules in solution. This proximity effect has resulted in rate enhancements as great as 10' (172) and means that a reagent that is, in itself, only weakly active, may be highly reactive when it is reformulated as an affinity label. More in-depth discussion on this topic can be found elsewhere (39, 173, 174).

The design of a potent affinity label requires the study of the initial requirements for the inhibitor to bind to the active site. Next, regions of bulk tolerance are determined that are useful for the introduction of a reactive functional group. In some cases, it might be advantageous to place the reactive group at the end of a spacer arm, particularly if no nucleophilic amino acid residue is in close proximity to the reactive group. However, not only the location and orientation, but also the size and inherent reactivity of the reactive functional group are critical for its potential as an affinity label.

Perhaps the archetypical example of an affinity label is TPCK (66) (Fig. 17.30). This
compound was designed to mimic substrates of chymotrypsin such as the tosyl-L-phenylalanine methyl ester (Equation 17.48), thereby providing a basis of affinity for the chymotrypsin active site. In addition to mimicking a substrate, it contains the halomethyl ketone moiety, to provide a point of covalent attachment (175). TPCK was shown to irreversibly inhibit chymotrypsin (it is still employed today to remove chymotrypsin from trypsin preparations) by specifically labeling a histidine residue (175), later identified as His57 (176). After the success of TPCK, chloromethyl ketones became extremely popular for the inactivation of proteases. By incorporating part of the sequence of the physiological substrate into the halomethyl ketone, it was possible to obtain selective inactivation of individual proteases (177). This selective inactivation also meant that chloromethyl ketones became widely used as probes for the binding requirements and chemically reactive residues in the active sites of serine proteases, in particular. Replacement of the chloromethyl ketone moiety by a diazomethyl group provided a specific inactivation of cysteine proteases (172). The use of TPCK has not been restricted to chymotrypsin, as elegantly demonstrated in a recent report on the inhibition of human aldehyde dehydrogenase (178). As a group, proteases remain major targets for therapeutic intervention, and peptide-based affinity labels are still playing a major role in drug design (179).

Figure 17.30. Inhibition of chymotrypsin by TPCK.

[Equation 17.48: methyl N-tosyl-L-phenylalanine]

The interconversion of (R)-mandelate and (S)-mandelate is catalyzed by mandelate racemase (Equation 17.49). The reaction can be reversibly inhibited by the substrate analog atrolactate (67) (Fig. 17.31). Because of its structural similarity to both (67) and the substrate and, given the reactivity of the epoxide group to nucleophiles, (R,S)-α-phenylglycidate (68) was synthesized as a potential affinity label of mandelate racemase. The compound was found to be an irreversible inhibitor, fitting all the criteria described in section 3.1.1 (180). Later it was established that (S)-α-phenylglycidate (S-αPGA) did not irreversibly inactivate the enzyme, binding noncovalently and with less affinity than R-αPGA (69). As shown in Figure 17.31, the epoxide ring of R-αPGA potentially is subject to attack at either of two carbons. Attack at the distal endocyclic carbon atom of (69) (path a) will result in the formation of (70), whereas attack at the α-carbon (path b) will yield (71). The crystal structure of the inactivated complex revealed that nucleophilic attack of the ε-amino group of Lys166 resulted in adduct (72), which is consistent with attack on the distal carbon of the epoxide ring (181). This structure confirmed the original design premise of Fee et al. (180), wherein it was thought that the distal oxirane carbon occupied the position similar to the α-proton in mandelate. Therefore, on binding of the α-phenylglycidate to the enzyme, the electrophilic epoxide group would be subject to attack by the nucleophile responsible for α-proton abstraction in the normal catalytic cycle. Further confirmation is provided by the X-ray structure of (S)-atrolactate bound to the racemase (181), which reveals that Lys166 has been pushed away by the α-methyl group of (S)-atrolactate (which is positionally equivalent but much larger than the α-proton in (S)-mandelate). In both structures the positions of the remaining active-site residues are almost identical.

[Equation 17.49: interconversion of (R)-mandelate and (S)-mandelate by mandelate racemase]

Figure 17.31. (a) Inhibitors of mandelate racemase, (b) potential products from nucleophilic attack on the epoxide of R-α-phenylglycidate, and (c) adduct formed by attack of Lys166.

Perhaps the best-known affinity labeling reagent is aspirin (73) (Fig. 17.31), a member of the class of drugs known as the nonsteroidal anti-inflammatory drugs (NSAIDs), and whose activity was initially reported to result from its inhibition of prostaglandin biosynthesis (182, 183). Prostaglandins are involved in the inflammatory response and can cause headache and vascular pain in humans.

Prostaglandin synthase, which catalyzes the first step in the arachidonic acid cascade, is a heme protein and possesses two activities. As illustrated in Equation 17.50, a cyclooxygenase activity is used in the conversion of arachidonic acid to the bicyclic endoperoxide PGG2, whereas a peroxidase activity catalyzes the subsequent reduction of PGG2 to prostaglandin H2. The latter serves as a branch point in the production of various prostaglandins as well as thromboxane A2 and prostacyclin (PGI2).

[Equation 17.50: arachidonic acid -> (cyclooxygenase) PGG2 -> (peroxidase) prostaglandin H2 -> thromboxane A2, prostacyclin, and other prostaglandins]

Aspirin (acetylsalicylic acid) was ultimately confirmed as an inhibitor of prostaglandin synthetase (184). Incubation of [acetyl-3H]aspirin showed that one acetyl group was incorporated per mole of enzyme (185, 186). Enzymatic digest of the labeled enzyme provided evidence that a serine residue, later identified as Ser530, was acetylated (186, 187), probably by the mechanism shown in Fig. 17.32. The X-ray structure of bromoaspirin (74) inactivated prostaglandin synthase has been solved, and the bromoacetylation of Ser530 confirmed (188). The structure was similar to that of flurbiprofen (75), complexed with prostaglandin synthase (189). Aspirin is the only NSAID known to inactivate prostaglandin synthase through covalent modification, and the brominated aspirin analog was also determined to be a potent irreversible inhibitor. Conversely, flurbiprofen (75), another NSAID, has been classified as a slow-tight-binding inhibitor (190) and was expected to induce a conformational change upon binding. However, there were no significant differences between the two X-ray structures, and it is yet to be determined whether the binding of aspirin also induces a conformational change in the enzyme.

Figure 17.32. (a) Inactivation of prostaglandin H2 synthase by aspirin, and (b) inhibitors cocrystallized with prostaglandin synthase.

Although affinity labels have played a major role in characterizing the active sites of a large number of proteases, they have also proved to be particularly useful in mapping nucleotide-binding sites (161, 163, 191). Numerous compounds, some of which are shown in Fig. 17.33, have been designed to be analogs of the various nucleosides and nucleotides. Perhaps the best known of these is 5'-p-fluorosulfonylbenzoyl adenosine (5'-FSBA) (76), which was designed to be an analog of ADP or ATP (77). It has both the adenosine and ribose moieties, as well as a carbonyl group adjacent to the 5' position. The latter mimics the first phosphoryl group of the purine nucleotides. If
the molecule is arranged in an extended conformation, the reactive sulfonyl fluoride group would be found in a position analogous to that occupied by the γ-phosphoryl group of ATP. This was initially used to explore the regulatory site of glutamate dehydrogenase (192) and the active site of pyruvate kinase (193). It has now been employed to label the NAD and ATP sites of more than 50 proteins (163). Modifications to 5'-FSBA have provided the fluorescent probe (78) as well as the bifunctional affinity label (79), which has a photoactivatable azido group as well as the electrophilic fluorosulfonyl moiety (163). The bromodioxybutyl compound (80) contains the adenine, ribose, and 5'-monophosphate of adenosine monophosphate (AMP). It is also water soluble and negatively charged at neutral pH. As described above, a bromomethyl ketone group will react with a number of nucleophiles, whereas the dioxo group can potentially react with arginine residues. This reagent has a structural similarity to adenylosuccinate (81) and was used to identify a critical arginine residue in the active site of adenylosuccinate lyase, an enzyme whose deficiency in humans leads to severe mental retardation and autism (163).

Figure 17.33. ATP (77), adenylosuccinate (81), and representative affinity label analogs.

3.3 Mechanism-Based Inhibitors

Mechanism-based inactivators have great potential as drugs because they are designed to be specific toward their target enzyme. Furthermore, because these compounds are unreactive until activated within their target enzyme, they are expected to show little or no cellular toxicity. The design of mechanism-based inhibitors requires an understanding of the binding specificity requirements for the ligand-recognition site of the enzyme, to promote the formation of the initial noncovalent enzyme-inhibitor complex E·I (Equation 17.46). In addition, the choice of an appropriate latent functional group requires knowledge of the catalytic mechanism of the target enzyme with its normal substrate. Finally, covalent bond formation by the activated inhibitor (I') will strongly depend on its inherent chemical reactivity, and its proximity to a susceptible amino acid residue or cofactor. A number of excellent reviews and monographs have appeared on the general design of mechanism-based inhibitors (166, 167, 194-201). The following examples have been chosen to emphasize both the potential for the use of mechanism-based inhibitors as drugs and the diversity of their mechanisms of inactivation.

Of all the classes of enzymes, the pyridoxal phosphate (PLP)-dependent enzymes have been found to be most susceptible to mechanism-based inhibitors (202). To some extent this is because the mechanism of catalysis by
the PLP-dependent enzymes is extremely well characterized, making the design process somewhat easier. The initial steps in the mechanism for a PLP-dependent enzyme are shown in Equation 17.51.

The first step involves Schiff base formation by the amino group of the substrate reacting with pyridoxal phosphate to form an aldimine. This is followed by loss of a functional group (R, Equation 17.51), usually by abstraction by an active-site base, to form a resonance-stabilized carbanion.

γ-Aminobutyric acid (GABA) is one of the major inhibitory neurotransmitters in the mammalian central nervous system. A decrease in the concentration of GABA had been shown to lead to convulsions. Therefore it was suggested that inhibitors of GABA transaminase, the enzyme responsible for the breakdown of GABA, may act as antiepileptic agents, by providing an increase in the concentration of GABA in the brain.

GABA aminotransferase (GABA transaminase, GABA-T) catalyzes the conversion of γ-aminobutyric acid to succinic semialdehyde with the subsequent transfer of an amino group to pyruvate (Equation 17.52).

[Equation 17.52: GABA + pyruvate -> succinic semialdehyde + L-alanine, catalyzed by GABA-T]

GABA-T is a PLP-dependent enzyme and a wide variety of mechanism-based inhibitors have been described for this enzyme (202, 203). These include inhibitors bearing an unsaturated moiety, a leaving group, as well as those forming a stable complex with the cofactor (202, 203). Vigabatrin (8), currently used as an antiepileptic drug, provides an excellent example of this approach. The proposed mechanism is shown in Fig. 17.34.

Figure 17.34. Inactivation of GABA transaminase by vigabatrin (PMP = pyridoxamine phosphate). For clarity, substituents on the pyridine ring are omitted.

As with the normal mechanism of the enzyme, the inactivation starts with Schiff base formation with the enzyme-bound pyridoxal phosphate, followed by removal of an α-proton by an active-site base to form the reactive electrophilic intermediate (82). This then partitions between hydrolysis of the Schiff base linkage, resulting in the keto product (83) and enzyme reactivation, and Michael-type addition of an enzyme active-site nucleophile, resulting in a stable covalently bonded enzyme adduct (84).

Ornithine decarboxylase (ODC), another PLP-dependent enzyme, catalyzes the rate-limiting step in the biosynthesis of polyamines, i.e., the conversion of ornithine to putrescine (Equation 17.53).

[Equation 17.53: ornithine -> putrescine, catalyzed by ornithine decarboxylase]

The enzyme is a target for drugs against African sleeping sickness caused by Trypanosoma brucei. One of the currently used drugs, eflornithine (α-difluoromethylornithine, DFMO) (85), is a mechanism-based inhibitor of ODC. The inactivation of ODC by DFMO involves the decarboxylation of DFMO by the enzyme, with subsequent stoichiometric binding of a reactive species to the enzyme (204). The proposed mechanism for inhibition of T. brucei ODC is outlined in Fig. 17.35.

DFMO initially forms a Schiff base with PLP (86), then, following decarboxylation, a fluoride ion is eliminated, thereby generating the electrophilic conjugated imine (87). Attack by the nucleophilic thiol group of Cys360 and subsequent elimination of a second fluoride anion yields a second conjugated imine (88). The second imine then undergoes a transaldimination reaction with the amino group of Lys69. The enamine formed in this reaction (89) may then undergo an internal cyclization to yield a cyclic imine (90). This is the main product formed by the alkylation of ODC by DFMO (204). Recently, the X-ray structure of the DFMO-inactivated T. brucei ODC K69A mutant was solved (205). This structure was of the second conjugated imine (88) complexed with ODC and showed that, within a single active site, the decarboxylated DFMO bridges two subunits, forming a Schiff base with the PLP on one monomer and a covalent bond to Cys360 on the second monomer.

The enzyme steroid 5α-reductase is an NADPH-dependent enzyme that catalyzes the conversion of testosterone to dihydrotestosterone, a more potent androgen (Equation 17.54).

[Equation 17.54: testosterone + NADPH -> dihydrotestosterone, catalyzed by 5α-reductase]

Dihydrotestosterone, rather than testosterone, had been implicated in endocrine disorders such as acne, enlargement of the prostate, and male pattern baldness, and it was suggested that 5α-reductase was an attractive therapeutic target. Initially, finasteride (91) (Fig. 17.36) was developed as a potent reversible inhibitor of 5α-reductase with a Ki in the low nanomolar range (206). Closer examination revealed that finasteride appeared to be a slow-binding, high-affinity inhibitor of the human prostate (type 2) 5α-reductase, with a Ki of less than 1 nM (207). Finasteride is currently the drug of choice in the treatment of benign prostatic hyperplasia, and it is now thought that finasteride is, in fact, a mechanism-based inhibitor (208, 209), which acts through an enzyme-bound NADP-dihydrofinasteride adduct (Fig. 17.36).

In this mechanism, reduction of finasteride (91) leads to the formation of an enolate (92) and the subsequent formation of an adduct with NADP+ (93) (where PADPR = phosphoadenosine diphosphoribose). The dissociation constant for the enzyme-inhibitor complex is less than 10^-13 M, and the partition ratio for the enzyme-catalyzed formation of dihydrofinasteride (94) is less than 1.07 (208). Clearly, finasteride is an extremely efficient mechanism-based inhibitor. As shown in Fig. 17.36, the NADP+-finasteride adduct will
Figure 17.35. Inactivation of ornithine decarboxylase by eflornithine. For clarity, substituents on the pyridine ring are omitted.
Figure 17.36. Inhibition of steroid 5α-reductase by finasteride (PADPR = phosphoadenosine diphosphoribose).
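The partition ratio quoted for finasteride can be put in more intuitive terms. The short Python sketch below assumes the common definition of the partition ratio r (molecules of product released per inactivation event), under which a fraction 1/(1 + r) of catalytic cycles ends in inactivation. The function names are illustrative only and are not taken from the cited work. The sketch uses the upper bound of 1.07 given for finasteride and, for contrast, the value of 7500 reported later in this section for the VanX inactivator.

```python
def inactivation_fraction(partition_ratio: float) -> float:
    """Fraction of catalytic cycles that end in enzyme inactivation.

    Assumption: partition ratio r = product molecules released per
    inactivation event, so the fraction is 1 / (1 + r).
    """
    if partition_ratio < 0:
        raise ValueError("partition ratio must be non-negative")
    return 1.0 / (1.0 + partition_ratio)


def equivalents_consumed(partition_ratio: float) -> float:
    """Inhibitor molecules processed per active site inactivated (r + 1)."""
    if partition_ratio < 0:
        raise ValueError("partition ratio must be non-negative")
    return partition_ratio + 1.0


if __name__ == "__main__":
    # Values quoted in the text; 1.07 is an upper bound for finasteride.
    for name, r in (("finasteride", 1.07), ("VanX inactivator", 7500.0)):
        frac = inactivation_fraction(r)
        print(f"{name}: r = {r}, fraction inactivating = {frac:.4f}, "
              f"equivalents consumed = {equivalents_consumed(r):.2f}")
```

Under this assumption, nearly half of all turnovers of finasteride lead to the adduct, whereas an inactivator with a partition ratio in the thousands is processed thousands of times for each enzyme molecule it inactivates.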
eventually dissociate and form dihydrofinasteride (94), although the half-life of 14 days also points to the effectiveness of finasteride as a steroid 5α-reductase inhibitor.

Enzymes involved in steroid biosynthesis have proved to be good targets, both for therapeutic intervention and for mechanism-based inactivators (2). Aromatase, for example, catalyzes the final, rate-limiting step in estrogen biosynthesis (Equation 17.55). Aromatase has proved susceptible to mechanism-based inhibitors such as formestane and exemestane. These are now both used in the treatment of breast cancer (210).

[Equation 17.55: aromatase-catalyzed conversion of testosterone to estradiol; structures of the inhibitors formestane and exemestane.]

In the last decade there has been a considerable increase in the occurrence of antibiotic-resistant microbial pathogens. Vancomycin, one of the last-resort antibiotics for treating some gram-positive bacterial infections, inhibits peptidoglycan synthesis by binding the terminal D-alanyl-D-alanine (D-Ala-D-Ala) dipeptide from pentapeptide precursors of Enterococcus cell walls. VanX is a zinc-dependent D-Ala-D-Ala dipeptidase (Equation 17.56), which has been implicated in high-level resistance to vancomycin (211, 212). As a consequence, VanX has become a prime drug target for overcoming vancomycin resistance and a number of transition-state analogs have been prepared (213, 214).

The enzyme was also shown to process dipeptides with bulky C-terminal amino groups (213) and, using this knowledge, a novel mechanism-based inhibitor was recently developed (215). Its mechanism is shown in Fig. 17.37. Compound (95) is a dipeptide-like analog of D-Ala-D-Ala and is readily accepted by VanX. Cleavage of the peptide bond and elimination of D-alanine results in the formation of the metastable 2-p-difluoromethylthioglycine (96), which spontaneously decomposes, yielding ammonia, glyoxylic acid, and p-difluoromethyl thiophenol (97). Elimination of a fluoride ion results in the electrophilic 4-thioquinone fluoromethide (98), which irreversibly alkylates the enzyme (99). Interestingly, the turnover of the analog was faster than that of D-Ala-D-Ala itself. However, the partition ratio of 7500 indicated that one of the reactive intermediates must be relatively long-lived (215).

Figure 17.37. Mechanism-based inhibition of VanX.

More detailed examples of approaches used to design mechanism-based inhibitors may be found in excellent reviews by Silverman (166, 167) and by Ator and Ortiz de Montellano (216).

3.4 Pseudoirreversible Inhibitors

Pseudoirreversible inhibitors are the least common of the covalently binding enzyme inhibitors. They have some features in common with both affinity labels (Section 3.2) and mechanism-based inhibitors (Section 3.3) but they have one distinguishing feature; that is, the covalent bond formed between the enzyme and the inhibitor is reversible (Equation 17.57). As with the affinity labels, initially they bind to the enzyme's active site in a noncovalent fashion to form an enzyme-inhibitor complex E·I but, unlike an affinity label, the pseudoirreversible inhibitor generally possesses unreactive functional groups. As with the mechanism-based inhibitor, the enzyme then starts the catalytic cycle and an active-site residue, usually one involved in covalent catalysis (60), reacts with the inhibitor, without producing a highly reactive species, and forms a covalent bond.

The covalently bound inhibitor mimics the normal covalent reaction intermediate occurring during the normal reaction mechanism. However, the covalent adduct is far more stable, with half-lives on the order of several hours to days. The free enzyme may then, depending on the lability of the E-I' bond, be regenerated by hydrolysis or reversal of the covalent bond. The utility of a pseudoirreversible inhibitor will be determined by a combination of the rate of formation of the covalent enzyme-inhibitor adduct and the half-life for reactivation.

As may be expected, criteria for the study of pseudoirreversible inhibitors are very similar to those for both affinity labels and mechanism-based inhibitors. However, because of the inherent reversibility of pseudoirreversible inhibitors, it may be more difficult to obtain structural evidence for the covalent enzyme-inhibitor adduct. Further, determination of the rate of reactivation and characterization of the products of the recovery process will also be of major importance in designating an inhibitor as pseudoirreversible.

Pseudoirreversible inhibitors can be broken into two classes, depending on how the active enzyme is regenerated. In the first class, exemplified by inhibitors of acetylcholinesterase, the enzyme is regenerated as the covalent E-I' bond is hydrolyzed (i.e., the rate constant for hydrolysis of E-I' greatly exceeds that for reversal of the covalent bond). As shown in Equation 17.58, acetylcholinesterase catalyzes the hydrolysis of acetylcholine, yielding choline and acetate.

\[
\underbrace{\mathrm{CH_3C(O)OCH_2CH_2\overset{+}{N}(CH_3)_3}}_{\text{acetylcholine}} + \mathrm{H_2O}
\xrightarrow{\text{acetylcholinesterase}}
\underbrace{\mathrm{CH_3COO^-}}_{\text{acetate}} + \underbrace{\mathrm{HOCH_2CH_2\overset{+}{N}(CH_3)_3}}_{\text{choline}}
\tag{17.58}
\]

Acetylcholine is a neurotransmitter that relays nerve impulses across the neuromuscular junction. Acetylcholinesterase (AcChE) rapidly breaks down acetylcholine, thereby lowering its concentration in the synaptic cleft and ensuring that nerve impulses are of a finite length. As shown in Fig. 17.38, a nucleophilic serine residue reacts with the substrate to form an acetyl-serine intermediate (100) with concomitant release of choline. This intermediate is then rapidly hydrolyzed by water, producing acetate and regenerated enzyme. Agents such as parathion (101) and sarin (102) have found utility as insecticides and nerve gases, respectively, because they react with the enzyme to form the active-site serine-phosphate esters, (103) and (104). These esters are hydrolyzed extremely slowly by water, making the inhibition effectively irreversible (i.e., the rate constants for both reversal and hydrolysis are very small), although the inhibition can be overcome with high concentrations of strong nucleophiles such as hydroxylamine.

Figure 17.38. (a) Mechanism of reaction, (b) irreversible inhibitors, and (c,d) pseudoirreversible inhibitors (109) of acetylcholinesterase.

More recently, it has been established that inhibitors of acetylcholinesterase may play a role in memory enhancement in patients with Alzheimer's disease (217). Unlike (101) and (102), carbamate inhibitors such as physostigmine (105) and rivastigmine (106) are classified as pseudoirreversible inhibitors because they react with AcChE to form a carbamylated serine (107). By comparison with the serine-phosphate ester, the carbamylated serine is rapidly hydrolyzed, thereby regenerating AcChE. For example, reactivation of the physostigmine-inactivated enzyme is rapid, with a t1/2 of less than 40 min (218). Rivastigmine, a more useful therapeutic agent, is considerably longer acting, with a half-life of more than 10 h (217, 219). Overall, for pseudoirreversible inhibitors of this type, the effectiveness and duration of the "irreversible" inhibition will be controlled by the chemical nature of the groups transferred to the active-site nucleophile, making it readily amenable to manipulation.

In pseudoirreversible inhibitors of the second class, the enzyme is regenerated by the inhibitor simply dissociating from the enzyme; that is, the binding is covalent but reversible (the rate constant for reversal of the covalent bond greatly exceeds that for hydrolysis). This class can also be exemplified by an AcChE inhibitor. For example, the trifluoromethyl ketone (108) binds to AcChE as a slow-binding inhibitor (Section 2.4.1) with a Ki value of 0.06 nM, and a koff value of 6.7 × 10^[…] s⁻¹ (220). A linear correlation was observed between Ki values of a series of fluoromethyl ketones and the Vmax/Km value for the corresponding substrate (220). This suggests (127) that the tetrahedral adduct (109), in effect, mimics the transition state (or a high-energy intermediate), thereby accounting for the high affinity (Section 2.5.3). The affinity of the inhibitor for AcChE could be decreased (with a concomitant increase in the value of koff) by sequentially reducing the number of fluorine atoms in the methyl group adjacent to the ketone (220). Finally, it should be noted that the two classes of pseudoirreversible inhibitor can be differentiated by examining the decomposition products of the inhibition reaction. When hydrolysis is required for enzyme regeneration, cleavage products, such as substituted carbamates, will be in evidence. Conversely, the trifluoromethyl ketones will not be broken down by AcChE and no decomposition products will be observed.

4 CONCLUSIONS

Enzyme inhibitors have long played an important role in medicine, pharmacology, and basic research. The advances in DNA technology have enabled cloning and overexpression of large numbers of enzymes, and the approaches described in this chapter have already led to the development of novel therapeutic agents. However, in the postgenomics era, large numbers of new targets have been identified. Although the drug discovery process moves toward structure-based drug design as its prime tool, even with high-throughput crystallography, not all target proteins will be readily accessible. The evolution of algorithms that can predict enzyme function and mechanism will ensure that the rational design of enzyme inhibitors not only complements structure-based approaches but continues to play a stand-alone role in the discovery of novel therapeutics.

REFERENCES

1. M. Sandler and H. J. Smith, Eds., Design of Enzyme Inhibitors as Drugs, Oxford University Press, Oxford, 1989.
2. M. Sandler and H. J. Smith, Eds., Design of Enzyme Inhibitors as Drugs, Vol. 2, Oxford University Press, Oxford, 1994.
3. P. Krogsgaard-Larsen, T. Liljefors, and U. Madsen, Eds., A Textbook of Drug Design and Development, 2nd ed., Harwood Academic, Amsterdam, 1996.
4. H. J. Smith, Ed., Smith and Williams' Introduction to the Principles of Drug Design and Action, Harwood Academic, Amsterdam, 1998.
5. G. Gregoriadis, Trends Biotechnol., 13, 527-537 (1995).
6. I. A. Bakker-Woudenberg, G. Storm, and M. C. Woodle, J. Drug Target., 2, 363-371 (1994).
7. J. Kreuter, Adv. Drug Del. Rev., 47, 65-81 (2001).
8. F. C. Neuhaus and J. L. Lynch, Biochemistry, 3, 471-480 (1964).
9. R. R. Rando, Biochem. Pharmacol., 24, 1153-1160 (1975).
10. J. N. Delgado and W. A. Remers, Eds., Wilson and Gisvold's Textbook of Organic Medicinal and Pharmaceutical Chemistry, 10th ed., Lippincott-Raven, Philadelphia, 1998.
11. N. Rastogi and H. L. David, Res. Microbiol., 144, 133-143 (1993).
12. A. G. Gilman, T. W. Rall, A. S. Nies, and P. Taylor, Eds., Goodman and Gilman's the Pharmacological Basis of Therapeutics, 8th ed., Pergamon, New York, 1990, p. 985.
13. H. J. Schaeffer, L. Beauchamp, P. de Miranda, G. B. Elion, D. J. Bauer, and P. Collins, Nature, 272, 583-585 (1978).
14. P. A. Furman, M. H. St. Clair, and T. Spector, J. Biol. Chem., 259, 9575-9579 (1984).
15. G. B. Elion, P. A. Furman, J. A. Fyfe, P. de Miranda, L. Beauchamp, and H. J. Schaeffer, Proc. Natl. Acad. Sci. USA, 74, 5716-5720 (1977).
16. J. P. Durkin and T. Viswanatha, J. Antibiot. (Tokyo), 31, 1162-1169 (1978).
17. S. J. Cartwright and A. F. Caulson, Nature, 278, 360-361 (1979).
18. R. Labia, V. Lelievre, and J. Peduzzi, Biochim. Biophys. Acta, 811, 351-357 (1980).
19. C. Reading and T. Farmer, Biochem. J., 199, 779-787 (1981).
20. R. L. Charnas and J. R. Knowles, Biochemistry, 20, 3214-3219 (1981).
21. A. R. English, J. A. Retsema, A. E. Girard, J. E. Lynch, and W. E. Barth, Antimicrob. Agents Chemother., 14, 414-419 (1978).
22. K. P. Fu and H. C. Neu, Antimicrob. Agents Chemother., 15, 171-176 (1979).
23. P. S. Mezes, A. J. Clarke, G. I. Dmitrienko, and T. Viswanatha, FEBS Lett., 143, 265-267 (1982).
24. D. G. Brenner and J. R. Knowles, Biochemistry, 23, 5833-5839 (1984).
25. W. L. Washtien, Mol. Pharmacol., 25, 171-177 (1984).
26. D. R. Seeger, J. Am. Chem. Soc., 71, 1753 (1949).
27. D. A. Matthews, R. A. Alden, J. T. Bolin, S. T. Freer, R. Hamlin, N. Xuong, J. Kraut, M. Poe, M. Williams, and K. Hoogsteen, Science, 197, 452-455 (1977).
28. B. Lippert, B. W. Metcalf, M. J. Jung, and P. Casara, Eur. J. Biochem., 74, 441-445 (1977).
29. M. Cziraky, Pharmacoeconomics, 14 (Suppl. 3), 29-38 (1998).
30. A. Endo, M. Kuroda, and K. Tanzawa, FEBS Lett., 72, 323-326 (1976).
31. V. W. Rodwell, Adv. Lipid Res., 14, 1 (1976).
32. A. W. Alberts, J. Chen, G. Kuron, V. Hunt, J. Huff, C. Hoffman, J. Rothrock, M. Lopez, H. Joshua, E. Harris, A. Patchett, R. Monaghan, S. Currie, E. Stapley, G. Albers-Schonberg, O. Hensens, J. Hirshfield, K. Hoogsteen, J. Liesch, and J. Springer, Proc. Natl. Acad. Sci. USA, 77, 3957-3961 (1980).
33. A. P. Lea and D. McTavish, Drugs, 53, 828-847 (1997).
34. H. S. Yee and N. T. Fong, Ann. Pharmacother., 32, 1030-1043 (1998).
35. S. Yu, K. Sugahara, K. Nakayama, S. Awata, and H. Kodama, Metabolism, 49, 1025-1029 (2000).
36. A. Hestnes, O. Borud, H. Lunde, and L. Gjessing, J. Ment. Defic. Res., 33, 261-265 (1989).
37. G. R. Stark and P. A. Bartlett, Pharmacol. Ther., 23, 45-78 (1983).
38. I. H. Segel, Enzyme Kinetics: Behavior and Analysis of Rapid Equilibrium and Steady-State Enzyme Systems, Wiley-Interscience, New York, 1975.
39. E. Shaw in P. D. Boyer, Ed., Chemical Modification by Active-Site Directed Reagents, Academic Press, New York, 1970, pp. 91-147.
40. R. L. Lundblad, Chemical Reagents for Protein Modification, 2nd ed., CRC Press, Boca Raton, FL, 1991.
41. H. F. Hixson, Jr. and A. H. Nishikawa, Arch. Biochem. Biophys., 154, 501-509 (1973).
42. K. I. Skorey, N. A. Johnson, G. Huyer, and M. J. Gresser, Protein Express. Purif., 15, 178-187 (1999).
43. F. A. Norris and P. W. Majerus, J. Biol. Chem., 269, 8716-8720 (1994).
44. M. Knockaert, N. Gray, E. Damiens, Y. T. Chang, P. Grellier, K. Grant, D. Fergusson, J. Mottram, M. Soete, J. F. Dubremetz, K. Le
Roch, C. Doerig, P. Schultz, and L. Meijer, Chem. Biol., 7, 411-422 (2000).
45. J. S. Fowler, R. R. MacGregor, A. P. Wolf, C. D. Arnett, S. L. Dewey, D. Schlyer, D. Christman, J. Logan, M. Smith, H. Sachs, et al., Science, 235, 481-485 (1987).
46. D. V. Santi and G. L. Kenyon in M. E. Wolff, Ed., Burger's Medicinal Chemistry: Approaches to the Rational Design of Enzyme Inhibitors, Wiley-Interscience, New York, 1980, pp. 349-391.
47. A. Muscate, C. L. Levinson, and G. L. Kenyon in M. Howe-Grant, Ed., Kirk-Othmer Encyclopedia of Chemical Technology, 4th ed., John Wiley & Sons, New York, 1994, pp. 644-671.
48. A. Patel, H. J. Smith, and J. Stürzebecher, in ref. 4, pp. 261-330.
49. P. Veerapandian, Ed., Structure-Based Drug Design, Marcel Dekker, New York, 1997.
50. J. E. Ladbury and P. R. Connelly, Eds., Structure-Based Drug Design: Thermodynamics, Modeling and Strategy, Springer-Verlag, Berlin, 1997.
51. E. M. Gordon and J. F. Kerwin, Jr., Eds., Combinatorial Chemistry and Molecular Diversity in Drug Discovery, Wiley-Liss, New York, 1998.
52. K. Gubernator and H. J. Bohm, Eds., Structure-Based Ligand Design, Wiley-VCH, Weinheim, Germany, 1998.
53. J. P. Dirlam, L. J. Czuba, B. W. Dominy, R. B. James, R. M. Pezzullo, J. E. Presslitz, and W. W. Windisch, J. Med. Chem., 22, 1118-1121 (1979).
54. J. C. Dearden and K. C. James in T. J. Perun and C. L. Propst, Eds., Computer-Aided Drug Design: Methods and Applications, Marcel Dekker, New York, 1989, pp. 168-207.
55. T. Hogberg and U. Norinder, in ref. 3, pp. 95-129.
56. Y. C. Martin, P. Willett, and S. R. Heller, Eds., Designing Bioactive Molecules: Three Dimensional Techniques and Applications, American Chemical Society, Washington, DC, 1998.
57. W. P. Jencks, Adv. Enzymol. Relat. Areas Mol. Biol., 43, 219-410 (1975).
58. W. P. Jencks, Proc. Natl. Acad. Sci. USA, 78, 4046-4050 (1981).
59. J. Kyte, Structure in Protein Chemistry, Garland Publishing, New York, 1995, pp. 147-196.
60. A. R. Fersht, Structure and Mechanism in Protein Science: A Guide to Enzyme Catalysis and Protein Folding, Freeman, New York, 1999.
61. D. A. Dougherty, Science, 271, 163-168 (1996).
62. N. S. Scrutton and A. R. Raine, Biochem. J., 319, 1-8 (1996).
63. S. K. Burley and G. A. Petsko, Adv. Protein Chem., 39, 125-189 (1988).
64. J. P. Gallivan and D. A. Dougherty, Proc. Natl. Acad. Sci. USA, 96, 9459-9464 (1999).
65. J. L. Sussman, M. Harel, F. Frolow, C. Oefner, A. Goldman, L. Toker, and I. Silman, Science, 253, 872-879 (1991).
66. A. Ordentlich, D. Barak, C. Kronman, N. Ariel, Y. Segall, B. Velan, and A. Shafferman, J. Biol. Chem., 270, 2082-2091 (1995).
67. A. Cornish-Bowden and C. W. Wharton, Enzyme Kinetics, IRL Press, Oxford, 1988.
68. W. W. Cleland in D. S. Sigman and P. D. Boyer, Eds., The Enzymes, 3rd ed., Vol. XIX, Academic Press, San Diego, 1990, pp. 99-158.
69. T. Palmer, Understanding Enzymes, Prentice Hall/Ellis Horwood, London, 1995.
70. D. L. Purich, Ed., Contemporary Enzyme Kinetics and Mechanism, 2nd ed., Academic Press, San Diego, 1996.
71. R. A. Copeland, Enzymes: A Practical Introduction to Structure, Mechanism and Data Analysis, 2nd ed., Wiley-VCH, New York, 2000.
72. R. Eisenthal and A. Cornish-Bowden, Biochem. J., 139, 715-720 (1974).
73. C. L. Tsou, Adv. Enzymol. Relat. Areas Mol. Biol., 61, 381-436 (1988).
74. M. Dixon, Biochem. J., 55, 170-171 (1953).
75. R. G. Pendleton and I. B. Snow, Mol. Pharmacol., 9, 718-725 (1973).
76. H. J. Fromm, in ref. 70, pp. 207-227.
77. J. F. Morrison, Trends Biochem. Sci., 7, 102-105 (1982).
78. J. F. Morrison and C. T. Walsh, Adv. Enzymol. Relat. Areas Mol. Biol., 61, 201-301 (1988).
79. S. E. Szedlacsek and R. G. Duggleby in D. L. Purich, Ed., Methods in Enzymology, Vol. 249, Academic Press, New York, 1995, pp. 144-180.
80. M. J. Sculley, J. F. Morrison, and W. W. Cleland, Biochim. Biophys. Acta, 1298, 78-86 (1996).
81. D. H. Rich, J. Med. Chem., 28, 263-273 (1985).
82. J. D. Cox, N. N. Kim, A. M. Traish, and D. W. Christianson, Nat. Struct. Biol., 6, 1043-1047 (1999).
83. D. M. Colleluori and D. E. Ash, Biochemistry, 40, 9356-9362 (2001).
84. S. H. Wilkes and J. M. Prescott, J. Biol. Chem., 260, 13154-13162 (1985).
85. A. Taylor, C. Z. Peltier, F. J. Torre, and N. Hakamian, Biochemistry, 32, 784-790 (1993).
86. H. Kim and W. N. Lipscomb, Biochemistry, 32, 8465-8478 (1993).
87. A. Betz, P. W. Wong, and U. Sinha, Biochemistry, 38, 14582-14591 (1999).
88. E. P. Garvey, J. A. Oplinger, E. S. Furfine, R. J. Kiff, F. Laszlo, B. J. Whittle, and R. G. Knowles, J. Biol. Chem., 272, 4959-4963 (1997).
89. E. S. Furfine, M. F. Harmon, J. E. Paith, and E. P. Garvey, Biochemistry, 32, 8512-8517 (1993).
90. B. F. Cooper and F. B. Rudolph, in ref. 70, pp. 183-205.
91. B. Oberg, Pharmacol. Ther., 40, 213-285 (1989).
92. L. R. Overby, E. E. Robishaw, J. B. Schleicher, A. Reuter, N. L. Shipkowitz, and J. C.-H. Mao, Antimicrob. Agents Chemother., 6, 360-365 (1974).
93. S. S. Leinbach, J. M. Reno, L. F. Lee, A. F. Isbell, and J. A. Boezi, Biochemistry, 15, 426-430 (1976).
94. B. Eriksson, A. Larsson, E. Helgstrand, N. G. Johansson, and B. Oberg, Biochim. Biophys. Acta, 607, 53-64 (1980).
95. F. Kappler and A. Hampton, J. Med. Chem., 33, 2545-2551 (1990).
96. R. Wolfenden, Annu. Rev. Biophys. Bioeng., 5, 271-306 (1976).
97. A. D. Broom, Fed. Proc., 45, 2779-2783 (1986).
98. A. D. Broom, J. Med. Chem., 32, 2-7 (1989).
99. G. E. Lienhard, Science, 180, 149-154 (1973).
100. A. Radzicka and R. Wolfenden, in ref. 70, pp. 229-257.
101. M. Mader and P. A. Bartlett, Chem. Rev., 97, 1281-1301 (1997).
102. J. Inglese, R. A. Blatchly, and S. J. Benkovic, J. Med. Chem., 32, 937-940 (1989).
103. C. Klein, P. Chen, J. H. Arevalo, E. A. Stura, A. Marolewski, M. S. Warren, S. J. Benkovic, and I. A. Wilson, J. Mol. Biol., 249, 153-175 (1995).
104. E. C. Bigham, W. R. Mallory, S. J. Hodson, D. S. Duch, R. Ferone, and G. K. Smith, Heterocycles, 35, 1289-1307 (1993).
105. K. D. Collins and G. R. Stark, J. Biol. Chem., 246, 6599-6605 (1971).
106. E. A. Swyryd, S. S. Seaver, and G. R. Stark, J. Biol. Chem., 249, 6945-6950 (1974).
107. R. F. Morton, E. T. Creagan, S. A. Cullinan, J. A. Mailliard, L. Ebbert, M. H. Veeder, and M. Chang, J. Clin. Oncol., 5, 1078-1082 (1987).
108. N. M. Laing, W. W. Chan, D. W. Hutchinson, and B. Oberg, FEBS Lett., 260, 206-208 (1990).
109. L. Jin, B. Stec, W. N. Lipscomb, and E. R. Kantrowitz, Proteins, 37, 729-742 (1999).
110. A. Corsini, F. M. Maggi, and A. L. Catapano, Pharmacol. Res., 31, 9-27 (1995).
111. K. M. Bischoff and V. W. Rodwell, Biochem. Med. Metab. Biol., 48, 149-158 (1992).
112. C. E. Nakamura and R. H. Abeles, Biochemistry, 24, 1364-1376 (1985).
113. R. H. Abeles, Drug. Dev. Res., 10, 221-234 (1987).
114. E. S. Istvan, M. Palnitkar, S. K. Buchanan, and J. Deisenhofer, EMBO J., 19, 819-830 (2000).
115. E. S. Istvan and J. Deisenhofer, Biochim. Biophys. Acta, 1529, 9-18 (2000).
116. E. S. Istvan and J. Deisenhofer, Science, 292, 1160-1164 (2001).
117. M. A. Ondetti and D. W. Cushman in R. L. Soffer, Ed., Biochemical Regulation of Blood Pressure, Wiley, New York, 1981, pp. 165-186.
118. L. D. Byers and R. Wolfenden, Biochemistry, 12, 2070-2078 (1973).
119. D. W. Cushman, H. S. Cheung, E. F. Sabo, and M. A. Ondetti, Biochemistry, 16, 5484-5491 (1977).
120. H. G. Bull, N. A. Thornberry, M. H. Cordes, A. A. Patchett, and E. H. Cordes, J. Biol. Chem., 260, 2952-2962 (1985).
121. R. B. Silverman, The Organic Chemistry of Drug Design and Drug Action, Academic Press Inc., San Diego, 1992, pp. 162-175.
122. P. Buenning, J. Cardiovascular. Res., 10 (Suppl. 7), S31-S35 (1987).
123. A. Radzicka and R. Wolfenden, Science, 267, 90-93 (1995).
124. R. Wolfenden, Transition States of Biochemical Processes, Plenum, New York, 1978.
125. V. L. Schramm, Annu. Rev. Biochem., 67, 693-720 (1998).
126. P. A. Bartlett, Y. Nakagawa, C. Johnson, S. Reich, and A. Luis, J. Org. Chem., 53, 3195-3210 (1988).
127. P. A. Bartlett and C. K. Marlowe, Biochemistry, 22, 4618-4624 (1983).
128. J. W. Williams, J. F. Morrison, and R. G. Duggleby, Biochemistry, 18, 2567-2573 (1979).
129. D. A. Matthews, R. A. Alden, J. T. Bolin, D. J. Filman, S. T. Freer, R. Hamlin, W. G. Hol, R. L. Kisliuk, E. J. Pastore, L. T. Plante, N. Xuong, and J. Kraut, J. Biol. Chem., 253, 6946-6954 (1978).
130. P. A. Charlton, D. W. Young, B. Birdsall, J. Feeney, and G. C. K. Roberts, J. Chem. Soc. Chem. Commun., 922-924 (1979).
131. K. Schray and J. P. Klinman, Biochem. Biophys. Res. Commun., 57, 641-648 (1974).
132. L. Frick, R. Wolfenden, E. Smal, and D. C. Baker, Biochemistry, 25, 1616-1621 (1986).
133. L. Frick, J. P. MacNeela, and R. Wolfenden, Bioorg. Chem., 15, 100-108 (1987).
134. P. A. Bartlett and A. Otake, J. Org. Chem., 60, 3107-3111 (1995).
135. P. A. Bartlett and M. A. Giangiordano, J. Org. Chem., 61, 3433-3438 (1996).
136. S. Cha, R. P. Agarwal, and R. E. Parks, Jr., Biochem. Pharmacol., 24, 2187-2197 (1975).
137. R. P. Agarwal, T. Spector, and R. E. Parks, Jr., Biochem. Pharmacol., 26, 359-367 (1977).
138. R. Wolfenden, J. Kaufman, and J. B. Macon, Biochemistry, 8, 2412-2415 (1969).
139. V. L. Schramm and D. C. Baker, Biochemistry, 24, 641-646 (1985).
140. L. C. Kurz and C. Frieden, Biochemistry, 22, 382-389 (1983).
141. L. C. Kurz and C. Frieden, Biochemistry, 26, 8450-8457 (1987).
142. W. Jones, L. C. Kurz, and R. Wolfenden, Biochemistry, 28, 1242-1247 (1989).
143. D. K. Wilson, F. B. Rudolph, and F. A. Quiocho, Science, 252, 1278-1284 (1991).
144. R. I. Christopherson and M. E. Jones, J. Biol. Chem., 255, 3358-3370 (1980).
145. D. H. Kinder, S. K. Frank, and M. M. Ames, J. Med. Chem., 33, 819-823 (1990).
146. C. A. Kettner and A. B. Shenvi, J. Biol. Chem., 259, 15106-15114 (1984).
147. P. R. Andrews, E. N. Cain, E. Rizzardo, and G. D. Smith, Biochemistry, 16, 4848-4852 (1977).
148. P. A. Bartlett and C. R. Johnson, J. Am. Chem. Soc., 107, 7793-7794 (1985).
149. P. A. Bartlett, Y. Nakagawa, C. R. Johnson, S. H. Reich, and A. Luis, J. Org. Chem., 53, 3195-3210 (1988).
150. D. Y. Jackson, J. W. Jacobs, R. Sugasuwara, S. H. Reich, P. A. Bartlett, and P. G. Schultz, J. Am. Chem. Soc., 110, 4841-4842 (1988).
151. D. Hilvert, S. H. Carpenter, K. D. Nared, and M. T. Auditor, Proc. Natl. Acad. Sci. USA, 85, 4953-4955 (1988).
152. Y. M. Chook, H. Ke, and W. N. Lipscomb, Proc. Natl. Acad. Sci. USA, 90, 8600-8603 (1993).
153. A. Y. Lee, P. A. Karplus, B. Ganem, and J. Clardy, J. Am. Chem. Soc., 117, 3627-3628 (1995).
154. M. R. Haynes, E. A. Stura, D. Hilvert, and I. A. Wilson, Science, 263, 646-652 (1994).
155. O. Wiest and K. N. Houk, J. Org. Chem., 59, 7582-7584 (1994).
156. J. Kyte, Mechanism in Protein Chemistry, Garland Publishing, New York, 1995, pp. 314-344.
157. J. Eyzaguirre, Ed., Chemical Modification of Enzymes: Active Site Studies, Ellis Horwood Ltd., Chichester, 1987.
158. B. I. Kurganov, N. K. Nagradova, and O. I. Lavrik, Eds., Chemical Modification of Enzymes, Nova Science, New York, 1996.
159. W. B. Jakoby and M. Wilchek, Eds., Methods in Enzymology, Vol. 46, Academic Press, New York, 1977.
160. T. Imoto and H. Yamada, in ref. 157, pp. 279-318.
161. R. F. Colman, Subcell. Biochem., 24, 177-205 (1995).
162. B. V. Plapp, in ref. 70, pp. 259-289.
163. R. F. Colman, FASEB J., 11, 217-226 (1997).
164. R. F. Pratt, Bioorg. Med. Chem. Lett., 2, 1323-1326 (1992).
165. A. Krantz, Bioorg. Med. Chem. Lett., 2, 1327-1334 (1992).
166. R. B. Silverman, in ref. 70, pp. 291-333.
167. R. B. Silverman, Mechanism-Based Enzyme Inactivation: Chemistry and Enzymology, CRC Press, Boca Raton, FL, 1988.
168. R. Kitz and I. B. Wilson, J. Biol. Chem., 237, 3245-3249 (1962).
169. D. D. Buechter, K. F. Medzihradszky, A. L. Burlingame, and G. L. Kenyon, J. Biol. Chem., 267, 2173-2178 (1992).
170. W. B. Knight, K. M. Swiderek, T. Sakuma, J. Calaycay, J. E. Shively, T. D. Lee, T. R. Covey, B. Shushan, B. G. Green, R. Chabin, et al., Biochemistry, 32, 2031-2035 (1993).
171. S. D. Nelson, J. Med. Chem., 25, 753-765 (1982).
172. G. D. Green and E. Shaw, J. Biol. Chem., 256, 1923-1928 (1981).
173. R. B. Baker, Design of Active-Site Directed Irreversible Enzyme Inhibitors, Wiley-Interscience, New York, 1967.
174. J. A. Katzenellenbogen, Ann. Rep. Med. Chem., 222-233 (1974).
175. G. Schoellmann and E. Shaw, Biochemistry, 2, 252-255 (1963).
176. E. B. Ong, E. Shaw, and G. Schoellmann, J. Biol. Chem., 240, 694-698 (1965).
177. C. Kettner and E. Shaw in L. Lorand, Ed., Methods in Enzymology, Vol. 80, Academic Press, New York, 1981, pp. 826-842.
178. M. Dryjanski, L. L. Kosley, and R. Pietruszko, Biochemistry, 37, 14151-14156 (1998).
179. K. von der Helm, B. D. Korant, and J. C. Cheronis, Eds., Proteases as Targets for Therapy, Springer-Verlag, Berlin, 2000.
180. J. A. Fee, G. D. Hegeman, and G. L. Kenyon, Biochemistry, 13, 2533-2538 (1974).
181. J. A. Landro, J. A. Gerlt, J. W. Kozarich, C. W. Koo, V. J. Shah, G. L. Kenyon, D. J. Neidhart, S. Fujita, and G. A. Petsko, Biochemistry, 33, 635-643 (1994).
182. J. R. Vane, Nat. New Biol., 231, 232-235 (1971).
183. J. B. Smith and A. L. Willis, Nat. New Biol., 231, 235-237 (1971).
184. G. J. Roth, N. Stanford, and P. W. Majerus, Proc. Natl. Acad. Sci. USA, 72, 3073-3076 (1975).
185. M. Hemler and W. E. Lands, J. Biol. Chem., 251, 5575-5579 (1976).
186. F. J. Van der Ouderaa, M. Buytenhek, D. H. Nugteren, and D. A. Van Dorp, Eur. J. Biochem., 109, 1-8 (1980).
187. G. J. Roth, E. T. Machuga, and J. Ozols, Biochemistry, 22, 4672-4675 (1983).
188. P. J. Loll, D. Picot, and R. M. Garavito, Nat. Struct. Biol., 2, 637-643 (1995).
189. D. Picot, P. J. Loll, and R. M. Garavito, Nature, 367, 243-249 (1994).
190. L. H. Rome and W. E. Lands, Proc. Natl. Acad. Sci. USA, 72, 4863-4865 (1975).
191. R. F. Colman in T. E. Creighton, Ed., Protein Function: A Practical Approach, 2nd ed., Oxford University Press, Oxford, 1997, pp. 155-183.
192. P. K. Pal, W. J. Wechter, and R. F. Colman, J. Biol. Chem., 250, 8140-8147 (1975).
193. J. L. Wyatt and R. F. Colman, Biochemistry, 16, 1333-1342 (1977).
194. R. H. Abeles, Pure Appl. Chem., 53, 149-160 (1980).
195. T. I. Kalman, Drug. Dev. Res., 1, 311-328 (1981).
196. C. T. Walsh, Tetrahedron, 38, 871-909 (1982).
197. C. T. Walsh, TIPS, 254-257 (1983).
198. R. H. Abeles, Chem. Eng. News, 61, 48-56 (1983).
199. T. M. Penning, TIPS, 212-217 (1983).
200. C. T. Walsh, Annu. Rev. Biochem., 53, 493-535 (1984).
201. M. G. Palfreyman, P. Bey, and A. Sjoerdsma, Essays Biochem., 23, 28-81 (1987).
202. M. J. Jung and C. Danzin, in ref. 1, pp. 257-293.
203. B. Frølund, in ref. 3, pp. 264-266.
204. R. Poulin, L. Lu, B. Ackermann, P. Bey, and A. E. Pegg, J. Biol. Chem., 267, 150-158 (1992).
205. N. V. Grishin, A. L. Osterman, H. B. Brooks, M. A. Phillips, and E. J. Goldsmith, Biochemistry, 38, 15174-15184 (1999).
206. T. Liang, M. A. Cascieri, A. H. Cheung, G. F. Reynolds, and G. H. Rasmusson, Endocrinology, 117, 571-579 (1985).
207. B. Faller, D. Farley, and H. Nick, Biochemistry, 32, 5705-5710 (1993).
208. H. G. Bull, M. Garcia-Calvo, S. Andersson, W. F. Baginski, H. K. Chan, D. E. Ellsworth, R. R. Miller, R. A. Stearns, R. K. Bakshi, G. H. Rasmusson, R. L. Tolman, R. W. Myers, J. W. Kozarich, and G. S. Harris, J. Am. Chem. Soc., 118, 2359-2365 (1996).
209. B. Azzolina, K. Ellsworth, S. Andersson, W. Geissler, H. G. Bull, and G. S. Harris, J. Steroid Biochem. Mol. Biol., 61, 55-64 (1997).
210. V. C. Njar and A. M. Brodie, Drugs, 58, 233-255 (1999).
211. P. E. Reynolds, F. Depardieu, S. Dutka-Malen, M. Arthur, and P. Courvalin, Mol. Microbiol., 13, 1065-1070 (1994).
212. D. E. Bussiere, S. D. Pratt, L. Katz, J. M. Severin, T. Holzman, and C. H. Park, Mol. Cell, 2, 75-84 (1998).
213. Z. Wu, G. D. Wright, and C. T. Walsh, Biochemistry, 34, 2455-2463 (1995).
214. Z. Wu and C. T. Walsh, Proc. Natl. Acad. Sci. USA, 92, 11603-11607 (1995).
215. R. Araoz, E. Anhalt, L. Rene, M. A. Badet-Denisot, P. Courvalin, and B. Badet, Biochemistry, 39, 15971-15979 (2000).
216. M. A. Ator and P. R. Ortiz de Montellano in D. S. Sigman and P. D. Boyer, Eds., The Enzymes, 3rd ed., Vol. XIX, Academic Press, San Diego, 1990, pp. 214-282.
217. J. Grutzendler and J. C. Morris, Drugs, 61, 41-52 (2001).
218. E. Perola, L. Cellai, D. Lamba, L. Filocamo, and M. Brufani, Biochim. Biophys. Acta, 1343, 41-50 (1997).
219. R. J. Polinsky, Clin. Ther., 20, 634-647 (1998).
220. K. N. Allen and R. H. Abeles, Biochemistry, 28, 8466-8473 (1989).
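
Worked note to Section 3.4. The reactivation half-lives quoted for the carbamylated enzyme can be compared on a common time scale. The Python sketch below assumes that reactivation is a simple first-order process and that no free inhibitor remains to re-inhibit the enzyme; both are simplifications made here for illustration, not statements from the cited studies, and the function names are hypothetical. The bounds given in the text (less than 40 min for physostigmine, more than 10 h for rivastigmine) are taken at face value as half-lives.

```python
import math


def rate_constant(t_half_min: float) -> float:
    """First-order rate constant (per minute) for a half-life given in minutes."""
    if t_half_min <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / t_half_min


def fraction_inhibited(t_min: float, t_half_min: float) -> float:
    """Fraction of enzyme still covalently inhibited after t_min minutes.

    Assumes first-order reactivation with no re-inhibition.
    """
    return math.exp(-rate_constant(t_half_min) * t_min)


if __name__ == "__main__":
    # Half-lives in minutes; the bounds from the text are used as
    # illustrative values, not as measured constants.
    half_lives = {"physostigmine": 40.0, "rivastigmine": 10.0 * 60.0}
    for name, t_half in half_lives.items():
        remaining = fraction_inhibited(4 * 60.0, t_half)
        print(f"{name}: {remaining:.1%} still inhibited after 4 h")
```

With these illustrative values, only a few percent of the physostigmine-treated enzyme would remain inhibited after 4 h, whereas most of the rivastigmine-treated enzyme would still be carbamylated, consistent with the longer duration of action described in the text.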
CHAPTER EIGHTEEN
Chirality and Biological Activity

ALISTAIR G. DRAFFAN
GRAHAM R. EVANS
JAMES A. HENSHILWOOD
Celltech R&D Ltd.
Granta Park, Great Abington, Cambridge, United Kingdom
Contents
1 General Introduction
1.1 Introduction
1.2 Definition of Chirality
1.3 Pharmacology
1.4 Protein Binding and Metabolism
2 Chromatographic Separations
2.1 Small-Scale HPLC Examples
2.2 Chromatographic Diastereoisomer Separation
2.3 Preparative HPLC/SMB
2.4 Conclusions
3 Classical Resolution
3.1 Separation of the Active Pharmaceutical Ingredient
3.2 Separation of Intermediates to Single Enantiomer Active Pharmaceutical Ingredient
3.3 Crystallization-Induced Asymmetric Transformation
4 Nonclassical Resolution
4.1 Preferential Crystallization
4.2 Enrichment of Enantiomeric Excess by Crystallization
4.3 Resolution by Direct Crystallization
5 Enzyme-Mediated Asymmetric Synthesis
5.1 Amide Bond Formation
5.2 Transesterification and Hydrolysis
5.3 Oxidation and Reduction
6 Asymmetric Synthesis
6.1 Chiral Pool
6.2 Chiral Auxiliary
6.3 Chiral Reagent
6.4 Chiral Catalyst
7 Conclusions

Burger's Medicinal Chemistry and Drug Discovery
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
1 GENERAL INTRODUCTION

1.1 Introduction

The subtle relationship between the efficacy and chirality of a drug is an area of research that has grown enormously over the past 20 years because of the recognition that single isomer drugs can be more potent and safer than their racemic mixtures. It is intended that this chapter will provide the reader with an appreciation of the area of chirality and biological activity. Indeed, consideration of chirality in drug design has become ubiquitous because of the greater understanding of stereoselective pharmacokinetics, pharmacodynamics, and receptor binding. The enabling technologies of chiral synthesis and analysis have provided the tools to drive these advances in the detailed understanding of the differential biological activity of stereoisomers. Isomers of identical constitution but differing in the arrangement of their atoms in space are defined as stereoisomers (enantiomers and diastereomers are subclasses). Enantiomers consist of a pair of molecular species that are mirror images of each other and are not superimposable.

The term chirality is broadly used within chemistry and drug development; however, the terms chirality or chiral are not always well understood by the reader. To clarify the matter, we can consider that two main situations exist. In the first case a sample consists of equal numbers of molecules having an opposite sense of chirality (heterochiral molecules). This sample is said to be chiral but racemic. The second case occurs when a sample is made up of molecules that all have the same sense of chirality (homochiral molecules). In this case the sample is said to be chiral and non-racemic.

Recognition of the importance of chirality and biological activity has led to the position where the regulatory authorities will no longer consider the registration of a new racemic compound. Exceptions include cases where the stereoisomers interconvert in vivo or where there is a specific advantage or synergy associated with dosing both stereoisomers as a racemate. In addition to the benefit to the patient of a safer and more potent drug, there are numerous advantages to a company in developing a single isomer drug. For example, testing is simpler and less costly for single isomers, because the Food and Drug Administration (FDA), which regulates the approval and sale of drugs and other products in the United States, requires that both of the isomers within a racemate be tested. Thus, the overall development costs and time are greatly increased because of the requirement for information on all three: the racemate and both isomers separately. The use of a single isomer should also result in a lower dosage, at least one-half that of the racemate. Because of both the therapeutic advantages and the greater regulatory burden of proof associated with a racemate compared with a single isomer, sales of single isomer drugs have increased to over $120 billion, representing more than 30% of the total market in 2000 compared with 3% in 1980 and 9% in 1990. When combined with the declining figures for new drug application approvals by the FDA, the efficient and rapid development of a single isomer drug is imperative (1).

The fundamental reason for the differential activity of stereoisomers is that the majority of molecules that make up living organisms are chiral, and moreover, exist in only one enantiomeric form. Thus, stereoisomers will be seen by the system as different molecules and will have different effects on the biological system. Typically, therefore, one of the single enantiomers of a drug will demonstrate greater potency and/or fewer side effects than the corresponding racemate, and such examples are given within this chapter. For example, Vigabatrin, which is a selective GABA transaminase inhibitor, gained approval in 1997 as a racemate. While the two stereoisomers exhibit the same pharmacokinetics, only the S-enantiomer is active as an anti-epileptic. However, there are some limited cases, such as Tramadol, where a synergistic benefit associated with the dosing of the racemate is claimed in comparison with either single enantiomer (2).
1.2 Definition of Chirality

The definition of chirality and its measurement are described in great detail in a number of texts (3); however, a brief introduction to the key issues is given in this section. Specifically, chirality is a term referring to a property of a molecule that is nonsuperimposable on its mirror image as shown in Fig. 18.1, where such a molecule is chiral.

Figure 18.1.

In the majority of cases, chirality results from the three-dimensional orientation of four different substituents around a carbon atom forming the chiral center. In addition, the orientation of atoms or groups around sulfur, phosphorus, and nitrogen atoms can sometimes form a chiral center. Examples of chiral drugs are numerous but include Cetirizine (1), Rotigotine (2), and Ifosfamide (3).

Figure 18.2. Levocetirizine (1), Rotigotine (2), Ifosfamide (3).

When a molecule contains only one chiral center, the two stereoisomers are known as enantiomers. These may be referred to or labeled using the configurational descriptors as either R (rectus, meaning right-handed) or S (sinister, meaning left-handed), or alternatively, D or L (descriptors defined by relation to dextrorotatory D-glyceraldehyde and levorotatory L-glyceraldehyde; they do not by themselves give the sign of optical rotation of the compound, which is written separately as (+) or (-)). The D and L configurational descriptors are also commonly used in classifying the configuration of sugars and amino acids (see below). In an achiral environment, enantiomers will behave identically, exhibiting, for example, the same melting/boiling point, lipid solubility, nuclear magnetic resonance (NMR) and infrared (IR) spectra, etc. However, in a chiral environment such as within the macromolecular components of a living system or a chiral high performance liquid chromatography (HPLC) system, the enantiomers display different properties, such as a different route and rate of metabolism, different biological activity, and different retention times in chiral HPLC.

A frequently quoted analogy for the differing properties of enantiomers is the hand and glove example. The left and right hand are enantiomers of one another, that is, they are nonsuperimposable mirror images. If the right hand, or "enantiomer," is placed in the right-hand glove, or "receptor," there is a good fit. Thus, in the case of a true drug and receptor there will be the desired effect. If the left hand, or "enantiomer," is placed in the right-hand glove, there is either a poor fit or no fit.

As introduced previously, when a chiral compound is present as exactly a 1:1 mixture of its enantiomers, it is referred to as a racemate or racemic mixture. Thus, if a single enantiomer undergoes racemization, the pure enantiomer is converted to a 1:1 mixture of enantiomers. Perhaps the most famous example of a chiral drug is Thalidomide. In the early
1960s, Thalidomide was widely prescribed as a sleeping pill and as a treatment for morning sickness, with claims that it was completely safe. We all know of the terrible birth defects suffered by children born to mothers who took the drug during pregnancy. The drug was taken as the racemate, and it has been shown that the R-enantiomer is responsible for the drug's anti-inflammatory activity, whereas the S-enantiomer causes the teratogenicity. Separation of the racemic mixture to give the patient only the R-enantiomer is not a simple answer to the problem. The liver contains an enzyme that converts the R- into the S-enantiomer, thus negating the benefit of giving the single enantiomer (4).

As described in this chapter, there are many reactions that can be performed by chemists to create new chiral centers. When these reactions are performed in such a way as to create one enantiomer in greater amounts than the other, the process is called asymmetric or stereoselective synthesis. The term enantioselectivity refers to the efficiency with which the reaction produces one enantiomer. This efficiency is quantitatively described as the enantiomeric excess (ee) of the product, which is the percentage by which one enantiomer is produced in excess of the other. Thus a 45:8 mixture of two enantiomers will have an enantiomeric excess of [(45 - 8)/(45 + 8)] × 100, which equals 70%. It should be noted that if neither the starting material nor the reaction system is chiral and non-racemic, then the product will be formed as an equal mixture of the enantiomers (i.e., a racemate).

Glucose is perhaps the most widely available chiral compound. It is a monosaccharide and part of the sugar group (carbohydrates) that occur naturally. Sugars, along with amino acids, constitute a special example and are commonly classified with a D- or L-configuration. In the case of sugars, the D-configuration is given when the hydroxyl group on the highest numbered chiral carbon atom is on the right-hand side (with the structure drawn in the Fischer convention as shown in Fig. 18.3). Likewise for L-configured sugars, the hydroxyl group is on the left-hand side. In the case of the tetrose sugars there are two enantiomer pairs as illustrated in Fig. 18.3. Here, the enantiomer pairs of erythrose (4, 5), namely D-(-)-erythrose (4) and L-(+)-erythrose (5), and threose (6, 7), namely L-(-)-threose (6) and D-(+)-threose (7), are shown, with each pair of enantiomers being diastereomeric with the other pair. Diastereomers can be simply defined as stereoisomers that are not enantiomers. The prefixes erythro- and threo- are applied to such systems that contain two asymmetric carbons where two of the groups are identical and the third is different. The erythro pair has the identical groups on the same side, whereas the threo pair has them on opposite sides.

Figure 18.3.

Finally, as further elucidation of this relationship, where the molecule contains more than one chiral center the number of stereoisomers increases. In the case of the drug glycopyrrolate, which contains two chiral centers, there are four possible stereoisomers as shown in Fig. 18.4. In general, the number of possible isomers can be calculated from the formula 2^n, where n is the number of chiral centers.

The four stereoisomers can be divided as shown into two pairs of enantiomers, where the (R,R)-(8) and (S,S)-(9) stereoisomers are enantiomers of one another, and the (S,R)-(10) and (R,S)-(11) stereoisomers are also an enantiomeric pair. The stereoisomers that do not have an enantiomeric relationship to one another, such as (R,R)-(8) and (R,S)-(11), are known as diastereomers. Like enantiomers, these molecules are not superimposable on one another, but unlike enantiomers, they do not exhibit the same physical, chemical, and spectral characteristics. Thus, they have different melting/boiling points, lipid solubility,
NMR spectra, retention time in HPLC or thin layer chromatography (TLC), and can behave differently in chemical reactions with achiral reagents. The commercial glycopyrrolate product contains only the threo isomers (S,R)-(10) and (R,S)-(11).

1.3 Pharmacology

Biological systems are in the main constructed from homochiral molecules such as L-amino acids or D-sugars. Such systems give rise to a highly "chiral environment," and hence, it is not surprising that many drugs possessing asymmetric centers exhibit a high degree of stereoselectivity in their interactions with biological macromolecules. In the past 20 years or so, pharmacological and toxicological investigations have clearly demonstrated significant differences in the biological activity of some isomeric pairs. Pharmacokinetic investigations have also led to a better understanding of racemic drug action.

It is important to introduce two other terms that compare the pharmacological activity of a pair of enantiomers. The isomer imparting the desired activity is called the eutomer (in the case of Thalidomide this is the R-enantiomer), whereas the isomer which is inactive or causes unwanted side effects is called the distomer (this is the S-enantiomer for Thalidomide). Comparison of the potencies of the two isomers comes from the eudismic ratio and this can be used in vitro or in vivo.

With the advancement in analytical and preparative technologies, the researcher is now more able to separate and study individual enantiomers. Pharmacological assessment of the behavior of chiral compounds in early phase research is imperative for selection of the correct isomer for development.

When a racemate is administered, the overall pharmacological effect may have one of three general outcomes described below.

1. All activity resides in one of the isomers, the other antipode being inactive.
2. Both isomers have equal activity.
3. Both isomers have the same activity but differ in potencies.

We will briefly highlight some examples that help to elucidate the above general classes. The antihypertensive agent α-methyldopa is an example where all the desired antihypertensive activity is confined to a single isomer (the L-enantiomer). It is noteworthy that L-α-methyldopa is a prodrug, being metabolized to the isomer of the active metabolite, and it is this metabolite that has the required activity (5). L-Dopa is marketed as the single enantiomer; during early development it was noted that the D-isomer exhibited serious side effects such as granulocytopenia (which is defined as a reduced number of blood granulocytes) (6).
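The two quantitative definitions given in Section 1.2, the enantiomeric excess and the 2^n count of stereoisomers, lend themselves to a short worked sketch. The Python below is illustrative only: the function names are hypothetical, the R/S descriptors are handled as plain strings, and the enumeration assumes that every center is a true stereocenter with no meso forms (which would reduce the count below 2^n).

```python
from itertools import product


def enantiomeric_excess(amount_a: float, amount_b: float) -> float:
    """Return the enantiomeric excess (%) of a two-enantiomer mixture."""
    total = amount_a + amount_b
    if total <= 0:
        raise ValueError("amounts must sum to a positive value")
    return abs(amount_a - amount_b) / total * 100.0


def stereoisomers(n_centers: int) -> list:
    """List the 2**n R/S label combinations for n chiral centers.

    Assumption: no meso forms, so every combination is a distinct isomer.
    """
    return list(product("RS", repeat=n_centers))


def mirror(labels: tuple) -> tuple:
    """Invert every descriptor, giving the labels of the mirror image."""
    swap = {"R": "S", "S": "R"}
    return tuple(swap[label] for label in labels)


def relationship(a: tuple, b: tuple) -> str:
    """Classify two stereoisomers of the same constitution."""
    if a == b:
        return "identical"
    if mirror(a) == b:
        return "enantiomers"
    return "diastereomers"


if __name__ == "__main__":
    # The 45:8 mixture from the text: (45 - 8)/(45 + 8) x 100 = 69.8, i.e. about 70%.
    print(round(enantiomeric_excess(45, 8)))
    # Two chiral centers, as in glycopyrrolate, give four label combinations.
    print(len(stereoisomers(2)))
    print(relationship(("R", "R"), ("S", "S")))
    print(relationship(("R", "R"), ("R", "S")))
```

Applied to the glycopyrrolate example, (R,R) and (S,S) are classified as an enantiomeric pair, whereas (R,R) and (R,S) are classified as diastereomers, matching the pairing described in the text.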
Figure 18.5. Flecainide (12), Warfarin (13), Propranolol (14).

It has been reported that the single enantiomers of Flecainide (12) (Fig. 18.5) have similar in vitro pharmacological activity. Assessment of the effect of each enantiomer on the action potential characteristics in canine cardiac Purkinje fibers gave similar electrophysiological effects. The plasma concentration data of the enantiomers were only very moderately enantioselective. This gave rise to the authors concluding that there was no advantage in administering a single enantiomer of Flecainide over the racemate (7). It is, however, relatively more common to find examples where the enantiomers have similar qualitative pharmacological activity but differ in their potencies. Two classic examples are Warfarin (13) (see Fig. 18.5 for the structure and later on in the crystallization section for its preparation) and Propranolol (14). The potency of S-(-)-Warfarin in vivo is two to five times greater than that of the R-enantiomer (8). However, this difference in potency is offset by the two- to fivefold greater plasma clearance of S-(-)-Warfarin (9). These apparently offsetting properties are only part of the complex pharmacological story surrounding Warfarin, and to this day, it is still administered as the racemate.

β-Blocking drugs such as Propranolol (14) (Fig. 18.5) have been shown to stereoselectively bind to β-receptors, and it is the S-(-)-enantiomer that exhibits the β-blocking activity. However, in vitro, the binding of the R- and S-enantiomers varies widely within this class of compound depending on the structure. Again a complex number of pharmacological actions come into play, with the plasma binding of the R-enantiomer being much greater than that of its antipode. The two enantiomers are also stereoselectively metabolized at different rates (R > S). Therefore, the pharmacological dynamic outcome can vary greatly between patients who have different P450 compositions (10). To add to this already complicated story, the bioavailability of S-(-)-Propranolol is reduced when given as the single enantiomer compared with the racemate. This suggests that the presence of R-(+)-Propranolol has a beneficial effect on the availability of the S-(-)-enantiomer (11).

1.4 Protein Binding and Metabolism

The enantiomers of a specific drug can bind stereoselectively to plasma proteins. For example, acidic drugs bind in an enantioselective manner to human serum albumin (12). There are two binding sites on albumin, site I (Warfarin) and site II (indole) (13). The binding of L-tryptophan to site II is up to 100 times greater than that of D-tryptophan (14). It has been suggested that binding to albumin can be used as an indication of the extent of the binding to the drug receptor. For basic drugs, α1-acid glycoprotein (AAG) is used and is relatively non-stereoselective (12, 15).

Enantioselective metabolism and clearance play a prominent role in determining the pharmacological effect of a drug. For example, a highly potent, rapidly cleared enantiomer may be of less benefit clinically than its lower potency antipode, which is more slowly cleared. Returning to Warfarin, the S-enantiomer is eliminated mainly by 7-hydroxylation, whereas the R-enantiomer is eliminated by ketone reduction and oxidation to 6- and 8-hydroxywarfarin (16). Tramadol is a centrally acting analgesic with efficacy and potency
ranging between weak opioids and morphine, and it is currently used as a racemic mixture (Fig. 18.6). It is metabolized in the liver mainly to O-desmethyltramadol (ODT), mono-N-desmethyltramadol, and di-N,O-desmethyltramadol. Of the metabolites only the O-desmethyltramadol is pharmacologically active and the (+)-ODT has ~200 times greater activity for the µ opioid receptor (17). It is thought that this metabolite contributes largely to the analgesic properties of Tramadol, and for this reason numerous studies have been undertaken on the activity of the single enantiomers of Tramadol (17). The complex nature of the interaction of the single enantiomers of a drug with biological systems described in the introduction will have an effect on whether a single enantiomer or racemate is taken through to development.

Figure 18.6.

The following sections describe the available methods for the separation of enantiomers or their preparation using asymmetric synthesis. The first sections involve separations of enantiomers using chromatography or crystallization technology. These are often considered to be the most expedient methods and should deliver the single enantiomers in the shortest timescale. Asymmetric synthesis has developed considerably over the last 20 years and now provides an alternative, and at times more cost-effective, method of preparing the single enantiomers. It is noteworthy that the 2001 Nobel Prize for chemistry was awarded to Sharpless, Knowles, and Noyori for their pioneering work in the area of asymmetric synthesis (18).

2 CHROMATOGRAPHIC SEPARATIONS

The separation of enantiomeric drugs and intermediates by chromatographic methods is a well-developed area that is broadly utilized from milligrams to tons (19). Initially, these chromatographic methods were used to determine the enantiopurity of the compound obtained from, for example, a separation process. With the continuing advancement of chiral stationary phases (CSP) and the development of chromatography, separation of enantiomers using chromatographic techniques is increasingly seen as the method of choice because of the speed at which the separation can be achieved (20).

A multitude of chromatographic separation methodologies exist, all of which can be applied to the separation of enantiomers; for example, liquid chromatography (LC) (21), gas chromatography (GC) (22), high performance liquid chromatography (HPLC) (23), capillary electrophoresis (CE) (24), supercritical fluid chromatography (SFC) (25), simulated moving bed (SMB) (26), and membrane technologies (27). Once the appropriate technique has been chosen, at the analytical level, run time, sensitivity, and selectivity then need to be enhanced to improve the limits of detection and analysis time. At the preparative scale (milligram to gram), in addition to enantioselectivity, other factors such as high loading capacity, robustness, and chemical compatibility are essential requirements when selecting the CSP (28). In addition, the CSP needs to be readily available and provide a cheaper process when compared with the chemical/biological alternatives that are discussed in this chapter. However, a significant advantage of the method is that almost any enantioseparation can be achieved with one of about 200 commercially available chiral selectors by the techniques described above (29). In this section, we will highlight a number of examples,
from the small semi-preparative scale to large potential manufacturing processes.

2.1 Small-Scale HPLC Examples

HPLC is now a widely available and user-friendly method employed for qualitative and quantitative analysis and is also one of the most expedient methods for providing the milligram quantities of stereochemically pure material required for initial testing. Often the identification of a suitable CSP to effect separation of a specific pair of enantiomers is seen as being labor intensive and requiring considerable experimentation. However, the availability of commercial databases that compile literature on LC enantioseparations makes this process significantly easier (30). The companies that supply CSPs also provide detailed information about a specific column's suitability towards the separation of certain types of compounds (31). This helps to avoid a "trial and error" approach towards enantioseparations using chromatography. The use of column switchers to test a number of CSPs can also be of enormous assistance in a rational screening program.

One example of separation by HPLC is Clenbuterol, which is an orally active, sympathomimetic agent that has specificity for β2-adrenoceptors. Owing to its bronchodilator properties, it has found use in the treatment of respiratory disorders in humans and animals (32). The two enantiomers of Clenbuterol have been separated using a Chirobiotic column, which consists of a macrolide-type antibiotic stationary phase, using a mobile phase with composition of 70% MeOH, 30% acetonitrile, 0.3% acetic acid, and 0.2% triethylamine (33). The enantiomers eluted as follows: R-(-)-Clenbuterol (15) with a retention time of 8.35 minutes and S-(+)-Clenbuterol (16) with a retention time of 9.12 minutes. The single enantiomers obtained through chromatography were of >95% optical purity. It has been shown that (-)-Clenbuterol was 100-1000 times more potent than (+)-Clenbuterol in β-adrenergic agonist bioassays (34).

A number of 1,4-dihydropyridines (17-20), exhibiting axial chirality (chirality stemming from the nonplanar arrangement of four groups about an axis), have been separated by small-scale HPLC methods. This is an important class of drugs that are potent blockers of calcium currents and have found use in the treatment of cardiac arrhythmias, peripheral vascular disorders, and hypertension (35). It has been shown that enantiomers of chiral DHP have opposite pharmacological profiles (35). One of the antipodes is a calcium entry activator, while the other is a calcium entry blocker. The analytical and semi-preparative separation using chiral HPLC for a number of DHPs of the structures (Fig. 18.7) has been described (36). Here a number of different CSPs were utilized and their ability to separate the above DHPs determined.

Figure 18.7. DHP 1 (17), DHP 2 (18), DHP 3 (19), DHP 4 (20).

2.2 Chromatographic Diastereoisomer Separation

Another approach to the separation of enantiomers by chromatography is to prepare a diastereoisomer of the enantiomer to be separated. As discussed in the introduction to this chapter, diastereomers exist if there is more than one chiral center, but are not enantiomers of one another. As such they do not have identical physical properties. In chromatography, formation of derivatives such as esters, amides, etc., often leads to better separation of the components. In the case of a racemate, if a chiral reagent (i.e., acid or amine) is employed, then a diastereomeric mixture results on treatment with such a derivatizing agent. One such example is the derivatization of Pirlindole, which is a racemic anti-depressant drug. Here the use of amino acid derivatives as chiral derivatizing agents (CDA) was shown to enable an effective and efficient separation (37). Preparation of the L-phenylalanine methyl ester (21) enabled separation of the Pirlindole enantiomers using a medium pressure liquid chromatography (MPLC) method. This is highlighted in Fig. 18.8; after removal of the CDA, the enantiomers of Pirlindole were obtained in high optical purity. This gave several grams of each enantiomer, which permitted a study of the stereochemical influence at the pharmacological level. The interaction of monoamine oxidase A (MAO-A) and B (MAO-B) with Pirlindole racemate and single enantiomers was studied using biochemical techniques (in vitro and ex vivo determination of rat brain MAO-A and MAO-B activity). In vitro, the MAO-A IC50 values of (±)-Pirlindole, R-(-)-Pirlindole (22), and S-(+)-Pirlindole (23) were 0.24, 0.43, and 0.18 µM, respectively. The differences between the three compounds were not significant, with a ratio between the two enantiomers R-(-)/S-(+) of 2.2 in vitro (38).

Figure 18.8. Formation of diastereoisomers (21); separate diastereoisomers by HPLC; cleave to enantiomers.

2.3 Preparative HPLC/SMB

In the initial discovery phase of drug research, time is the most important factor where a successful process must be rapidly identified, have a short run time, and have general applicability. As the phase of the project changes to full development, the process needs to be established and cost becomes a crucial factor. Thus, on scale up of an LC method to the preparative level (100 mg and above), a number of additional important aspects become relevant. The selection of a suitable CSP from the plethora available depends on the following factors: CSP availability, loading capacity and selectivity, throughput, and mobile phase.

The most successful and broadly applied chiral stationary phases comprise the cellulose- and amylose-based phases developed by Okamoto (Chiralcel and Chiralpak) (39), brush-type phases developed by Pirkle (40), some polyacrylamides (Chiraspher) (41), cross-linked diallyltartramide (42), and to a lesser extent, cyclodextrin-based phases. Clearly for the larger scale separations, the availability of the CSP in larger quantities is a prerequisite. It should also be noted that at the preparative scale, it seems that up to 90% of racemic compounds tested have been resolved with just four different polysaccharide-based phases (43).

The degree of separation of the two enantiomers obviously plays an important part in the CSP selection. Another equally important parameter is the loading capacity of the stationary phase. The higher the loading capacity, the greater the amount of material that can be separated (44). For example, the polysaccharide-based CSPs have a saturation capacity of 5-100 mg/g of CSP; this is clearly dependent on the type of racemate that is being resolved. On the other hand, protein-based CSPs have lower saturation capacities, of the order 0.1-0.2 mg/g of CSP.

For preparative chromatography, throughput can be defined as the amount of purified material obtained per unit of time and per unit
mass of stationary phase. Several factors affect this including loading capacity, column efficiency, selectivity, column size, temperature, cycle time, flow rate, and the solubility of the racemate.

The mobile phase plays a crucial role in the separation process for at least three main reasons. The selectivity of the separation, retention time, and solubility of the racemate are directly affected by the mobile phase composition. Other parameters such as viscosity, solvent recovery, cost, and solvent handling properties also play a prominent role. This brief introduction is also applicable to the criteria for CSP selection for SMB.

An example of a drug separated by preparative HPLC is cetirizine dihydrochloride, a racemic drug that is a second generation antihistamine H1 receptor antagonist. Studies on the effect of racemic and R- (25) and S-Cetirizine (26) on nasal resistance indicated that both racemic and the R-enantiomer had similar activity. The racemate and R-enantiomer inhibited the histamine-induced increase in nasal resistance, thus indicating the antihistaminic properties of R-Cetirizine (45). The S-enantiomer was shown not to exhibit these antihistaminic effects. An asymmetric synthesis (46) and resolution of an intermediate have delivered the single enantiomer previously. However, for various reasons, the development of a preparative HPLC method seems to be the method of choice (47). The main reasons are the rapid scale up and the improved economics of this approach. Utilization of the amide (24) (Fig. 18.9) gave rise to a highly efficient separation using a Chiralpak AD column in a mixture of acetonitrile/iso-propanol 60:40. The efficiency of the separation can be measured by the α value (2.76) or the USP resolution (8.54). The α value and USP resolution numbers are measurements of how efficient the separation is; typically the higher the number, the better the separation. This enabled the production of 1.6 kg of both the (+) and (-) isomers of high purity.

Figure 18.9. Separation by HPLC; conversion to acid dihydrochloride.

Like all methods for separating chiral molecules, chromatographic separations do suffer from drawbacks: large quantities of expensive stationary phases are needed and large volumes of mobile phases are used, coupled with the resultant high dilution of separated products. A number of methods have been introduced in an attempt to improve on this technology, such as recycling (44). Perhaps the biggest advancement in recent times has been the introduction and application of SMB technology in the field of chiral separations (48). This technique was pioneered in the late 1950s by Universal Oil Products in the United States as a useful method for separation of oil derivatives and sugars (49). Initially SMB technology was applied to very large volumes of material. For example, xylene isomers are separated in thousands of ton quantities annually. The application of SMB to the separation of racemic mixtures has led to downsizing and modifications of this technology, but the main principles remain the same. The use of counter-current contact in SMB maximizes the driving force for mass transfer and the contact between the substrate and stationary phase. This provides a more efficient use of the adsorbent capacity than that of a simple batch system (50).

The separation of racemic mixtures is well suited to SMB technology, because these counter-current systems can generally only perform two-component separations at a time (51). A detailed description of this technique is given in an excellent article by Guest (52). The SMB system generally consists of several columns, typically 6-12, which are connected in series. An arrangement of pumps and valves is set up to maximize the stationary phase utilization, allowing for better solvent efficiency and adsorbate concentration. This leads to two streams coming off the system in solution: one is termed the raffinate, which is enriched in the less adsorbed component, and the other is termed the extract, which is enriched in the more adsorbed component. The complex set of conditions and parameters that are required to optimize SMB chromatography has led to the design and process optimization being done by computer simulations (53). A number of examples will be discussed that highlight this growing area of chiral separa-
recovery of the solvent. This becomes even

more evident when a poorly soluble compound
is used.
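The SMB examples in this section quote a feed concentration, a feed rate, and a daily production figure, and these are tied together by a simple mass balance on the feed stream. The sketch below is not taken from the cited work: the function name is hypothetical, the feed concentration is assumed to be weight/volume (1.63% read as 0.0163 g/mL), and recovery of both enantiomers is assumed to be complete. Under those assumptions the aminoglutethimide feed figures (1.63%, 0.45 mL/min) give a value close to the quoted 5.27 g of each enantiomer per day.

```python
def grams_per_enantiomer_per_day(feed_ml_per_min: float,
                                 feed_g_per_ml: float,
                                 recovery: float = 1.0) -> float:
    """Daily mass of each enantiomer delivered by a continuously fed SMB unit.

    Assumptions (not from the source): weight/volume feed concentration,
    steady continuous operation, and the same recovery for both enantiomers.
    """
    if not 0.0 <= recovery <= 1.0:
        raise ValueError("recovery must lie between 0 and 1")
    minutes_per_day = 24 * 60
    racemate_g_per_day = feed_ml_per_min * minutes_per_day * feed_g_per_ml
    # A racemate is a 1:1 mixture, so each enantiomer is half of the feed mass.
    return 0.5 * racemate_g_per_day * recovery


if __name__ == "__main__":
    per_day = grams_per_enantiomer_per_day(0.45, 0.0163)
    print(f"{per_day:.2f} g of each enantiomer per day")
```

The small difference from the quoted figure is within what rounding of the inputs would produce; reading the concentration as weight/weight instead would give a noticeably lower result, which is why the weight/volume reading is assumed here.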
Figure 18.10. Aminoglutethimide (27).
The two isomers of the racemic analgesic
drug Tramadol (28) (Fig. 18.11) display differ-
ing affinities for various receptors. (-)-Tram-
adol mainly inhibits the reuptake of noradren-
aline, whereas the (+)-isomer inhibits the
reuptake of serotonin. In addition, the (+)-
isomer and its primary metabolite, the O-des-
methyl derivative, are selective agonists of µ
opiate receptors (54). Tramadol has been effi-
tions. It should be noted that the scale of op- ciently separated using SMB; in addition, the
eration is dependent on the column size and resolution by crystallization is given in Sec-
can lead to a range from tens of grams to tons tion 3 of this chapter (55). Comparison be-
of separated isomers. Clearly, the larger quan- tween batch chromatography and SMB for the
tities separated imply that this technology has separation of tramadol was made. Use of 12
industrial applications. columns (100 x 21.2 mm ID), each packed
The enantiomers of aminoglutethimide with 20 g of Chiralpak AD 20-pm phase, and
(27) (Fig. 18.10) have been separated using an using a mobile phase composition of 2-propa-
SMB approach (48) (see also Section 3, Fig. nolllight petroleurn/diethylamine (5:95:0.1
18.11 for more information on aminoglute- V/V/V)with feed concentration of 20 g/L, ob-
thimide). A set of 16 columns (6 X 1.6 cm) tained a very high productivity. Thus, 680 g of
containing Chiracel OJ were used. The feed racemic tramadol could be separated per liter
concentration was 1.63% in a mixture of hex- of stationary phase (which equates to 1.2 kg of
ane:ethanol (15:85), which was used as the racemate per kilogram of stationary phase per
mobile phase. A feed rate of 0.45 mllmin and a day). The solvent consumption of 144 L/kg of
mobile phase rate of 6 mllmin gave rise to a racemate should also be noted. This gives both
production of 5.27 g of each enantiomer per (+)- and (-)-enantiomers of high optical pu-
day. The S-(-)-enantiomer was obtained as rity, with the extract of 6.33 g/L and the raffi-
the extract in solution, in a 99.8% purity, nate of 7.69 g/L. Typically, the solvent (mobile
while the R-(+)-enantiomer also in solution as phase) is readily recycled by the use of thin
the raffmate, achieved a 99.9% purity. This film evaporators, which further extends the
would lead to a productivity of 59.9 g of each economic practicality of the process.
enantiomer per kilogram of CSP per day. It
should be noted that one big advantage of
SMB over preparative chromatography are 2.4 Conclusions
the vast savings on mobile phase consump- It should be noted that all the techniques de-
tion; this is generally coupled to thin film
evaporators that allow for very high levels of
-
scribed in this c h a ~ t ecan
r be inter-linked. In
other words, if one technique, i.e., asymmetric
synthesis, failed to deliver enantiopure mate-
rial, then another technique such as crystalli-
zation can be used to push through the prod-
uct to the desired purity. As an example of this
"double" approach, the application of SMB
and crystallization to the separation of man-
delic acid is noteworthy (56). When very high
u Tramadol (28)
levels of enantiopurity are required, the effi-
ciency and cost effectiveness of SMB may not
be economical. However, if for example, a
lower enantiomeric excess can be couded- with
Figure 18.11. an enhancement by crystallization, then the
3 Classical Resolution
SMB approach becomes even more favorable.

This can lead to substantial increases in the
productivity of the SMB process. Further ex-
amples of coupling of two techniques will be
given throughout this chapter.
In summary, chromatographic separations offer an expedient method for the separation of
enantiomers on a small scale. With the development of more efficient stationary phases
and the application of SMB, this may become the method of choice for the separation of
racemates. Each individual case deserves investigation by all of the
techniques/approaches described in this section.

Figure 18.12. D-threo (29), L-threo (30), D-erythro (31), and L-erythro (32) isomers of
methylphenidate.
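The SMB figures quoted in this section lend themselves to a quick arithmetic cross-check. The following Python sketch is illustrative only: the function names are hypothetical, continuous 24 h operation is assumed, and the 1.63% feed concentration is read as grams per 100 mL, which the text does not state.

```python
# Cross-check of the SMB figures quoted in the text (illustrative helper names).

MINUTES_PER_DAY = 24 * 60


def daily_feed_mass_g(feed_rate_ml_min, feed_conc_g_per_100ml):
    """Mass of racemate fed per day, assuming continuous 24 h operation."""
    return feed_rate_ml_min * MINUTES_PER_DAY * feed_conc_g_per_100ml / 100.0


def implied_csp_mass_kg(output_g_per_day, productivity_g_per_kg_day):
    """Stationary phase mass implied by a daily output and a productivity."""
    return output_g_per_day / productivity_g_per_kg_day


def daily_throughput_g(productivity_kg_per_kg_day, csp_mass_g):
    """Racemate processed per day: (kg/kg/day) x (g of CSP) gives g/day."""
    return productivity_kg_per_kg_day * csp_mass_g


def daily_solvent_l(throughput_g_per_day, solvent_l_per_kg):
    """Solvent needed per day before any recycling."""
    return throughput_g_per_day / 1000.0 * solvent_l_per_kg


# Aminoglutethimide: 0.45 mL/min feed at 1.63% (assumed w/v).
amino_feed = daily_feed_mass_g(0.45, 1.63)       # racemate fed per day, g
amino_each = amino_feed / 2.0                    # each enantiomer per day, g
amino_csp = implied_csp_mass_kg(5.27, 59.9)      # implied CSP mass, kg

# Tramadol: 12 columns x 20 g of phase, 1.2 kg/kg/day, 144 L solvent per kg.
tramadol_csp_g = 12 * 20
tramadol_daily = daily_throughput_g(1.2, tramadol_csp_g)
tramadol_solvent = daily_solvent_l(tramadol_daily, 144)

print(f"aminoglutethimide: {amino_each:.2f} g/day of each enantiomer")
print(f"implied CSP mass: {amino_csp * 1000:.0f} g")
print(f"tramadol: {tramadol_daily:.0f} g/day, {tramadol_solvent:.1f} L solvent/day")
```

Under these assumptions the aminoglutethimide feed works out to roughly 5.3 g of each enantiomer per day, in line with the 5.27 g quoted, and the tramadol unit (240 g of stationary phase) would process about 288 g of racemate per day with roughly 41 L of solvent before recycling.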
3 CLASSICAL RESOLUTION

Perhaps the most widely used method for the preparation of single enantiomers involves
the classical resolution of a racemic mixture, which uses the formation of crystalline
diastereomeric salts. As discussed in the introduction to this chapter, by converting a
racemic mixture of enantiomers to two diastereomeric salts with differing physical
properties, one being crystalline and the other remaining in solution, the molecules can
be separated and simply converted back to the two separated enantiomers. With the advent
of automation, the classical resolution approach offers a speedy and thorough racemate
separation methodology. This enables the separation of small amounts of material
(milligram to gram) and can be directly scaled up to provide an industrial process
(kilogram to ton). A number of different approaches to this type of separation are
highlighted in the following sections, where it should be noted that diastereomeric salt
resolutions have mistakenly been considered to be a mysterious art. In fact, there is
considerable information in the literature as to how to perform a resolution and the
physical chemistry aspects associated with how to define and conduct the resolution to
its optimum capability (57-59). We will not go into this in great detail but will
highlight some pertinent points; for greater detail, the reader is directed to the
monograph by Jacques et al. (57).

3.1 Separation of the Active Pharmaceutical Ingredient

A number of single isomer switches, that is, where a drug that was previously sold as a
racemate is developed and sold as a single isomer, have been isolated through classical
resolution (60). This approach to a single isomer offers several advantages. First, the
racemate is freely available and can be purchased to high levels of purity and quality.
Second, the analytical methods will also be in place. Also, no new synthetic development
chemistry is required, and hence this is the fastest route to the single enantiomers at
the multigram scale. Generally, this is the first method to be tried. Some of the many
available examples, which demonstrate the different nuances that can be applied in
classical resolution to provide the single enantiomers in optimal yields and purities,
are given in this section.

An efficient and large scale resolution of methylphenidate (Ritalin hydrochloride) using
dibenzoyl-tartaric acid has been described (61). Ritalin is marketed for the treatment
of children with attention deficit hyperactivity disorder (ADHD). Methylphenidate has
two chiral centers and originally was marketed as a mixture of two racemates, 20%
DL-threo (29, 30) and 80% DL-erythro (31, 32) (see Fig. 18.12 for the structures of all
four isomers). As introduced previously, the erythro-isomer is defined as the case when
the main chain of a molecule (drawn vertically in a Fischer projection) has identical or
similar substituents at two adjacent non-identical chiral centers on the same side of
the chain, whereas the threo isomer has
the corresponding substituents on opposite sides. The racemic drug currently used in
therapy comprises only the pair of threo-enantiomers (29, 30). The mode of action in
humans is not completely understood, but methylphenidate presumably activates the brain
stem arousal system and cortex to produce its stimulant effect. In addition, there is no
specific evidence that clearly establishes the mechanism whereby methylphenidate
produces its mental and behavioral effects in children or conclusive evidence regarding
how these effects relate to the condition of the CNS. The D-threo (29) enantiomer has,
however, been reported to be 5 to 38 times more active than the corresponding L-threo
enantiomer (30) (62). The resolution shown in Fig. 18.13 uses the racemic hydrochloride
salt as input material. The HCl salt is cracked to the free base in situ with
4-Me-morpholine, which then forms a salt with the resolving agent dibenzoyl-tartaric
acid (DBTA). The required D-threo-methylphenidate (29) is removed as the crystalline
salt of D-(+)-DBTA, leaving the L-threo enantiomer (30) in solution with
4-methylmorpholine hydrochloride. The use of 4-methylmorpholine to effect base release
in situ helps to streamline the process and to remove a costly free base isolation
process. The D-threo-methylphenidate (D)-(+)-DBTA salt is readily converted into the
hydrochloride salt. It is interesting to note that recently, Celgene and Novartis
received an FDA approvable letter for the use of dexmethylphenidate in ADHD. This
consists of only the D-threo enantiomer (29), in comparison with the original product,
which contained all four isomers (29-32).

Figure 18.13. Resolution of racemic threo-methylphenidate hydrochloride: (i)
4-Me-morpholine, MeOH/H2O; (ii) (D)-(+)-DBTA; then (i) aq. NaOH, iPrOAc; (ii) conc.
HCl/H2O.

Chemists at Chiroscience took an alternative approach to the D-threo-methylphenidate
(29) single enantiomer (63). An efficient resolution using L-(-)-di-toluoyl-tartaric
acid (DTTA) was developed. This left the required D-threo diastereoisomer in solution
with a diastereomeric excess of 88% in 55% chemical yield. Conversion of this salt to
the free base and subsequent crystallization of the hydrochloride salt gave >98% ee
D-threo methylphenidate in high purity in an overall yield of 42%. The enhancement of
the ee is caused by the eutectic point of methylphenidate hydrochloride, which is at 30%
ee. A more detailed description of this phenomenon is given later in this chapter.

(S)-Naproxen (36) is a non-steroidal anti-inflammatory drug that was introduced to
market in 1976 by Syntex. The S-(+)-isomer is about 28 times more effective than the
R-(-)-isomer (64). The annual sales in 1995 were about $1 billion; thus, a large amount
of effort has been spent developing the synthesis of (S)-Naproxen (65). The resolution
of racemic Naproxen (33), developed by Syntex, approaches the ideal case for a
Pope-Peachey resolution, that is, resolution using non-stoichiometric quantities of
resolving agent (66). Here, a mixture of 1 equivalent (eq) of the racemic acid, 0.5 eq
of an achiral amine base, and 0.5 eq of the chiral amine (N-alkylglucamine) is used
(Fig. 18.14). This results in the formation of two salts: one is the insoluble
(S)-Naproxen chiral amine salt (34), obtained in 45-47% yield and an optical purity of
99%. The second salt, which remains in solution, contains (R)-Naproxen and the achiral
amine (35). The insoluble salt of (S)-Naproxen (34) is removed
by filtration. The mother liquors are then heated and the achiral amine base catalyzes
racemization of the unwanted R-enantiomer. The resulting racemic mixture of the acid
(R,S)-(37) can then be put back into the resolution loop. Using this process, the
overall yield of (S)-Naproxen is >95%, based on the input of racemic acid. To further
highlight the efficiency of this process, the N-alkylglucamine resolving agent is
recovered in >98% per cycle.

Figure 18.14. Resolution of (R,S)-Naproxen (33) with an achiral amine base and a chiral
amine base. The precipitate is the (S)-Naproxen chiral amine base salt (34), which gives
(S)-Naproxen (36); the mother liquors contain the (R)-Naproxen achiral amine base salt
(35), which on heating gives the (R,S)-Naproxen achiral amine base salt (37) that is
recycled to the resolution.

Racemic bupivacaine hydrochloride (38, Marcaine) is currently used as an epidural
anesthetic during labor and as a local anesthetic in minor operations. Clinical studies
have shown that levo-bupivacaine (41) is less cardiotoxic in man, making it
significantly safer than the racemate (67). Separation of the enantiomers was readily
achieved using 0.25 eq of D-tartaric acid. This resulted in the isolation of a 2:1
(S)-bupivacaine D-tartaric acid salt (39) in 98% de, leaving the (R)-bupivacaine free
base (40) in solution. Conversion of the tartrate salt (39) to (S)-bupivacaine
hydrochloride was achieved in 35-40% overall yield based on racemate input. To improve
the economics of the process, a racemization of the unwanted R-enantiomer was required.
Treatment of the liquors containing the enriched (R)-bupivacaine, tartaric acid,
propanol, and propionic acid at reflux resulted in complete racemization in 2 h. By
pertinent processing, the racemic free base thus obtained is isolated by crystallization
and can be put back into the resolution cycle (68). Another fine example by chemists
from Eli Lilly involves a clever resolution-racemization-recycle (R-R-R) process in the
synthesis of Duloxetine (69).

Figure 18.15. Resolution of bupivacaine with 0.25 eq of (D)-(+)-tartaric acid to give
the (S)-bupivacaine (D)-(+)-tartaric acid salt (39), followed by (1) NaOH and (2) HCl(g)
in IPA.

As discussed in Section 2 of this chapter, Tramadol is a chiral drug substance that is
currently used as a high potency analgesic agent. The preparation of Tramadol is shown
in Fig. 18.16, which results in the formation of all four possible stereoisomers from
the Grignard reaction (70). The trans isomers (42, 43) form over the cis isomers (44,
45) in a ratio of 4:2; the currently marketed racemate consists of only the trans
isomers. It is possible to take this crude reaction mixture and selectively isolate
either the (+)-trans isomer (42), by using the di-p-toluoyl-D-tartaric acid [D-(+)-DTTA]
resolving agent, or the (-)-trans isomer (43), using L-(-)-DTTA. This highlights the
high selectivity that can be achieved when using certain resolving agents. In the case
of Tramadol, the cis isomers (44, 45) do not form crystalline salts with DTTA and
therefore remain in solution. This results in a highly efficient process, where the
chiral acid not only separates the single enantiomers (42 or 43) but also removes other
impurities (i.e., cis isomers 44 and 45) at the same time (71).

Figure 18.16.

Another drug that is sold as a racemate is Etodolac (46), which is used as a
non-steroidal anti-inflammatory agent (NSAID) that also has analgesic properties; it has
the ability to retard the progression of skeletal changes in rheumatoid arthritis (72).
It has been shown that the majority of therapeutic activity lies in the S-(+)-isomer
(73). D-(-)-N-Methylglucamine (meglumine) is obtained by ring opening of D-glucose with
methylamine, and hence it is readily available and inexpensive. Scientists at
Chiroscience have described the use of meglumine to separate the enantiomers of Etodolac
(74). It was shown that the meglumine salt possessed suitable properties to enable its
use as a salt for pharmaceutical administration. Therefore, in the case of Etodolac,
meglumine can not only be used to separate the enantiomers, but it can also be used as
the pharmaceutical salt form of choice.

In addition to the racemic drugs discussed in this section, resolutions are also used in
the isolation of key building blocks for the pharmaceutical industry. An important class
of these intermediates is the amino acids, many of which are available as the single
isomer from natural sources (see INTRODUCTION). The use of unnatural amino acids and D
configured ones is expected to have a greater influence at the biological level. In the
drive for molecular diversity and metabolic stability, a number of unnatural amino acids
such as the non-proteinogenic piperazine carboxylic acid (47) (Fig. 18.18) have been
developed. Specifically, this amino acid has found use as an intermediate compound of
the HIV proteinase inhibitor L-735,525 (75). The racemic cyclic amino acid (47) has been
resolved with S-camphorsulfonic acid (CSA), which yields the S-isomer as the double CSA
salt (48) as the precipitate (76). Retained in the mother liquors is the R-isomer (49).
This can neatly be racemized by mixing with S-CSA in a suitable solvent. On seeding with
pure (S,S)-diastereomeric salt, a further quantity of the desired (S,S) product (48) is
obtained, leaving the R-isomer (49) once more in the liquors. The whole cycle can be
repeated and has been demonstrated with four complete cycles. To complete the whole
process, the resolving agent is also readily recovered and recycled.

Figure 18.18. Resolution of the piperazine carboxylic acid (47) with 2 eq of (S)-CSA
(precipitate and mother liquors).
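The resolution-racemization-recycle schemes described in this section can be put on a simple numerical footing. The Python sketch below is a minimal mass balance, not a description of any of the cited processes: it assumes that each pass isolates a fixed fraction of the charged racemate as the desired enantiomer and that a fixed fraction of the remaining liquors is racemized and fed back. The function names and the 96% recycle efficiency are hypothetical; only the 45-47% single-pass yield (taken here as 46%) comes from the Naproxen example.

```python
def cumulative_yield(single_pass_yield, recycle_efficiency, cycles):
    """Total fraction of the original racemate isolated after `cycles` passes.

    single_pass_yield: fraction of the charged racemate isolated per pass.
    recycle_efficiency: fraction of the liquors recovered as racemate
        and returned to the next pass (assumed constant).
    """
    total = 0.0
    feed = 1.0  # racemate charged to the current pass, relative to the input
    for _ in range(cycles):
        isolated = feed * single_pass_yield
        total += isolated
        feed = (feed - isolated) * recycle_efficiency
    return total


def limiting_yield(single_pass_yield, recycle_efficiency):
    """Yield approached after many passes (sum of the geometric series)."""
    carry_over = (1.0 - single_pass_yield) * recycle_efficiency
    return single_pass_yield / (1.0 - carry_over)


p = 0.46   # single-pass yield, mid-range of the 45-47% quoted for Naproxen
r = 0.96   # hypothetical recycle efficiency, chosen for illustration only

after_four = cumulative_yield(p, r, 4)
ceiling = limiting_yield(p, r)
print(f"after four cycles: {after_four:.3f}, limiting yield: {ceiling:.3f}")
```

With these illustrative inputs the yield reaches about 89% after four passes and approaches about 95.5% in the limit, which suggests that an overall yield above 95% requires very little material to be lost on each recycle.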
Figure 18.17. Etodolac (46).

3.2 Separation of Intermediates to Single Enantiomer Active Pharmaceutical Ingredient

The previous examples given for diastereomeric salt resolution have all involved
separation of the active pharmaceutical ingredient (API) or late stage intermediate.
Whereas this
does offer several advantages from the point of view of time and quality aspects, there
are also a number of drawbacks. If, for example, a racemization of the unwanted isomer
cannot be found, there would be a waste of 50% of material. Therefore, it can often be
advantageous to conduct the separation at an earlier stage in the synthesis of the drug.
This leads to better atom efficiency compared with resolution of the final product,
resulting in a reduction of the overall amount of waste and cost.

One such example is Verapamil, which is a well-established treatment of cardiovascular
ailments (77). S-(-)-Verapamil (51) has specific transmembrane calcium channel
antagonist activity, whereas its antipode (53) influences a wider range of cell pump
actions, including those for sodium ions (78). Verapamil has been separated into its
single enantiomers by resolution with expensive resolving agents, which required
multiple recrystallizations to effect complete separation (79). Looking into the
synthetic sequence of Verapamil, several intermediates seemed to be attractive
alternatives to Verapamil (80). The intermediate verapamilic acid (Fig. 18.19) was
efficiently separated using α-methylbenzylamine (α-MBA), which is an extremely cheap
resolving agent (81). Subsequent transformation of the easily obtained R- or
S-verapamilic acid (50 or 52) required a further three to four synthetic steps to yield
the active pharmaceutical ingredient.

Figure 18.19. Verapamilic acids (50, 52) and verapamil enantiomers (51, 53).

The racemate aminoglutethimide (27) has been shown to be effective in the treatment of
hormone-dependent breast cancer (Fig. 18.20). Further studies have shown that the
R-enantiomer is more potent than its antipode as an aromatase inhibitor (82). The
resolution of aminoglutethimide itself has been reported in the literature, using
tartaric acid. This resolution suffers from the formation of solid solutions (83), which
require endless crystallizations to deliver the single enantiomer (84). Use of a
suitable precursor (54) enabled separation of the intermediate (55) by treatment with
the alkaloid resolving agent (-)-cinchonidine. This chiral acid was then cyclized to
nitroglutethimide, which on reduction gave the desired R-aminoglutethimide (56) (85). It
is noteworthy that in the case of aminoglutethimide, the amine functionality is an
aniline moiety. Because of the low pKa associated with this amine (2.5-4.6), the number
of acidic resolving agents that can be employed is reduced, because they need to be of
relatively high acidity to form a salt.

Figure 18.20.

3.3 Crystallization-Induced Asymmetric Transformation

A number of amino acids have been separated by resolution; in certain cases the yield of
the required diastereoisomer has been greater than 50% (86). p-Chlorophenylalanine is of
considerable pharmacological interest because of its ability to inhibit serotonin
formation in laboratory animals (87). Both the R- and S-enantiomers have also been used
as building blocks in the synthesis of other drugs. An ingenious approach to
R-p-chlorophenylalanine methyl ester, which is based on a one-pot
resolution-racemization sequence, is highlighted in Fig. 18.21. Here, treatment of
racemic p-chlorophenylalanine methyl ester (57) with 0.5 eq of D-tartaric acid and 0.1
eq of salicylaldehyde in methanol gave the 2:1 R-p-chlorophenylalanine D-tartaric acid
salt (58) in 68% yield and 98% enantiomeric purity. The absolute yield is greater than
50% because the S-enantiomer is racemized in situ. The 2:1 tartrate salt is crystalline
and is therefore removed from the system by virtue of its insolubility. This drives the
equilibrium further in favor of the 2:1 R-p-chlorophenylalanine D-tartrate salt (88).

Figure 18.21. (R)-p-Chlorophenylalanine 0.5 eq (D)-tartaric acid salt (58).

While the common goal remains the rational design of resolving agents (89), it is clear
that we are still some way from this actually happening. An alternative "family"
approach to classical resolution has been demonstrated by Vries et al. (90). A group of
similar resolving agents is mixed simultaneously with the racemate. This was done to
shorten the time required to complete the resolving agent screen. Note should be made
that the families of resolving agents are very similar and that the crystalline species
obtained by this method contained more than one of the resolving agents. As with all
screens, analysis of the data is often time consuming and laborious. Bruggink et al.
have shown that differential scanning calorimetry (DSC) of the isolated salts can help
to quickly determine whether the isolated salt will provide a thorough resolution (91).
However, with a methodical and precise screening protocol, it is nearly always possible
to find a suitable resolving agent that effects separation of the enantiomers (92).

4 NONCLASSICAL RESOLUTION

4.1 Preferential Crystallization

A brief description of the types of "racemic" compounds is necessary for the reader to
better understand the principles behind the application of crystallization methods to
the separation of enantiomers. Three fundamental types of crystalline racemates exist.
In the first, the crystalline racemate is a conglomerate, which exists as a mechanical
mixture of crystals of the two pure enantiomers. The second, which is the most common,
consists of the two enantiomers in equal proportions in a well-defined arrangement
within the crystal lattice; this is termed a racemic compound. The third possibility
occurs with the formation of a solid solution between the two enantiomers, which coexist
in an unordered manner in the crystal. This kind of racemate is called a pseudoracemate
and is rather rare. Conglomerates have been estimated to be approximately 10% of all
racemates (93). Diagrammatic representations of the first two types of racemate are
shown in Fig. 18.22.

By understanding the appropriate phase diagrams, which describe the melting behavior of
the two enantiomers (binary melting point phase diagram) or their solubility behavior in
the presence of a solvent (ternary solubility phase diagram), the separation of
enantiomers can be carried out reproducibly. Phase diagrams for the three types of
racemate are shown in Fig. 18.23. For a full and detailed explanation of this topic
refer to the monograph of Jacques et al. (57).

Figure 18.22. Racemic mixture (conglomerate) and racemic compound.

4.2 Enrichment of Enantiomeric Excess by Crystallization

The attainment of high levels of enantiopurity is not always possible by enzymatic or
diastereomeric resolutions or by asymmetric syntheses alone. It is, however, frequently
possible to prepare a pure enantiomer from a partially resolved sample by simple
recrystallization. For this process to proceed successfully it is necessary that the
initial enantiopurity of the mixture is greater than that of the eutectic point in the
phase diagram. By utilization of the phase diagram, the optimal quantity of solvent
required can be calculated. It is also possible to calculate the maximum expected yield.
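The maximum expected yield mentioned above can be estimated from a simple mass balance. The Python sketch below is a minimal illustration under stated assumptions rather than a substitute for the measured phase diagram: it assumes ideal behavior, that the crystals are the pure enantiomer, and that the mother liquor ends at the eutectic composition. The function names are hypothetical. The methylphenidate hydrochloride inputs (88% starting ee, eutectic at 30% ee) are those quoted in Section 3; the second case, with a eutectic at 98% ee, is an illustrative high-eutectic system.

```python
def max_pure_yield(ee_start, ee_eutectic):
    """Largest fraction of the total solute recoverable as pure enantiomer.

    Mass balance on the enantiomeric excess (all values as fractions):
        ee_start = y * 1 + (1 - y) * ee_eutectic
    so  y = (ee_start - ee_eutectic) / (1 - ee_eutectic).
    Returns 0.0 when the starting ee does not exceed the eutectic.
    """
    if not 0.0 <= ee_start <= 1.0 or not 0.0 <= ee_eutectic < 1.0:
        raise ValueError("expected 0 <= ee_start <= 1 and 0 <= ee_eutectic < 1")
    if ee_start <= ee_eutectic:
        return 0.0
    return (ee_start - ee_eutectic) / (1.0 - ee_eutectic)


def major_enantiomer_recovery(ee_start, ee_eutectic):
    """Same ceiling, expressed relative to the major enantiomer present."""
    major_fraction = (1.0 + ee_start) / 2.0
    return max_pure_yield(ee_start, ee_eutectic) / major_fraction


low_eutectic = max_pure_yield(0.88, 0.30)    # methylphenidate HCl figures
high_eutectic = max_pure_yield(0.99, 0.98)   # illustrative high-eutectic case
below_eutectic = max_pure_yield(0.25, 0.30)  # enrichment not possible

print(f"low eutectic: {low_eutectic:.3f}")
print(f"high eutectic: {high_eutectic:.3f}")
print(f"below eutectic: {below_eutectic:.3f}")
```

For the methylphenidate hydrochloride figures this ceiling is about 83% of the material charged. The 42% overall yield from a 55% resolution step quoted in Section 3 corresponds to roughly 76% across the salt conversion and crystallization, which sits below this estimate, as it should.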
Figure 18.23. Phase diagrams for a conglomerate, a racemic compound, and a
pseudoracemate, plotted between the (+)- and (-)-enantiomers.

Figure 18.24. Phase diagrams with the eutectic point close to the racemate and with a
high-ee eutectic point.

Note should also be made that in some cases recrystallization reduces the enantiomeric
excess, which can lead to crystallization of the racemate (94). In these cases the
mother liquors contain moderately to highly enriched material. It is therefore important
to plan the point in the strategy at which the enantiomer is recrystallized to optical
purity. This may follow an enzymic resolution or an asymmetric synthesis that has failed
to deliver enantiopure product. As discussed in Section 3, the liquors from the
diastereomeric resolution with DTTA of 88% de can be cleaved to the free base, and
crystallization of the hydrochloride salt gives >98% ee. This is because methylphenidate
hydrochloride has a eutectic point of 30% ee. Davies et al. (95) and Winkler et al. (96)
have prepared single enantiomer methylphenidate (29). Their approaches use an
enantioselective synthesis; the enantiomeric excesses are 86% and 69%, respectively,
thus requiring recrystallization to deliver enantiopure product. Another example of this
type of compound is Warfarin (13). Chemists at DuPont (97) developed an asymmetric
hydrogenation approach, which gave Warfarin in ~80% ee. Simple crystallization in an
appropriate solvent yielded optically pure Warfarin, thus indicating that the eutectic
point is below 80% ee. (See the earlier section on the metabolism and binding properties
of the Warfarin enantiomers.)

The phase diagrams below highlight two typical cases, the first where the eutectic point
E is close to the racemate, and the second where the eutectic approaches the single
enantiomer, as shown in Fig. 18.24. In the first case, it would be preferable to
crystallize the enriched enantiomer to optical purity, e.g., methylphenidate. However,
in the second case, a very stable racemic compound exists, giving rise to a high
eutectic point. Here crystallization of an enriched enantiomer mixture will only be
successful at high ee. For example, verapamil hydrochloride requires that the ee be
greater than 98% for crystallization to yield
enantiopure product. Below this, the enantiopurity is reduced. In this case, it is
advantageous to recrystallize the diastereomeric salt precursor to optical purity before
proceeding to final product.

4.3 Resolution by Direct Crystallization

It is important to show how conglomerates are identified. We have already seen that they
have specific phase diagrams as shown in Fig. 18.23. Other data that support
identification of a conglomerate are IR, X-ray data, and observation of a spontaneous
resolution or resolution by entrainment. Note should be made that in 1848, Louis Pasteur
separated the dextrorotatory and levorotatory crystals of sodium ammonium tartrate. This
manual sorting of crystals is also known as triage, and by its very nature is time
consuming and laborious. The readers are again directed towards the Jacques et al.
monograph, which lists over 250 known examples of conglomerates (57).

There are two possibilities for separation of enantiomers by direct crystallization. The
first uses spontaneous resolution, which occurs when a conglomerate crystallizes. This
crystallization may be followed by the mechanical separation of the crystals of the two
enantiomers. Various techniques have been developed that aid this separation.

The second type of resolution by direct crystallization is known as entrainment. Here,
the differences in the rate of crystallization of the enantiomers in a supersaturated
solution give rise to a separation. Strict control of the conditions for the
crystallization is required, with the system of crystals and solution not being allowed
to come to equilibrium and time playing an important role. The occurrence of
conglomerates has been estimated to be approximately 10% of all racemates. We will now
illustrate this phenomenon with some pertinent examples.

An example is the use of the conglomerate narwedine (59) in the synthesis of the natural
product galanthamine (61), which is an Amaryllidaceae alkaloid and has been used
clinically for 30 years for neurological illnesses (98). More recently it has been
approved for use in the treatment of Alzheimer's disease (AD) (99). Galanthamine acts to
inhibit acetylcholinesterase (AChE), thus increasing the levels of acetylcholine. An
increase in the level of acetylcholine in patients with AD has been shown to improve
their cognitive performance. Galanthamine has been extracted from botanical sources;
however, several tons of daffodil bulbs are needed to produce 1 kg of product. A
synthetic route has been developed that uses a crystallization-induced chiral
transformation (Fig. 18.25). This crystallization was first reported by Barton and Kirby
(100) and further developed by Shieh and Carlson (101). The success of this
transformation is based on two phenomena: narwedine (59) crystallizes as a conglomerate,
and (-)-narwedine (60) equilibrates with (+)-narwedine through a retro-Michael
intermediate. This process has now been developed so that (-)-narwedine (60) is
routinely obtained in 80% yield from the racemate input, as shown in Fig. 18.25 (102).

Figure 18.25. Entrainment of racemic narwedine (59) to (-)-narwedine (60).

Recently a number of potent 5-HT3 receptor antagonists such as Ondansetron have been
reported to be clinically effective for the blockade of chemotherapy-induced nausea and
emesis (103). The structurally novel compound (62) has also been shown to be a highly
potent 5-HT3 antagonist (104); specifically, the R-(-)-(62) enantiomer was shown to be
the most active. Comparison of the physical data of the racemate and single enantiomer
indicated that this structure (62) exists as a conglomerate (104). By careful
experimentation, the best concentration, temperature, and time for crystallization were
discovered. Table 18.1 highlights the results obtained for the entrainment.

The initial concentration of the solution was 10.0 g of (±)-(62) in 50 g of acetone. In
all runs, 10 mg of seed crystals were used. From the 10 runs highlighted in Table 18.1,
21.0 g of R-(-)-(62) of >92.0% ee and 21.4 g of (S)-(+)-(62) of >90% ee are obtained
from an input of 50.4 g of racemate. The table also nicely illustrates the continuous
nature of the process, which, coupled with the fact that no resolving agent, chiral
auxiliary, enzyme, or catalyst is needed, underlines the economic advantages of this
type of process.

The importance of amino acids as building blocks for asymmetric synthesis is well
documented (105). A number of amino acids have been shown to exist as conglomerates.
Shiraiwa et al. have described the preferential crystallization of racemic methionine
hydrochloride (106). The obtained D- or L-methionine hydrochloride was, however, only
~75% optically pure, requiring a further recrystallization to furnish enantiopure
product. Shiraiwa et al. have also recently disclosed the resolution of
(2RS,3SR)-2-amino-3-chlorobutanoic acid HCl, again using entrainment (107). Here it was
shown to be necessary to conduct the crystallization in an ethanol/5 M hydrochloric acid
solvent mixture for optimal results. By careful control of the conditions, high levels
of enantiomeric excess were obtained in the crystalline salt.

Chemists in Japan have developed an excellent approach to (+)-Diltiazem, which is a
coronary vasodilator (108). An intermediate is successfully resolved using preferential
crystallization. The glycidic acid-substituted phenylesters were prepared; of the 30
synthesized, only one exhibited conglomerate properties (109). This was the
3-(4-methoxyphenyl)glycidic acid 4-chloro-3-methylphenyl ester (63). Table 18.2
summarizes the physical data collected, which is illustrative of the conglomerate nature
of this compound.

Figure 18.26.

The obtained single enantiomer (-)-epoxide (64) is then converted into the required
(+)-isomer of Diltiazem (65) in several steps, as highlighted in Fig. 18.27.

Taxol is a natural product isolated in very low yield from Taxus brevifolia and is used
in the treatment of cancer (110). The extreme chemical complexity of Taxol makes
production by total synthesis uneconomical. However, a semisynthetic approach using
condensation of the naturally derived 10-deacetylbaccatin III (66) with
N-benzoyl-(2R,3S)-3-phenylisoserine (67) does provide an alternative and economic
approach (111). N-benzoyl-(2R,3S)-3-phenylisoserine (67) is also commonly known as the
Taxol side-chain and has been prepared in optically active form using chiral auxiliaries
or resolving agents (112). It has been shown that the Taxol side-chain is a conglomerate
and can therefore be cheaply and
Table 18.1 Resolution of (62) by Preferential Crystallization

Run    Added (g)    Seed    Time (minutes)    EE of Solution (%EE)    Amount of Crystals (g)    Rotation    %EE of Solid

Reprinted from H. Harada, T. Morie, Y. Hirokawa, and S. Kato, Tetrahedron: Asymmetry, vol. 8, 1997, pp. 2367-2374. Reproduced with permission from Elsevier Science.
efficiently entrained to the single required enantiomer (113).

5 ENZYME-MEDIATED ASYMMETRIC SYNTHESIS

Enzymes have found frequent use in the synthesis of single isomer drugs from racemic or prochiral compounds at the larger manufacturing scales. The use of enzymes to effect chiral transformations in the medicinal chemistry laboratory has been far less frequent; however, the increasing availability of immobilized and stabilized forms of enzymes has made their use easier and the resultant transformations more predictable.

Because enzymes have a complex macromolecular structure, including a highly defined active site, enzymatic transformations generally proceed with a high degree of chemical selectivity and stereospecificity. Reactions are typically conducted under mild conditions of temperature, pressure, and pH, thus minimizing losses caused by unwanted side reactions or partial racemization. The use of extremophiles or cross-linked enzymes such as CLECs does enable the use of higher temperatures, pressures, and organic solvents.

Enzymes can be utilized to effect a number of transformations; the broad spectrum of reactions, including amide bond formation, hydrolysis, esterification, reduction, oxidation, and carbon-carbon bond formation, has been reviewed elsewhere (114).

5.1 Amide Bond Formation

The use of enzymes to stereospecifically form amide bonds has been described in many texts (115). Cross-linked enzyme crystals (CLECs) are now commercially available; for example, PeptiCLEC-TR, which is an immobilized form of Thermolysin protease, has been used in the synthesis of D2163 (68), a novel matrix metalloproteinase inhibitor (116). In vitro enzyme screening identified the all-natural SSS-isomer as the active product. The elegant CLEC (117) technology used in this example makes the enzyme stable to typical organic reaction conditions and enables facile removal of the enzyme at the end of the reaction by simple filtration. On this basis, it is
anticipated that medicinal chemists will more commonly use these enzymes in the future. The coupling of dipeptide (69) to the protected α-thio carboxylic acid (70) was conducted in organic solvent at high concentration, with the desired product produced in a few hours with high enantiospecificity.

Table 18.2 Properties of (63) Indicating Conglomerate Nature

Compound    MP (°C)    Solubility (g/100 mL) THF    Solubility (g/100 mL) DMF    IR Spectrum
(±)         123-124    14.0                         13.0                         Identical
(-)         139-141    6.7                          6.9                          Identical

Figure 18.27.
Figure 18.28.
Figure 18.29.

5.2 Transesterification and Hydrolysis

A widely used technique for separating racemic mixtures is enzyme-mediated transesterification or hydrolysis. One important example is the separation of Naproxen (33), which is a member of the 2-arylpropionic acid class of profens that are broadly used as NSAIDs (see Section 2 for the separation of enantiomers using a crystallization approach). The important association between chirality and biological activity of this class of drugs has been extensively researched, where the role of cyclooxygenase-independent properties of the R-enantiomers in the gastrointestinal toxicity of the racemates and the likelihood that the use of racemates increases the propensity of profens to alter the pharmacokinetics of other drugs have been described (118).

Although not all profens are sold as single isomers, Naproxen is sold as the single S-enantiomer (36); various strategies, including crystallization, chromatographic separation, asymmetric hydrogenation, and enzymatic hydrolysis and esterification, have been used to prepare the single isomer (65). Specific examples include the use of Candida cylindracea lipase to enantioselectively prepare single isomer naproxen ester with trimethylsilyl methanol (119) and the use of Candida rugosa lipase in an enantioselective continuous hydrolysis of Naproxen methyl ester (120).

Pipecolic acid is a component of a number of active drugs, including bupivacaine (38) and thioridazine (72) (Fig. 18.30), and has been efficiently resolved as the racemic n-octyl pipecolate with Aspergillus niger. The S-isomer is obtained as the free acid in a 40% yield based on the available enantiomer with a 97% ee (121).

Propranolol (14) is a broadly used β-adrenergic receptor blocking agent that is sold as the racemate. However, the majority of the activity is associated with the S-enantiomer (74) (see Section 2) (122). The asymmetric
synthesis of the desired S-enantiomer has been achieved by the selective acylation of the R-enantiomer of the key intermediate (73) as shown in Fig. 18.30.

5.3 Oxidation and Reduction

In addition to the widely reported techniques of amide bond formation, transesterification, and hydrolysis, enzymic enantioselective oxidation is also used in the synthesis of single isomer drugs. Patel described the efficient oxidation of benzopyran (75), an intermediate in the synthesis of potassium channel openers (123). The transformation was effected with a cell suspension of Mortierella ramanniana with glucose over a 48-h period; the isolated product (77) was obtained in a 76% yield with an optical purity of 97% and a chemical purity of 98%, as shown in Fig. 18.32.
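The three figures quoted for product (77) (76% yield, 97% optical purity, 98% chemical purity) measure different things, and a short calculation may help show how they combine. The sketch below is illustrative only. It assumes that optical purity can be read as enantiomeric excess, that the yield counts all isolated material, and that the chemical impurities contain neither enantiomer. The function names are hypothetical and are not taken from the cited work.

```python
def enantiomer_fractions(ee_percent):
    """Return (major, minor) mole fractions for an enantiomeric excess given in %."""
    if not 0.0 <= ee_percent <= 100.0:
        raise ValueError("ee must lie between 0 and 100 %")
    ee = ee_percent / 100.0
    return (1.0 + ee) / 2.0, (1.0 - ee) / 2.0


def desired_enantiomer_recovery(yield_percent, chemical_purity_percent, ee_percent):
    """Moles of the desired enantiomer obtained per 100 mol of substrate.

    Assumption: the yield counts all isolated material, and the chemical
    impurities contain none of either enantiomer.
    """
    major, _ = enantiomer_fractions(ee_percent)
    return yield_percent * (chemical_purity_percent / 100.0) * major


# Values quoted in the text for the oxidation product (77).
major, minor = enantiomer_fractions(97.0)
recovered = desired_enantiomer_recovery(76.0, 98.0, 97.0)
print(f"major:minor = {major:.3f}:{minor:.3f}")
print(f"desired enantiomer per 100 mol substrate = {recovered:.1f} mol")
```

Under these assumptions, a 97% ee corresponds to a 98.5:1.5 ratio of enantiomers, and roughly 73 mol of the desired enantiomer is obtained per 100 mol of substrate.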
Reduction with a variety of enzymes has been reported (114), including baker's yeast for the reduction of α-methyleneketones to the corresponding α-methylalcohol (124), a functionality that is present in a number of drugs. The reduction of an azidoketone (78) using Pichia angusta enzyme has been used in the synthesis of S-salmeterol (79) (125). Salmeterol (Serevent) is a potent, long-acting β2-adrenoreceptor agonist used as a bronchodilator in the treatment of asthma. Recently, Sepracor claimed that the S-enantiomer had a higher selectivity for β2 receptors and that it did not cause certain adverse effects associated with the administration of (±)- or (R)-salmeterol (126). The synthesis of (S)-salmeterol (79) is shown in Fig. 18.33.

Figure 18.30. [Structures: Thioridazine (72); Bupivacaine (38)]
Figure 18.31.
6 ASYMMETRIC SYNTHESIS

Synthetic organic chemists have a vast array of tools at their disposal when faced with the challenge of preparing a chiral compound as a single enantiomer. The purpose of this section is to introduce the reader to some asymmetric approaches toward chiral drugs and medicinal compounds, highlighting examples where the stereoisomers behave differently in biological systems. There are many excellent books and reviews covering methods for asymmetric synthesis and their application to the preparation of pharmaceutical agents and complex natural products (127).

6.1 Chiral Pool

The use of enantiopure starting materials from nature in the synthesis of chiral drugs is not only of great historical significance but remains of critical importance to the pharmaceutical industry. Consideration of the current biggest-selling single enantiomer drugs shows how important this approach is (8 of the top 10 in 1996 were obtained from chiral pool starting materials or by synthetic manipulation of fermentation products) (127). The optically pure starting materials that have been used in drug synthesis include amino acids, hydroxy acids, terpenes, alkaloids, carbohydrates, and many more structurally diverse compounds. There are many syntheses involving clever manipulation of chiral pool starting materials and use of these chiral centers to induce further asymmetry (i.e., by diastereoselective reactions). We will briefly consider some examples in which all or most of the chiral centers in the target molecule originate directly from nature.

Angiotensin-converting enzyme (ACE) inhibitors are used mainly for the treatment of cardiovascular disorders and are among the biggest selling drugs worldwide (128). Enalapril (80) is synthesized from the natural amino acids L-alanine and L-proline (129). Lisinopril (81) incorporates a lysine derivative (130). One of the chiral centers in Captopril (82) is derived from proline, but the other is generated by chemical or enzymatic resolution (131). Cilazapril (83) is a conformationally restricted second generation ACE inhibitor developed by Roche, and the core is synthesized from a glutamic acid derivative and an amino acid-derived pyridazine (128, 132).

Figure 18.32.

There are many other examples of drugs based on an amino acid backbone. Stoner et al. recently reported a synthesis of the HIV protease inhibitor ABT-378 (Lopinavir) (84) (133). In a similar synthesis to that of the related compound, Ritonavir, key intermediate (85) is prepared by stepwise diastereoselective reduction of enaminone (86). This means that the existing chiral center, derived from natural L-phenylalanine (protected to 87), controls the formation of the two new stereocenters as
Figure 18.33. [Scheme; reagents: Br(CH2)6O(CH2)4Ph, DMF; AcOH, water]
discussed for chiral auxiliaries below. Two acylations then complete the synthesis, with the final chiral center clearly derived from L-valine.

The stereospecificity of binding at the histamine H3-receptor was investigated by preparing a series of ligands from D- or L-histidine (88) (134). It was found that compounds such as (S)-(89) had greater affinity for the receptor than their R-enantiomers. In addition, replacing the aromatic moiety with a cyclohexyl group (e.g., 90) switched the activity to agonism for compounds with an amino group in the chain.

Hydroxy acids are important chiral starting materials in the synthesis of many biologically active compounds (135). (S)-3-Hydroxy-γ-butyrolactone (91) is a very useful synthetic unit available from D-pyranoses (136). Workers at Schering-Plough used this as the key starting material in a concise synthesis of Sch 57939 (92), a β-lactam-based cholesterol absorption inhibitor (137). The condensation between the dianion of (S)-3-hydroxy-γ-butyrolactone and an appropriate diaryl imine proceeded with very high diastereo- and enantioselectivity, generating azetidinone (93) with a trans:cis ratio of >95:5.

Figure 18.34.
Figure 18.35. [Scheme labels: Lopinavir; (86); (87); NaBH4, TFA (89-93%)]
Figure 18.36. [Structure labels: L-histidine methyl ester (88); Enalapril, R = CH3 (80); Lisinopril, R = (CH2)4NH2 (81); Captopril (82)]
Figure 18.37. [Scheme; reagents: 1. LDA (2 eq.)/DMF/DMPU; 2. diaryl imine; labels: (S)-3-Hydroxy-γ-butyrolactone (91); Sch 57939 (93) (99.5% ee)]

Researchers at Abbott have been investigating the use of pyrrolidinyl isoxazoles as nicotinic cholinergic channel activators (138). Until recently, ABT-418 (97) was undergoing clinical trials as a potential treatment for cognitive impairment and decline and for Alzheimer's disease. A short synthesis of ABT-418 was devised starting from commercially available (S)-pyroglutamic acid methyl ester (94) (139). Acetone oxime dianion was added to the methyl ester (94) to generate an intermediate (95). Racemization of the chiral center was found to occur under basic conditions; however, this was avoided by immediate treatment with concentrated sulfuric acid, resulting in cyclization and dehydration to amide isoxazole (96). Reduction and N-methylation yielded ABT-418 (97). The binding affinity of ABT-418 at neuronal cholinergic channel receptors was measured to be one order of magnitude greater than that of the corresponding R-enantiomer (Ki = 4.2 versus 44 nM) (138).

Figure 18.38. [Structure label: ABT-418 (97) (ee >99%)]

6.2 Chiral Auxiliary

In this approach the substrate is attached to a chiral, non-racemic unit that controls the formation of one or more new chiral groups. Reaction of the coupled unit with a reagent or prochiral substrate is designed to produce one diastereomeric product in excess. The auxiliary is then removed (and preferably recovered), providing the product in high enantiomeric excess. This process is most attractive when both isomers of the auxiliary are readily available in enantiomerically pure form, and where the reaction leads to high levels of stereoselectivity in a predictable manner. Attaching and removing the auxiliary should be straightforward and proceed without loss of stereochemical integrity.

Many auxiliaries currently in use are derived from 1,2-aminoalcohols (140). These are readily available from natural sources with little or no synthetic manipulation and can react in a variety of ways to form conformationally well-defined (usually cyclic) auxiliary systems. The use of oxazolidinones in asymmetric synthesis was developed by Evans et al., and these oxazolidinones have been used extensively in a variety of different reactions (140, 141). The use of this chiral auxiliary in the preparation of pharmaceuticals is widespread, and there are several large-scale processes using such chemistry (142).

Figure 18.39. [Scheme; reagents: BuLi, THF, -78°C; NaHMDS, THF, -78 to -50°C; BrCH2CO2Bu; LiOOH, THF/H2O]

Abbott reported an improved synthesis of ABT-627 (98) involving an asymmetric alkylation of the valine-derived acyl oxazolidinone (99) (143). ABT-627 (Atrasentan) is a selective endothelin ETA receptor antagonist under development for the treatment of cancer, particularly prostate cancer. Acid (100) was activated as a mixed anhydride and treated with the lithium anion of the oxazolidinone to give (101). Both of the following deprotonation and alkylation steps must be controlled to give high levels of stereoselectivity. The (Z)-enolate (102) is favored, both kinetically and thermodynamically, by the bulky iso-propyl group
Table 18.3 Stereochemical Variation

(3α, 6)              Ki (nM versus HIV-1)    IC50 (µM, cell HIV-1)
R,R (Tipranavir)
R,S
S,R
S,S
and is held rigid by chelation to the carbonyl oxygen of the oxazolidinone. The major stereoisomer then results from alkylation of this chelated enolate anion from the least hindered "upper" face to yield (103) as the major product. There are many strategies for removal and recovery of an oxazolidinone auxiliary (141). In this case, hydrolysis with lithium peroxide provides the acid that is transformed into Atrasentan through a cyclization-ring contraction strategy controlled by the chirality present in (103).

Tipranavir (PNU-140690) is a potent third-generation HIV protease inhibitor in clinical development by Boehringer Ingelheim (under license from Pharmacia). The biological activity of such 5,6-dihydro-4-hydroxy-2-pyrone sulfonamides shows considerable stereochemical variation (Table 18.3) (144). The R-configuration is preferred at both chiral centers (3α and 6), and Tipranavir is more than 50 times as potent as its enantiomer in a cell culture assay using HIV-1(IIIB)-infected H9 cells. An asymmetric synthesis (145) begins with the Michael addition of an aryl cuprate (derived from commercially available Grignard reagent 105) to the unsaturated oxazolidinone imide (104), yielding the adduct as a single diastereomer (106). The nitrogen protecting group was changed and an acetyl group introduced to give ketone (107), which undergoes a stereoselective aldol reaction with an acetylenic ketone (108). The highest diastereoselectivity was obtained for this reaction using Ti(OnBu)Cl3 as the Lewis acid. Both of the critical asymmetric steps to form new chiral centers are controlled by the (R)-phenyl oxazolidinone. The chiral auxiliary is removed when (109) is treated with base to form the lactone ring. This is followed by two further steps that generate PNU-140690 (110) as a single enantiomer.

The enantioselective synthesis of dopaminergic benzyltetrahydroisoquinolines and their binding to D1 and D2 dopamine receptors was investigated by Cabedo et al. (146). The synthetic route, illustrated by the preparation of the (1S)-isomer, involves stereoselective reduction of the isoquinolinium salt (114) with (R)-phenylglycinol (introduced in protected form as 112) as the chiral auxiliary. The (1R)-enantiomer of (115), prepared in an analogous fashion using (S)-phenylglycinol, binds to dopamine receptors with considerably less affinity (>100 µM versus D1 and 61.2 µM versus D2). In contrast, stereochemical differentiation was not observed at the dopamine uptake site for these compounds.

Two different chiral auxiliary approaches have been applied to the synthesis of NPS 1407 and its enantiomer (119) (147). NPS 1407 is an antagonist of the glutamate NMDA receptor that has in vivo activity in neuroprotection and anti-convulsant assays. The R-enantiomer was synthesized in four steps from (116) with the chiral center introduced by a completely stereoselective alkylation of hydrazone (117). The chiral auxiliary, S-(-)-1-amino-2-(methoxymethyl)pyrrolidine (SAMP), was introduced by condensation with aldehyde (116) and removed by catalytic hydrogenolysis. In the second method, the S-enantiomer was formed in a four-step sequence with the chiral center installed by the Michael addition of chiral amine (121) (formed in one step from the readily available α-methylbenzylamine) to benzyl crotonate (120). NPS 1407 (123) was found to be 12 times more potent than its enantiomer (119) at the NMDA receptor in an in vitro assay.

An example of the use of a terpene as a chiral auxiliary is provided by the synthesis of the anti-viral reverse transcriptase inhibitor Lamivudine (148). The nucleoside analog is marketed by Biochem Pharma (now Shire Pharmaceuticals) and Glaxo Wellcome (now GlaxoSmithKline) for the treatment of HIV and hepatitis B virus infection. In the
Figure 18.40. [Scheme; reagents: CuBr/DMS/THF, 0°C, 1 hour; Grignard reagent (105); Ti(OnBu)Cl3/CH2Cl2; labels: (109) (de = 92%); Tipranavir (110: 3α = R, 6 = R)]
production route, the glycolate derived from (-)-menthol (124) is coupled with thioacetyl dimer (125). The chiral auxiliary directs reaction to install the desired (2R)-stereochemistry in (126). In situ formation of chloro compound (127) is followed by a stereoselective coupling reaction with trimethylsilyl cytosine, again directed by the (-)-menthyl carboxylate. Reductive removal of the auxiliary yields Lamivudine (129) as a single isomer that was found to have favorable toxicological and pharmacokinetic properties relative to the racemate.

6.3 Chiral Reagent

In this approach, asymmetry is induced in a prochiral molecule or functional group by reaction with a stoichiometric amount of an enantio-enriched reagent system. The reaction proceeds through diastereomeric transition states, resulting in the preferential formation of one enantiomer or diastereomer. Current reagents can lack generality and may be difficult to prepare in both chiral forms. At least one equivalent of the chiral component is required, which can present economical and practical difficulties. Many examples are provided by the reduction of double bonds, especially ketones. Ketone (130) was reduced enantioselectively using either (+)- or (-)-B-chlorodiisopinocampheylborane (149). Reduction with (-)-B-chlorodiisopinocampheylborane generated the alcohol (S)-(131), which was transformed into the (1R,3S)-isochroman compound, (1R,3S)-(132), through a stereospecific cyclization to form the cis stereochemistry across the ring. The enantiomer, (1S,3R)-(132), was prepared in a directly analogous manner by reduction with (+)-B-chlorodiisopinocampheylborane. This study represents another example of stereodifferentiation at the dopamine receptors, with nearly a 5000-fold difference in D1 potency observed between the two isomers in an in vitro assay.

A recent paper by scientists at Bristol-Myers Squibb reports the synthesis of a new class of calcium-activated potassium channel modulators (150). The compounds were investigated for their ability to increase channel opening at large conductance (BK or maxi-K) channels and showed a limited degree of stereospecificity. The key step in the synthesis is the direct oxidation of the enolate derived from (133) with either isomer of camphorsulfonyl oxaziridine, a reagent developed by Davis (151). Both enantiomers of the 3-aryl-3-hydroxyindol-2-ones were prepared with very high enantiopurity (>95% ee) using opposite enantiomers of the chiral oxaziridine. (-)-(134) was found to be a better activator of a cloned BK channel than the (+)-isomer at 20 µM, generating a current increase of 141% compared with 124% for (+)-(134).

Figure 18.41. [Scheme labels: (1S)-(115) (16.6 µM vs D1; 14.7 µM vs D2); (114) (78% de)]

6.4 Chiral Catalyst

The use of a chiral catalyst represents the ideal method for asymmetric synthesis because only small amounts of the chiral mediator are required and no modifications of the prochiral substrate are necessary. In many systems both enantiomers of the product can be prepared in a predictable and reproducible manner. The pharmaceutical industry is particularly interested in the capability of new catalyst systems to operate as reliable manufacturing processes on a large scale (127, 152). Substantial effort continues to be expended by the synthetic organic community with the goal of extending the number of efficient and broadly applicable catalyst systems capable of generating high levels of enantiomeric excess in a wide range of substrates (127).

The reduction of ketones by borane catalyzed by chiral oxazaborolidines such as (136), derived from the enantiomeric amino alcohols, has been applied to the synthesis of several drug candidates (127). This system is known as the CBS (Corey, Bakshi, Shibata) reduction (153), and Corey himself has applied it to the synthesis of pharmaceutical compounds (154). A further example is provided by the synthesis of MK-499 (137), a
Figure 18.42. [Scheme; reagents: MeLi, THF, -78°C, 89%; 1. H2, PtO2·H2O; 2. HCl, 23%; labels: (118) (de > 99%); (119) (IC50 = 1.11 µM); (122); NPS 1407 (123) (IC50 = 0.089 µM)]
potassium channel blocker that was developed for the treatment of cardiac arrhythmia by Merck (155).

Asymmetric hydrogenations with transition metal catalysts have been applied to single enantiomer synthesis in the pharmaceutical industry with considerable success. ChiroTech and Pfizer developed an improved synthesis of glutarate derivative (139), an intermediate required for the synthesis of Candoxatril (140) (156). The drug, a neutral endopeptidase inhibitor, was in clinical development for the treatment of hypertension and congestive heart failure, and its enantiomer does not possess the same biological activity. Several catalysts and conditions were screened before arriving at optimized conditions using cationic rhodium-(R,R)-MeDuPHOS (141), which provided the product with complete enantioselectivity and avoided previously observed problems associated with isomerization of the enone starting material. The reaction could be conducted at a
Figure 18.43. [Structure label: Lamivudine (129)]
high substrate-to-catalyst ratio of 3500:1 without a detrimental effect on enantiomeric excess or reaction rate. In catalytic asymmetric reactions, it is clearly economically advantageous to minimize the amount of catalyst that may comprise expensive chiral material and transition metals.

A method for the asymmetric dihydroxylation of alkenes to yield cis-diols was developed by the research group of Sharpless using chiral ligands derived from the cinchona alkaloids dihydroquinidine (DHQD) and dihydroquinine (DHQ) with a catalytic amount of osmium tetroxide (157). Although they are diastereomers, the phthalazine ligands act as "pseudo-enantiomeric" ligands, i.e., they give opposite asymmetric induction in a predictable manner. This procedure was recently used to prepare both isomers of combretadioxolane (144), a chiral analog of the natural product Combretastatin A-4 (146) (158). Combretastatin A-4 displays antitubulin activity and cytotoxicity to tumor cells and is therefore an interesting lead structure for new anticancer drugs. The asymmetric synthesis of (S,S)-combretadioxolane (144) involved treatment of the trans-stilbene (142) with "AD-mix-α" [containing (DHQ)2-PHAL] (145), whereas the enantiomer (R,R)-combretadioxolane resulted from use of AD-mix-β, which contains (DHQD)2-PHAL as the chiral ligand. The tubulin polymerization-inhibitory activity of (S,S)-combretadioxolane was comparable with combretastatin A-4 (IC50 = 4-6 µM) in an in vitro assay, whereas (R,R)-combretadioxolane was essentially inactive (IC50 > 50 µM). In addition, (S,S)-combretadioxolane was 20 times more potent than vincristine as an in vitro growth inhibitor of the multidrug-resistant cell line PC-12.

Workers at SmithKline Beecham reported the stereoselective synthesis of inhibitors of
Figure 18.44.
the cysteine protease cathepsin K (159). A procedure was sought to allow preparation of either enantiomer of azido alcohol (148). This was readily achieved by Jacobsen asymmetric desymmetrization of the meso-epoxide (147) using azidotrimethylsilane catalyzed by chromium salen complex (149) (160). Use of the (R,R)-salen catalyst shown generated (3S,4R)-(148), whereas the (S,S)-catalyst provided the (3R,4S)-azidosilyl alcohol, both with very high enantioselectivity. After removal of the silyl group and reduction of the azido moiety, the resultant enantiomeric amino alcohols were transformed into diastereomers (4S)- and (4R)-(150) by reaction with leucine, amide formation, and oxidation. The cathepsin K activity for the diastereomers showed the (4S)-isomers to be up to 40-fold more potent than the corresponding (4R)-(150) in an enzyme assay.

A large scale synthesis of the drug Nelfinavir, an HIV protease inhibitor developed by Agouron (now Pfizer), was reported with the amino alcohol derived from (148), prepared using the Jacobsen procedure described above (161).
Figure 18.45.
Figure 18.46. [Scheme; reagents: (136) (10 mol%); MeOH; label: MK-499 (137) (98% de; 92% yield)]

A similar approach uses the chromium-Salen complex (149) to open racemic terminal epoxides in a highly efficient resolution process that has been applied to the synthesis of biologically active compounds (162). As with any such resolution process, the maximum yield of enantiopure material is 50% based on starting material. Terminal epoxides are easy to prepare in racemic form, and conversely, difficult to prepare as single enantiomers by epoxidation of the corresponding alkene. (R)-9-[2-(phosphonomethoxy)propyl]adenine (R-PMPA) is a nucleotide reverse transcriptase
Figure 18.47. [Scheme; (138) to (139) (>99% ee, 95% yield) with [((R,R)-Me-DuPHOS)Rh(COD)]BF4, H2 (5 atm)/MeOH (COD = 1,5-cyclooctadiene); labels: (R,R)-Me-DuPHOS (141); Candoxatril (140)]
Figure 18.48. [Scheme labels: (S,S)-(143) (≥99% ee; 89% yield); Combretastatin A-4 (146)]
inhibitor being developed by Gilead Sciences and a collaborative group from the University of Washington for the treatment and prevention of HIV infection (163). The compound can be prepared through kinetic resolution of propylene oxide using (S,S)-(149), and the resultant (R)-1-amino-2-propanol (153) was transformed into (R)-PMPA (154) in five steps (162).

In 1997, Tokunaga et al. reported the hydrolytic kinetic resolution of racemic terminal epoxides using a Co(III)-Salen catalyst (164). This remarkably general process uses only water as the nucleophile and provides the synthetically useful chiral epoxides and diols in highly enantioenriched form. The catalyst can be recycled and the reactions conducted under solvent-free conditions. The process has been used by academic and industrial groups and is operated by Rhodia ChiRex on a large scale (165).

Figure 18.49. [Scheme labels: (147); (3S,4R)-(148) (98% ee; 93% yield)]

A wide variety of synthetic processes have been rendered asymmetric through the use of a chiral catalyst. In addition to the types of reaction described above, chiral transition metal catalysts have been used to influence the stereochemical course of isomerization, cyclization, and coupling reactions. As an example, an approach towards the natural product (-)-epibatidine (158) was recently reported by Namyslo and Kaufmann (166). Epibatidine is a potent analgesic and a nicotinic receptor agonist. The synthesis involves an asymmetric Heck-type hydroarylation between the bicyclic alkene (155) and pyridyl iodide (156). A number of bidentate chiral ligands were investigated, with BINAP (159) observed to give the highest enantioselectivity. By using the (R)- or (S)-BINAP ligand, both enantiomers of (157) were accessible with about the same level of enantioselection.

The continuing development of efficient and practical asymmetric processes will be one of the major driving forces in the future of drug discovery and development. In particular, the design of new general and practical catalytic processes will help explore the link between chirality and biological activity.

7 CONCLUSIONS

The ultimate focus of the endeavors of medicinal chemists is to develop a successful drug that will cure patients. However, with the increased regulatory requirements within the competitive biotechnology and pharmaceutical industry, the initial research to achieve this objective must be conducted in a rapid and thorough manner. During the drug research and development process, the important and subtle relationship between chirality and biological activity should be carefully considered.
Figure 18.50. [Scheme; reagents: TMSN3 (0.5 equiv.); 2 steps; labels: (152) (97% ee); (153) (84% yield)]
Figure 18.51. [Scheme labels: (157) (81% ee; 53% yield); Epibatidine (158)]
Enantiomers frequently display markedly different biological activity; however, the availability of a large and adaptable toolbox of chemical and biological techniques to obtain single isomers allows the medicinal chemist to avoid working with mixtures of stereoisomers.

As reviewed in this chapter, there are numerous synthetic strategies available to the medicinal chemist that offer their own particular drawbacks and advantages. In the early stages of research it may be preferable to separate isomers by chromatography, thus providing both single enantiomers for biological testing. It should be noted that all the techniques described in this chapter can be used in conjunction with one another. That is to say, if one technique such as asymmetric synthesis fails to deliver enantiopure material, then another technique such as crystallization can be used to bring the product to the desired purity. As an example of this "double" approach, the use of SMB and crystallization in the separation of mandelic acid is worthy of note (56). The use of asymmetric hydrogenation followed by asymmetric enzymic transformation to obtain single isomer products has also been described by Taylor et al. at ChiroTech (167).

In conclusion, if a chiral center is present in a molecule designed and synthesized by a medicinal chemist, there are a broad number of methods available to prepare or isolate either isomer. From the examples given in this chapter, stereoisomers frequently display markedly different biological properties where the desirable properties associated with one isomer may not be apparent when the corresponding racemic mixture is tested either in vivo or in vitro.

REFERENCES

1. A. Michaels and J. Fuller, Financial Times (Lond.), 23, 21 (2001).
2. R. B. Raffa, E. Friderichs, W. Reimann, R. P. Shank, E. E. Codd, J. L. Vaught, H. I. Jacoby, and N. Selve, J. Pharmacol. Exp. Ther., 267, 331-340 (1993).
3. M. B. Smith and J. March, March's Advanced Organic Chemistry, 5th ed., Wiley, New York, 2001, pp. 125-217; E. L. Eliel, S. H. Wilen, and L. N. Mander, Stereochemistry of Organic Compounds, Wiley, New York, 1994; E. L. Eliel, S. H. Wilen, and M. P. Doyle, Basic Organic Stereochemistry, Wiley, New York, 2001.
4. T. D. Stephens, Chem. Br., 37, 38 (2001).
5. J. J. Baldwin and W. B. Abrams in I. W. Wainer and D. E. Drayer, Eds., Stereochemically Pure Drugs: An Industrial Perspective, Marcel Dekker, New York, 1988, p. 311.
6. G. C. Cotzias, P. S. Papavasiliou, R. Gellene, N. Engl. J. Med., 280, 337 (1969).
7. H. K. Kroemer, J. Turgeon, R. A. Parker, and D. M. Roden, Clin. Pharmacol. Ther., 46, 584 (1989).
8. R. A. O'Reily, Clin. Pharmacol. Ther., 16, 348 (1974).
9. A. Breckenbridge, M. Orme, H. Wesseling, R. J. Lewis, and R. Gibbons, Clin. Pharmacol. Ther., 15, 424 (1974).
10. T. Walle, J. G. Webb, E. E. Bagwell, U. K. Walle, H. B. Daniell, and T. E. Gaffney, Biochem. Pharmacol., 37, 115 (1988).
11. W. Lindner, M. Rath, K. Stoschitzky, and H. J. Semmelrock, Chirality, 1, 10 (1989).
12. D. E. Drayer in I. W. Wainer and D. E. Drayer, Eds., Drug Stereochemistry: Analytical Methods and Pharmacology, Marcel Dekker, New York, 1988, p. 209.
13. K. J. Fehske, W. E. Muller, and U. Wollert, Biochem. Pharmacol., 30, 687 (1981).
14. R. H. McMenamy and J. L. Oncley, J. Biol. Chem., 233, 1436 (1958).
15. W. E. Muller in I. W. Wainer and D. E. Drayer, Eds., Drug Stereochemistry: Analytical Methods and Pharmacology, Marcel Dekker, New York, 1988, p. 227.
16. S. Toon, L. K. Low, M. Gibaldi, W. F. Trager, R. A. O'Reily, C. H. Motley, and D. A. Goulart, Clin. Pharmacol. Ther., 39, 15 (1986).
17. M. A. Campanero, B. Calahorra, M. Valle, I. F. Troconiz, and J. Honorato, Chirality, 11, 272 (1999).
18. R. Stevenson, Chem. Br., 37, 24 (2001).
19. N. M. Maier, P. Franco, and W. Lindner, J. Chromatogr. A, 906, 3 (2001).
20. L. Miller, C. Orihuela, R. Fronek, D. Honda, and O. Dapremont, J. Chromatogr. A, 849, 309 (1999).
21. V. M. Meyer, Chirality, 7, 567 (1995); O. P. Kleidernigg, M. Lammerhofer, and W. Lindner, Enantiomer, 1, 387 (1996).
22. V. Schurig, J. Chromatogr., 441, 135 (1988); K. Watabe, S. C. Chang, E. Gil-Av, and B. Koppenhofer, Synthesis, 3, 225 (1987).
23. E. R. Francotte in S. Ahuja, Ed., Chiral Separations, Applications and Technology, chap. 10, American Chemical Society, Washington DC, 1997, p. 271.
24. K. D. Altria, N. W. Smith, and C. H. Turnbull, Chromatographia, 46, 664 (1997); K. D. Altria, M. A. Kelly, and B. J. Clark, Trends Anal. Chem., 17, 214 (1998).
25. K. L. Williams, L. C. Sander, and S. A. Wise, J. Pharm. Biomed. Anal., 15, 1789 (1997); N. Bargmann-Leyder, A. Tambute, and M. Claude, Chirality, 7, 311 (1995).
26. M. Juza, M. Mazzotti, and M. Morbidelli, Trends Biotechnol., 18, 108 (2000).
27. J. T. F. Keurentjes and F. J. M. Voermans in A. N. Collins, G. N. Sheldrake, and J. Crosby, Eds., Chirality in Industry II: Developments in the Manufacture of Optically Active Compounds, chap. 8, Wiley, New York, 1997, p. 157.
28. S. C. Stinson, Chem. Eng. News, 73, 44 (1995).
29. E. Francotte, J. Chromatogr. A, 666, 565 (1994).
30. CHIRBASE, available online at http://chirbase.u-3mrs.fr, accessed on July 29, 2002.
31. Daicel Chemical Industries, Ltd., available online at http://www.daicel.co.jp/chiral, accessed on July 29, 2002. NOVASEP, available online at http://www.novasep.com, accessed on July 29, 2002. Astec, available online at http://www.astecusa.com, accessed on July 29, 2002.
32. D. Boyd, M. O'Keeffe, and M. R. Smyth, Analyst, 119, 1467 (1994).
33. D. A. von Deutsch, I. K. Abukhalaf, L. E. Wineski, H. Y. Aboul-Enein, S. A. Pitts, B. A. Parks, R. A. Oster, D. F. Paulsen, and D. E. Potter, Chirality, 12, 637 (2000).
34. B. Waldeck, E. Widmark, Acta Pharmacol. Toxicol., 56, 221-227 (1985).
35. D. J. Triggle, D. A. Langs, and R. A. Janis, Med. Res. Rev., 9, 123 (1989); V. C. O. Njar and A. M. H. Brodie, Drugs, 58, 233 (1999).
36. S. Visentin, P. Amiel, A. Gasco, B. Bonnet, C. Suteu, and C. Roussel, Chirality, 11, 602 (1999).
37. P. Tullio, A. Ceccato, J-F. Liegeois, B. Pirotte, A. Felikidis, M. Stachow, P. Hubert, J. Crommen, J. Geczy, and J. Delarge, Chirality, 11, 261 (1999).
38. J. Bruhwyler, J. F. Liegeois, J. Gerardy, J. Damas, E. Chleide, C. Lejeuns, E. Decamp, P. De Tullio, J. Delarge, A. Dresse, and J. Geczy, Behav. Pharmacol., 9, 731 (1998).
39. T. Shibata, I. Okamoto, and K. Ishii, J. Liq. Chromatogr., 9, 313 (1986); E. Yashima and Y. Okamoto, Bull. Chem. Soc. Jpn., 68, 3289 (1995).
40. W. H. Pirkle, D. W. House, and J. H. Finn, J. Chromatogr., 192, 143 (1980).
41. J. N. Kinkel in A. Subramanian, Ed., A Practical Approach to Chiral Separations by Liquid Chromatography, VCH, New York, 1994.
42. S. G. Allenmark, S. Andersson, P. Moller, and D. Sanchez, Chirality, 7, 248 (1995).
43. M. Meurer, U. Altenhoner, J. Straube, and H. Schmidt-Traub, J. Chromatogr. A, 769, 71-79 (1997).
44. M. Schulte, R. Ditz, R. M. Devant, J. N. Kinkel, and F. Charton, J. Chromatogr. A, 769, 93 (1997).
45. D. Y. Wang, F. Hanotte, C. De Vos, and P. Clement, Eur. J. Allerg. and Clin. Immunol., 56, 339 (2001).
46. E. J. Corey and C. J. Helal, Tetrahedron Lett., 37, 4837 (1996).
47. D. A. Pflum, H. Scot Wilkinson, G. J. Tanoury, D. W. Kessler, H. B. Kraus, C. H. Senanayake, and S. A. Wald, Org. Process Res. Dev., 5, 110 (2001).
48. E. R. Francotte and P. Richert, J. Chromatogr. A, 769, 101 (1997); M. Negawa and F. Shoji, J. Chromatogr., 590, 113 (1992).
49. G. Ganetsos, P. E. Barker, J. A. Johnson, R. G. Kabza, K. Hashimoto, S. Adachi, Y. Shirai, M. Morishita, B. Balannec, G. Hotier, and H. Makai in G. Ganetsos and P. E. Barker, Eds., Preparative and Production Scale Chromatography, chaps. 11-15, Marcel Dekker, New York, 1993, p. 233; D. B. Broughton and C. G. Gerhold, inventors, Universal Oil Prod. Co., assignee, US patent 2,985,589, May 23, 1961.
50. D. M. Ruthven, Principles of Adsorption and Adsorption Processes, chap. 12, Wiley, New York, 1984, p. 380.
51. E. R. Francotte, Chim. Nouvelle, 53, 1541 (1996).
52. D. W. Guest, J. Chromatogr. A, 760, 159 (1997).
53. M. Schulte and J. Strube, J. Chromatogr. A, 906, 399 (2001).
54. K. E. Goeringer, B. K. Logan, and G. D. Christian, J. Anal. Toxicol., 21, 529 (1997).
55. E. Cavoy, M.-F. Deltent, S. Lehoucq, and D. Miggiano, J. Chromatogr. A, 769, 49 (1997).
56. H. Lorenz, P. Sheehan, and A. Seidel-Morgenstern, J. Chromatogr. A, 908, 201 (2001).
57. J. Jacques, A. Collet, and S. H. Wilen, Enantiomers, Racemates and Resolutions, Krieger, Malabar, Florida, 1994.
58. A. Collet, M. J. Brienne, and J. Jacques, Chem. Rev., 80, 215 (1980).
59. S. H. Wilen in E. L. Eliel, Ed., Tables of Resolving Agents and Optical Resolutions, University of Notre Dame Press, Notre Dame, IN, 1972; P. Newman, Optical Resolution Procedures for Chemical Compounds, vol. 1-3, Optical Resolution Center, Manhattan College, New York, 1978-1984.
60. M. J. Cannarsa, Chimica Oggi, 17, 28 (1999).
61. M. Prashad, D. Har, O. Repic, T. J. Blacklock, and P. Giannousis, Tetrahedron Asymmetry, 10, 3111 (1999).
62. R. A. Maxwell, E. Chaplin, S. B. Eckhardt, J. R. Soares, and G. Hite, J. Pharmacol. Exp. Ther., 173, 158 (1970).
63. S. Faulconbridge, H. S. Zavareh, G. R. Evans, and M. Langston, inventors, Medeva Europe Ltd. (GB), assignee, World patent WO98/25902, June 18, 1998.
64. A. S. C. Chan, Chemtech., 3, 46 (1993).
65. P. J. Harrington and E. Loderwijk, Org. Process Res. Dev., 1, 72 (1997).
66. W. J. Pope and S. J. Peachey, J. Chem. Soc., 75, 1066 (1899).
67. R. Gristwood, H. Bardsley, H. Baker, and J. Dickens, J. Exp. Opin. Invest. Drugs, 3, 1209 (1994).
68. M. Langston, U. C. Dyer, G. A. C. Frampton, G. Hutton, C. J. Lock, B. M. Skead, M. Woods, and H. Zavareh, Org. Process Res. Dev., 4, 530 (2000).
69. B. A. Astleford and L. O. Weigel in A. N. Collins, G. N. Sheldrake, and J. Crosby, Eds., Chirality in Industry I, chap. 6, Wiley, New York, 1997, p. 99.
70. K. Flick and E. Frankus, inventors, Gruenenthal Chemie, assignee, US patent 3,652,589, March 28, 1972.
71. G. R. Evans, inventor, Darwin Discovery Ltd. (US), assignee, World Patent WO00/32554, June 8, 2000; G. R. Evans, J. A. Henshilwood, and J. O'Rourke, Tetrahedron Asymmetry, 12, 1663 (2001).
72. L. G. Humber, Med. Res. Rev., 7, 1 (1987).
73. L. G. Humber, J. Med. Chem., 29, 871 (1986).
74. M. Woods, U. C. Dyer, J. F. Andrews, C. N. Morfitt, R. Valentine, and J. Sanderson, Org. Process Res. Dev., 4, 418 (2000).
75. D. Askin, K. K. Eng, R. M. Purick, K. M. Wells, R. P. Volante, and P. J. Reider, Tetrahedron Lett., 35, 673 (1994).
76. M. Kottenhahn, K. Stingl, and K. Drauz, inventors, Degussa (DE), assignee, US patent 6,093,823, July 25, 2000.
77. M. Eichelbaum, Federation Proc., 43, 2298 (1984); M. Eichelbaum, Biochem. Pharmacol., 37, 93 (1988).
78. H. Echizen, T. Brecht, S. Neidergesass, B. Volgelgesang, and M. Eichelbaum, Am. Heart J., 109, 210 (1985).
79. O. Ehrmann, H. Nagel, and W. Karau, inventors, Knoll AG (DE), assignee, US patent 5,457,224, October 10, 1995 and World patent WO94/08950, April 14, 1994.
80. E. J. Trieber, M. Raschack, and F. Dengel, inventors, Knoll AG (DE), assignee, German patent 2059923, 1972.
81. R. M. Bannister, M. H. Brookes, G. R. Evans, R. B. Katz, and N. D. Tyrrell, Org. Process Res. Dev., 4, 467 (2000).
82. P. E. Graves, H. A. Salhanick, Endocrinology, 105, 52 (1979).
83. See ref. 57, pp. 299-301, 382-383.
84. N. Finch, R. Dziemian, J. Cohen, and B. G. Steinetz, Experientia, 31, 1002 (1975).
85. M. J. Bunegar, U. C. Dyer, G. R. Evans, R. P. Hewitt, S. W. Jones, N. Henderson, C. J. Richard, S. Sivaprasad, B. M. Skead, M. A. Stark, and E. Teale, Org. Process Res. Dev., 3, 442 (1999).
86. See ref. 57, pp. 374-375.
87. K. P. Datla and G. Curzon, Neuropharmacology, 35, 315 (1996); S. Salvadori, C. Bianchi, L. H. Lazurus, V. Scaranari, M. Attila, and R. Tomatis, J. Med. Chem., 35, 4651 (1992).
88. C. A. Maryanoff, L. Scott, R. D. Shah, and F. J. Villani Jr., Tetrahedron Asymmetry, 9, 3247 (1998).
89. A. Bruggink in A. N. Collins, G. N. Sheldrake, and J. Crosby, Eds., Chirality in Industry II, chap. 5, Wiley, New York, 1997, p. 81.
90. T. Vries, H. Wynberg, E. van Echten, J. Koek, W. ten Hoeve, R. M. Kellogg, Q. B. Broxterman, A. Minnaard, B. Kaptien, S. van der Sluis, L. Hulshof, and J. Kooistra, Angew. Chem. Int. Ed., 37, 2349 (1998).
91. E. Ebbers, G. J. A. Arianns, B. Zwanenburg, and A. Bruggink, Tetrahedron Asymmetry, 9, 2745 (1998); U. C. Dyer, D. A. Henderson, and M. B. Mitchell, Org. Process Res. Dev., 3, 161 (1999).
92. S. H. Wilen, A. Collet, and J. Jacques, Tetrahedron, 33, 2725 (1977).
93. Z. J. Li, M. T. Zell, E. J. Munson, and D. J. W. Grant, J. Pharm. Sci., 88, 337 (1999).
94. R. Tamura, T. Ushio, H. Takahashi, K. Nakamura, N. Azuma, F. Toda, and K. Endo, Chirality, 9, 220 (1997).
95. H. M. L. Davies, T. Hansen, D. W. Hopper, and S. A. Panaro, J. Am. Chem. Soc., 121, 6509 (1999).
96. J. M. Axten, R. Ivy, L. Krim, and J. D. Winkler, J. Am. Chem. Soc., 121, 6511 (1999).
97. A. Robinson, H. Y. Li, and J. Feaster, Tetrahedron Lett., 37, 8321 (1996).
98. W. Goppel, W. Betram, Psychiatr. Neurol. Med. Psychol., 23, 712 (1971).
99. H. A. M. Mucke, Drugs Future, 33, 259 (1997).
100. D. H. R. Barton and G. W. Kirby, J. Chem. Soc., 806 (1962).
101. W. Shieh and J. A. Carlson, J. Org. Chem., 59, 5463 (1994).
102. B. Kuenburg, L. Czollner, J. Frohlich, and U. Jordis, Org. Process Res. Dev., 3, 425 (1999).
103. G. L. Plosker and K. L. Goa, Drugs, 42, 805 (1991).
104. H. Harada, T. Morie, Y. Hirokawa, and S. Kato, Tetrahedron Asymmetry, 8, 2367 (1997).
105. R. M. Williams in J. E. Baldwin and P. D. Magnus, Eds., Synthesis of Optically Active α-Amino Acids, vol. 7, Pergamon Press, Oxford, UK, 1989; R. O. Duthaler, Tetrahedron, 50, 1539 (1994).
106. T. Shiraiwa, H. Miyazaki, T. Watanabe, and H. Kurokawa, Chirality, 9, 48 (1997).
107. T. Shiraiwa, H. Miyazaki, A. Ohta, K. Motonaka, E. Kobayashi, M. Kubo, and H. Kurokawa, Chirality, 9, 656 (1997).
108. K. Abe, H. Inoue, and T. Nagao, Yakugaku Zasshi, 108, 716 (1988).
109. S. Yamada, K. Morimatsu, R. Yoshioka, Y. Ozaki, and H. Seko, Tetrahedron Asymmetry, 9, 1713 (1998).
110. E. K. Rowinsky, L. A. Cazenav, and R. C. Donehower, J. Natl. Cancer Inst., 82, 1247 (1990).
111. J. N. Denis, A. E. Green, D. Guenard, F. Gueritte-Voegelein, L. Mangatal, and P. Potier, J. Am. Chem. Soc., 110, 5917 (1988).
112. J. Kearns and M. M. Kayser, Tetrahedron Lett., 35, 2845 (1994).
113. R. P. Srivastava, J. K. Zjawiony, J. R. Peterson, and J. D. McChesney, Tetrahedron Asymmetry, 5, 1683 (1994).
114. R. N. Patel, Adv. Appl. Microbiol., 47, 33 (2000); V. Gotor, Biocat and Biotrans, 18, 87 (2000); W. A. Loughlin, Bioresource Technology, 74, 49 (2000); S. M. Roberts, J. Chem. Soc., Perkin Trans., 1, 1 (1999).
115. C.-H. Wong and G. M. Whitesides, Enzymes in Synthetic Organic Chemistry, Pergamon, New York, 1994.
116. A. D. Baxter, J. B. Bird, R. Bannister, R. Bhogal, D. T. Manallack, R. W. Watson, D. A. Owen, J. Montana, J. Henshilwood, and R. C. Jackson in N. Clendeninn and K. Appelt, Eds., Matrix Metalloproteinase Inhibitors in Cancer Therapy, Humana Press, Totowa, NJ, 2000, pp. 193-221.
117. N. L. St. Clair and N. L. Nathrough, J. Am. Chem. Soc., 114, 7314 (1992); N. Khalaf, C. P. Govardan, J. J. Lalonde, R. A. Persichetti, Y. F. Wang, and A. L. Margolin, J. Am. Chem. Soc., 118, 5495 (1996).
118. A. M. Evans, J. Clin. Pharmacol., 36 (Suppl 12), 7s (1996).
119. S.-W. Tsai and H. J. Wei, Enzyme Microb. Technol., 16, 328 (1994).
120. J.-Y. Xin, S.-B. Li, Y. Xy, J.-R. Chui, and C.-G. Xi, J. Chem. Tech. Biotechnol., 76, 579 (2001).
121. M. E. Swarbrick, F. Gosselin, and W. D. Lubell, J. Org. Chem., 64, 1993-2002 (1999).
122. W. L. Nelson and T. R. Burke, J. Org. Chem., 43, 3641 (1978).
123. R. M. Patel, Stereosel. Biocatal., Marcel Dekker, Inc., New York, 2000, pp. 87-130.
124. E. P. Siqueira Filho, J. A. R. Rodrigues, and P. J. S. Moran, Tetrahedron Asymmetry, 12, 847 (2001).
125. P. A. Procopiou, G. E. Morton, M. Todd, and G. Webb, Tetrahedron Asymmetry, 12, 2005 (2001).
126. T. P. Jerusi, inventor, Sepracor Inc. (US), assignee, World patent WO99/13867, March 25, 1999.
127. J. D. Morrison, Ed., Asymmetric Synthesis, Academic Press, San Diego, CA, 1983; G. Procter, Asymmetric Synthesis, Oxford University Press, Oxford, UK, 1996; M. Nógrádi, Stereoselective Synthesis, 2nd ed., VCH, Weinheim, Germany, 1995; D. J. Ager and M. L. East, Eds., Asymmetric Synthetic Methodology, CRC Press, Boca Raton, FL, 1995; D. J. Ager, Ed., Handbook of Chiral Chemicals, Marcel Dekker, New York, 1999; P. O'Brien, J. Chem. Soc., 1, 95-113 (2001); H. Tye and P. J. Comino, J. Chem. Soc., 1, 1729-1747 (2001); P. I. Dalko and L. Moisan, Angew. Chem. Int. Ed., 40, 3726-3748 (2001); K. C. Nicolaou and E. J. Sorensen, Classics in Total Synthesis, VCH, Basel, Switzerland, 1996.
128. S. Redshaw in C. R. Ganellin and S. M. Roberts, Eds., Medicinal Chemistry, 2nd ed., Academic Press, San Diego, 1993, pp. 163-186.
129. A. A. Patchett, E. Harris, E. W. Tristram, M. J. Wyvratt, M. T. Wu, D. Taub, E. R. Peterson, T. J. Ikeler, J. ten Broeke, L. G. Payne, D. L. Ondeyka, E. D. Thorsett, W. J. Greenlee, N. S. Lohr, R. D. Hoffsommer, H. Joshua, W. V. Ruyle, J. W. Rothrock, S. D. Aster, A. L. Maycock, F. M. Robinson, R. Hirschmann, C. S. Sweet, E. H. Ulm, D. M. Gross, T. C. Vassil, and C. A. Stone, Nature, 288, 280 (1980).
130. T. J. Blacklock, R. F. Shuman, J. W. Butcher, W. E. Shearin, J. Budavari, and V. J. Grenda, J. Org. Chem., 53, 836 (1988).
131. T. Ohashi and J. Hasegawa in A. N. Collins, G. N. Sheldrake, and J. Crosby, Eds., Chirality in Industry II: Developments in the Commercial Manufacture and Applications of Optically Active Compounds, Wiley, Chichester, UK, 1997, p. 269.
132. M. R. Attwood, C. H. Hassall, A. Krohn, G. Lawton, and S. Redshaw, J. Chem. Soc., 1, 1011-1019 (1986).
133. E. J. Stoner, A. J. Cooper, D. A. Dickman, L. Kolaczkowski, J. E. Lallaman, J.-H. Liu, P. A. Oliver-Shaffer, K. M. Patel, J. B. Paterson Jr., D. J. Plata, D. A. Riley, H. L. Sham, P. J. Stengel, and J.-H. J. Tien, Org. Process Res. Dev., 4, 264-269 (2000).
134. J. T. Kovalainen, J. A. M. Christiaans, S. Kotisaari, J. T. Laitinen, P. T. Mannisto, L. Tuomisto, and J. Gynther, J. Med. Chem., 42, 1193-1202 (1999).
135. G. M. Coppola and H. F. Schuster, Chiral α-Hydroxy Acids in Enantioselective Syntheses, VCH, Weinheim, Germany, 1997.
136. G. Wang and R. Hollinsworth, J. Org. Chem., 64, 1036-1038 (1999).
137. G. G. Wu, Org. Process Res. Dev., 4, 298-300 (2000); G. Wu, Y. S. Wong, X. Chen, and Z. Ding, J. Org. Chem., 64, 3714-3718 (1999).
138. D. S. Garvey, J. T. Wasicak, M. W. Decker, J. D. Brioni, M. J. Buckley, J. P. Sullivan, G. M. Carrera, M. W. Holladay, S. P. Arneric, and M. Williams, J. Med. Chem., 37, 1055-1059 (1994).
139. N.-H. Ling, Y. He, and H. Kopecka, Tetrahedron Lett., 36, 2563-2566 (1995).
140. D. J. Ager, I. Prakash, and D. R. Schaad, Chem. Rev., 96, 835-875 (1996).
141. D. A. Evans, J. M. Takacs, L. R. McGee, D. J. Mathre, and J. Bartroli, Pure Appl. Chem., 53, 1109 (1981); D. A. Evans, M. D. Ennis, and D. J. Mathre, J. Am. Chem. Soc., 104, 1737 (1982); D. A. Evans, Aldrichimica Acta, 15, 23 (1982).
142. D. R. Schaad in ref. 127, pp. 287-300.
143. S. J. Wittenberger and M. A. McLaughlin, Tetrahedron Lett., 40, 7175-7178 (1999).
144. S. R. Turner, J. W. Strohbach, R. A. Tommasi, P. A. Aristoff, P. D. Johnson, H. I. Skulnick, L. A. Dolak, E. P. Seest, P. K. Tomich, M. J. Bohanon, M.-M. Horng, J. C. Lynn, K. T. Chong, R. R. Hinshaw, K. D. Watenpaugh, J. M. N. Janakiraman, and S. Thaisrivongs, J. Med. Chem., 41, 3467-3476 (1998).
145. T. M. Judge, G. Phillips, J. K. Morris, K. D. Lovasz, K. R. Romines, G. P. Luke, J. Tulinsky, J. M. Tustin, R. A. Chrusciel, L. A. Dolak, S. A. Mizsak, W. Watt, J. Morris, S. L. Vander Velde, J. W. Strohbach, and R. B. Gammill, J. Am. Chem. Soc., 119, 3627-3628 (1997).
146. N. Cabedo, I. Andreu, M. C. Ramirez de Arellano, A. Chagraoui, A. Serrano, A. Bermejo, P. Protais, and D. Cortes, J. Med. Chem., 44, 1794-1801 (2001).
147. S. T. Moe, D. L. Smith, E. G. DelMar, S. M. Shimizu, B. C. Van Wagenen, M. F. Balandrin, Y. Chien, J. L. Raszkiewicz, L. D. Artman, H. S. White, and A. L. Mueller, Bioorg. Med. Chem. Lett., 10, 2411-2415 (2000).
148. M. D. Goodyear, P. O. Dwyer, M. L. Hill, A. J. Whitehead, R. Hornby, and P. Hallett, inventors, Glaxo Group Ltd. (GB), assignee, World patent WO-09529174, April 18, 1995.
149. M. P. DeNinno, R. Schoenleber, R. J. Perner, L. Lijewski, K. E. Asin, D. R. Britton, R. MacKenzie, and J. W. Kebabian, J. Med. Chem., 34, 2561-2569 (1991).
150. P. Hewawasam, N. A. Meanwell, V. K. Gribkoff, S. I. Dworetzky, and C. G. Boissard, Bioorg. Med. Chem. Lett., 7, 1255-1260 (1997).
151. F. A. Davis and B. X. Chen, Chem. Rev., 92, 919 (1992).
152. M. K. O'Brien and B. Vanase, Curr. Opin. Drug Discovery Dev., 3, 793-806 (2000); R. A. Sheldon, Ed., Chirotechnology: Industrial Synthesis of Optically Active Compounds, Marcel Dekker, New York, 1993.
153. E. J. Corey, R. K. Bakshi, and S. Shibita, J. Am. Chem. Soc., 109, 5551 (1987); E. J. Corey, R. K. Bakshi, S. Shibita, C.-P. Chen, and V. K. Singh, J. Am. Chem. Soc., 109, 7925 (1987).
154. E. J. Corey and J. O. Link, Tetrahedron Lett., 31, 601 (1990); E. J. Corey and J. O. Link, J. Org. Chem., 56, 442 (1991).
155. Y.-J. Shi, D. Cai, U.-H. Dolling, A. W. Douglas, D. M. Tschaen, and T. R. Verhoeven, Tetrahedron Lett., 35, 6409-6412 (1994).
156. M. J. Burk, F. Bienewald, S. Challenger, A. Derrick, and J. A. Ramsden, J. Org. Chem., 64, 3290-3298 (1999).
157. H. C. Kolb, M. S. VanNieuwenhze, and K. B. Sharpless, Chem. Rev., 94, 2483 (1994).
158. R. Shirai, H. Takayama, A. Nishikawa, Y. Koiso, and Y. Hashimoto, Bioorg. Med. Chem. Lett., 8, 1997-2000 (1998).
159. A. E. Fenwick, A. D. Gribble, R. J. Ife, N. Stevens, and J. Witherington, Bioorg. Med. Chem. Lett., 11, 199-202 (2001).
160. L. E. Martinez, J. L. Leighton, D. H. Carsten, and E. N. Jacobsen, J. Am. Chem. Soc., 117, 5897 (1995); S. E. Schaus, J. F. Larrow, and E. N. Jacobsen, J. Org. Chem., 62, 4197-4199 (1997).
161. S. E. Zook, J. K. Busse, and B. C. Borer, Tetrahedron Lett., 41, 7017-7021 (2000).
162. J. F. Larrow, S. E. Schaus, and E. N. Jacobsen, J. Am. Chem. Soc., 118, 7420-7421 (1996); S. E. Schaus and E. N. Jacobsen, Tetrahedron Lett., 37, 7937-7940 (1996).
163. C.-C. Tsai, K. E. Follis, A. Sabo, T. W. Beck, R. F. Grant, N. Bischofberger, R. E. Benveniste, and R. Black, Science, 270, 1197-1199 (1995).
164. M. Tokunaga, J. F. Larrow, F. Kakiuchi, and E. N. Jacobsen, Science, 277, 936 (1997).
165. J. M. Keith, J. F. Larrow, and E. N. Jacobsen, Adv. Synth. Catal., 343, 5-26 (2001).
166. J. H. Namyslo and D. E. Kaufmann, Synlett, 804-806 (1999).
167. S. J. C. Taylor, K. E. Holt, R. C. Brown, P. A. Keene, and I. N. Taylor in R. N. Patel, Ed., Stereoselective Biocatalysis, Marcel Dekker, New York, 2000, p. 397.
CHAPTER NINETEEN
Structural Concepts in the
Prediction of the Toxicity
of Therapeutical Agents
HERBERT S. ROSENKRANZ
Department of Biomedical Sciences
Florida Atlantic University
Boca Raton, Florida
Contents
1 Introduction
1.1 Development of Database
1.2 Model Building
1.3 Model Characterization
1.4 Model Validation
1.5 Applications and Mechanistic Studies
2 Conclusions

ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
1 INTRODUCTION

The increased acceptance and availability of various structure-activity relationship (SAR) approaches in health hazard identification (1, 2) is accompanied by many opportunities and some pitfalls. The latter are derived from the availability of various computer-based SAR platforms whose basis and performance characteristics are not transparent to the user. Such programs, in the hands of the non-expert, may be misused. SAR models and associated technologies, on the other hand, while not crystal balls, provide the expert toxicologist with meaningful information regarding the putative toxicological profile of candidate agents. They can guide the design of agents that have fewer or no unwanted side effects and yet retain or even enhance therapeutic effectiveness. Finally, the SAR technology can provide insights into the mechanism whereby a chemical exerts its toxic effects and thereby provide a better understanding of the risk that the agent poses to humans.
However, to achieve these aims, it is essential that the performance characteristics and the basis of the SAR model be known. This involves several critical steps in the SAR model development. These are listed below, each of which will be amplified:

1. Development of database
2. Model building
3. Model characterization
4. Model validation
5. Model application to individual agents and for mechanistic evaluations

Although the present review focuses on the MULTICASE SAR methodology (3-5), the concepts discussed herein apply to all generally available SAR techniques used to study toxicological phenomena. Basically, the SAR approaches that have been used fall into two categories: those that are based on statistical automated algorithms not dependent on prior expert judgment and those that are a priori rule-based, requiring prior expert knowledge (Table 19.1). However, as will be stressed herein, even the approach not requiring prior expert input is very much dependent on human expertise at various stages of the model development and interpretation process.

Table 19.1 Some SAR Approaches Used in Toxicology
Designation          Approach(a)   References
MULTICASE            I             3-5
TOPKAT               I             6
COMPACT              I             7
DEREK                II            8-10
ONCOLOGIC            II            11, 12
Hazard Expert        II            13
PROGOL               I             14
Structural Alerts    II            15
(a) Approach I indicates statistical automated algorithms not dependent on prior expert judgment. Approach II indicates a rule-based technique that requires prior expert judgment.

Reviews and assessments of the various SAR methodologies used to analyze toxicological phenomena are available (16-21).

1.1 Development of Database

Most experimental data compilations of toxicological effects, both in the public domain and in proprietary databases, were not developed for SAR purposes. Thus, with respect to some toxicological phenomena, the database may be rich in certain chemical classes such as chloroarenes and lacking in data relating to others, e.g., aminoarenes. Yet, unlike the SAR models developed for drug discovery, SAR models of toxicological phenomena must be able to handle databases composed of non-congeneric chemicals. Additionally, as a consequence of how toxicological data are generated, there may be a paucity of data altogether. Yet, for optimal SAR models of toxicological phenomena, the "learning set" should include at least 300 non-congeneric chemicals (3, 22).
Accordingly, the human expert may suggest that, for certain purposes, the results of certain assays be pooled, e.g., rat and mouse
carcinogenicity or the results of the Salmonella and E. coli WP uvrB mutagenicity assays (23, 24). Obviously, such data pooling must rest on a sound scientific basis as well as on data that show extensive concordance between the experimental results of the systems to be pooled, i.e., a substantial number of chemicals must give identical results in the two systems, thereby indicating that results obtained with one system can be amalgamated with those obtained in the other (25). For example, when the same chemicals were tested for their ability to induce sister chromatid exchanges and chromosomal aberrations in cultured Chinese hamster ovary (CHO) cells, they showed divergent results (26). Hence, the results of the two assays cannot be amalgamated into a single database to develop an SAR model of cytogenetic effects. Similarly, even using the same indicator system, results cannot be merged if different criteria are used to interpret the significance of the results. That situation prevails with respect to the induction of mutations at the thymidine kinase locus of mouse lymphoma cells vis-a-vis the criteria used by the U.S. National Toxicology Program versus those employed by the U.S. Environmental Protection Agency's GeneTox Program. In fact, each data set gives rise to a distinct SAR model (27-29).
On the other hand, the consensus database of potential developmental toxicity in humans, based on experimental results in animals, observations in exposed humans, and expert judgment, yields a coherent SAR model of developmental risks to humans (30). That model is distinct from SAR models of developmental toxicity to individual rodent species (31).

1.2 Model Building

Once a "learning set" (i.e., database) satisfying preset criteria for acceptance (3) has been developed, the model building phase can begin. In general, this is a straightforward process that is specific for the SAR method employed.
Here, I will exemplify the various stages with the MULTICASE SAR system (3, 4, 32). Thus, once the structures of the chemicals and an indication of their potency (i.e., either active, marginally active, and inactive, or a continuous scale of potencies) are entered, the program identifies the chemical substructures significantly associated with the toxicological phenomenon under investigation (Table 19.2). Each of these structural determinants ("toxicophore") is associated with a base potency and a probability of activity (see Fig. 19.1). The latter is derived from the distribution of active and inactive molecules that contain the toxicophore. The program also identifies the chemicals that give rise to the toxicophore (Table 19.3 and Fig. 19.7). This enables the human expert (see below) to ascertain whether the structures of the chemicals giving rise to the toxicophores are germane to the test chemical whose toxicity is predicted.
In addition to the toxicophores, the program also identifies modulators for specific toxicophores (Table 19.4). These are substructures or physicochemical parameters that determine whether the specific toxicity inherent in the toxicophore will be expressed or whether it is augmented further.
Thus, when faced with a chemical of unknown activity, the program uses the presence or absence of toxicophores and of modulators to predict its toxicity (Figs. 19.1-19.3). For example, the presence of the toxicophore OH+ (a phenol) endows a chemical with an 87.5% probability of being a contact allergen and a potency of 51 (moderate activity, see Table 19.3). That basal activity is modulated by -25.8 × electronegativity (see Table 19.3). For the example in Fig. 19.1, this results in a further increase in potency. The total potency of 55 units corresponds to a moderately strong activity (Table 19.3). A chemical with that toxicophore may also contain a structural modulator that augments the basal activity further (Fig. 19.2). On the other hand, the chemical may contain a modulator that completely abolishes its potential to be an allergen (Fig. 19.3). Additionally, the MULTICASE SAR program will identify substructures that are absent from the learning set and therefore may introduce an element of uncertainty in the prediction, i.e., the "unknown" substructure could represent either a potential toxicophore or a modulator that alters a rec-
Table 19.2 Major Toxicophores Associated with Allergic Contact Dermatitis in Humans
Toxicophore No.   Fragment   N*   Inactives*   Marginals*   Actives*
The database and derivation of the SAR model have been described (33).
*N indicates the number of chemicals in the database that contain that toxicophore. "Inactives," "marginals," and "actives" indicate the distribution of that toxicophore among activity groups.
Toxicophore No. 4 is shown embedded in chemicals in Figs. 19.1-19.3 and No. 5 is shown in Fig. 19.5.
C indicates a carbon atom shared by two rings; (3-NH2) indicates an amino group attached to the third non-hydrogen atom from the left. In toxicophore No. 17, the last carbon
to the right is shown as unsubstituted. This means that it can be substituted with any atom except a hydrogen. On the other hand, in toxicophore No. 8, the penultimate carbon is
shown unsubstituted; it can only be substituted by an amino group (i.e., (SNH,). However, the last carbon of that toxicophore is shown with an attached hydrogen. It cannot be
substituted by any other atom.
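
The bookkeeping behind a toxicophore row and a potency prediction can be illustrated with a minimal sketch. It is not the MULTICASE algorithm: the probability estimator (actives divided by actives plus inactives, marginals ignored), the class and function names, the example counts, and the electronegativity value of -0.155 are all assumptions chosen only to reproduce the figures quoted in the text (87.5%, a base potency of 51, a modulator of -25.8 × electronegativity, and a total of about 55).

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Toxicophore:
    """One row of a toxicophore table (counts are per activity group)."""
    fragment: str
    actives: int
    marginals: int
    inactives: int
    base_potency: float

    def probability_of_activity(self) -> float:
        # Assumption: simple fraction of actives among clearly classified
        # molecules; marginals are left out. The real program may differ.
        decided = self.actives + self.inactives
        if decided == 0:
            raise ValueError("no active or inactive molecules for this fragment")
        return self.actives / decided


def predicted_potency(base_potency: float,
                      modulators: Iterable[Tuple[float, float]]) -> float:
    """Base potency plus the sum of coefficient * descriptor per modulator."""
    return base_potency + sum(coef * value for coef, value in modulators)


# Hypothetical counts chosen only so that 7 / (7 + 1) = 87.5%.
phenol = Toxicophore("phenol OH", actives=7, marginals=0, inactives=1,
                     base_potency=51.0)

# Hypothetical electronegativity descriptor (-0.155), back-calculated so that
# the -25.8 x electronegativity modulator raises the potency from 51 to ~55.
total = predicted_potency(phenol.base_potency, [(-25.8, -0.155)])

print(f"probability = {phenol.probability_of_activity():.3f}")
print(f"potency     = {total:.1f}")
```

A modulator that abolishes activity, as in Fig. 19.3, would in this sketch simply be a term large and negative enough to pull the total down to the non-sensitizer range.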
ognized toxicophore or a noninformative structure unrelated to toxicity (Fig. 19.4).
It should be stressed that not every experimental data set gives rise to a coherent SAR model. Failure to construct a model may be caused by the fact that the experimental data are invalid or that they do not reflect a specific toxicological phenomenon. Additionally, the phenomenon under investigation may be so complex or be the result of so many different mechanisms that the experimental database is not sufficiently large to describe it. With this in mind, it should be stressed that the predictivity of the SAR model will be a reflection of the complexity of the phenomenon, the size of the database (i.e., the number of chemicals for which experimental data are available), and the ratio of actives/inactives in the data set (3, 22).
In view of the above considerations, once an SAR model has been developed, it requires extensive validation and characterization.

Table 19.3 Derivation of Toxicophore: The 19 Molecules Containing Fragment SH-CH2

Chemicals                                       Potency(a)
2,3-Dimercapto-1-propanol                       55
2-Mercaptoethanesulfonic acid                   55
2-Mercaptoethyl methyl sulfone                  45
2-Mercaptoethyl urea                            35
2-Methoxyethyl mercaptoacetate                  35
N-(1,1-dimethylolethyl) mercaptoacetamide
N,N-dimethyl mercaptoacetamide
N-(2-mercaptoethyl)acetamide
N-(2-mercaptoethyl) pyrrolidone
N-(mercaptoacetyl) urea
N-(mercaptoacetyl) glycine
N-(mercaptoethyl) morpholine
N-methyl mercaptoacetamide
N-trimethylolmethyl mercaptoacetamide           45
Cysteine                                        45
Mercaptoacetamide                               55
Mercaptoacethydrazide                           45
Mercaptoacetic acid                             45
Thioglycerol                                    55
The program identifies the chemicals that are responsible for toxicophore No. 5 of Table 19.1 (see also Fig. 19.7). The toxicophore is shown embedded in a molecule in Fig. 19.5.
(a) The allergenic potencies were defined based on the percent responders in the human maximization test as follows (33): 10, non-sensitizer; 25, "marginal" (4-7% responders); 39, "weak" (8-23% responders); 49, "moderate" (24-45% responders); 59, "strong" (56-83% responders); 69, "extreme" (84-100% responders).

1.3 Model Characterization

As mentioned above, the nature of the SAR model that is derived is a reflection of the complexity of the toxicological phenomenon that it describes, as well as of the size of the learning set and the extent to which it includes chemical classes and/or substructures that are representative of the chemical species to which it will be applied. Thus, the chemical substructures present among therapeutics are much more numerous and diverse than, for example, those used or generated in the chemical or agricultural industries. This means that SAR models used to examine pharmacologically active substances must contain a greater variety of chemical substructures. This may well translate into a requirement for a larger experimental data set (i.e., one containing an increased number of chemicals).
In evaluating the SAR model, it is important to determine the relationship between its predictivity and the size of the database, to establish whether the model is optimal. This can be ascertained by first determining the model's predictivity (see below), and then systematically decreasing the size of the database by random deletion of chemicals to determine the predictive parameters of the model derived from the reduced data set. Doing this iteratively will allow a determination of the relationship between database size and concordance between predicted and experimentally derived results (22). If the relationship, including the value for the SAR model derived from the total database, is linear, then the model will not be optimally predictive and consideration should be given to obtaining additional experimental data and deriving a further model. On the other hand, if the relationship, including the data for the SAR model derived from the total database, is no longer linear, the size of the data set may be satisfactory. Incremental data may not yield a correspondingly significant increase in the model's performance. Thus, the predictivity of the SAR model of mutagenicity in Salmonella improves linearly until a database size of 350 chemicals is reached, and then it plateaus (22).
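The random-deletion procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: `build_model` and `concordance` are hypothetical callables standing in for the SAR program and its external validation, and the toy numbers at the end are synthetic stand-ins, not results from any database.

```python
import random
from statistics import mean


def learning_curve(dataset, build_model, concordance, sizes, repeats=5, seed=0):
    """Mean concordance of models built on randomly reduced data sets.

    dataset      -- list of training records (any type)
    build_model  -- hypothetical callable: records -> model
    concordance  -- hypothetical callable: model -> fraction of correct
                    predictions for chemicals external to the model
    sizes        -- data-set sizes to evaluate, in increasing order
    """
    rng = random.Random(seed)
    curve = []
    for size in sizes:
        scores = []
        for _ in range(repeats):
            reduced = rng.sample(dataset, size)  # random deletion down to `size`
            scores.append(concordance(build_model(reduced)))
        curve.append((size, mean(scores)))
    return curve


def still_improving(curve, min_gain=0.01):
    """True if the last increase in data-set size still raised concordance."""
    (_, previous), (_, latest) = curve[-2], curve[-1]
    return (latest - previous) >= min_gain


# Synthetic stand-ins (NOT real data): the "model" is just the record count,
# and concordance rises linearly up to 350 records, then levels off.
toy_data = list(range(500))
toy_build = len
toy_concordance = lambda n: 0.8 * min(n, 350) / 350

curve = learning_curve(toy_data, toy_build, toy_concordance, [100, 200, 350, 500])
print(curve)
print("more data likely to help:", still_improving(curve))
```

If the curve is still rising at the full database size, the text's recommendation applies: obtain more experimental data and derive a further model. The `min_gain` threshold is an arbitrary choice made for this sketch.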
Table 19.4 List of MODULATORS Related to Toxicophore OH -c =
Constant = 51.0 Toxicophore

Fragment QSAR No.
1 2 3 4
2D [N-I ( 4 . 5 A - ) [NH-I
CO -4H2 4 H 2-
cH ===c h 3H-
OH ---C ===c 4 0
o
W
cH =c -cH 3 H
N CH2 4 H -c ==xH
cH =cH -cH ===c
cH =cH -cH ==xH
OH -c =c 4
OH -c =cH -cH
OH -c 3 -cH
(HOMO + LUMO)/2
Modulators associated with toxicophore No. 4 of Table 19.2. Each of the modulators augments or decreases the activity inherent in the toxicophore (i.e., 51.0 units; see Figs. 19.1-19.3). (HOMO + LUMO)/2 describes the electronegativity of the molecule. That value is multiplied by -25.8. Modulator No. 5 and No. 2 are shown embedded in chemicals in Figs. 19.2 and 19.3, respectively. Modulator No. 1 describes a 2D distance descriptor of 6.5 Å between two atoms. For interpretation of the structures see legend to Table 19.2.
The molecule contains the Toxicophore (nr.occ.= 2):
*** 76 out of the known 86 molecules ( 88%) containing such a
    Toxicophore are Contact Allergens with an average activity of 49.
    (conf.level=100%)
*** QSAR Contribution : Constant is 51.04
 ** The following Modulator is also present:
    Electronegativity = -0.15 ; Its contribution is 3.83
    ------
 ** Total projected QSAR activity 54.87
*** The probability that this molecule is a Contact Allergen is 87.5% **
 ** The projected Allergic Potency is 54.9 CASE units **
Figure 19.1. Prediction of the contact allergenicity of 2-methyl-1,4-benzenediol. The prediction is based on the presence of the toxicophore (shown in bold). The potency is modulated further by the electronegativity (see Table 19.4). A potency of 55 units indicates a moderately strong allergen (see Table 19.3).
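The arithmetic behind the "Total projected QSAR activity" line can be checked directly: the printouts are consistent with the toxicophore constant plus the listed modulator contributions. The sketch below assumes simple addition, which matches the figures but is not a description of the program's internals; the function name is hypothetical and the numbers are transcribed from Figs. 19.1 and 19.3.

```python
def projected_potency(constant, contributions):
    """Toxicophore constant plus modulator contributions, in CASE units.

    Assumes a plain sum, which reproduces the totals printed in the figures.
    """
    return constant + sum(contributions)


# Fig. 19.1: constant 51.04, electronegativity contribution 3.83.
fig_19_1 = projected_potency(51.04, [3.83])

# Fig. 19.3: constant 51.04, inactivating modulator -53.33,
# electronegativity contribution 2.51.
fig_19_3 = projected_potency(51.04, [-53.33, 2.51])

print(round(fig_19_1, 2))  # 54.87, as printed in Fig. 19.1
print(round(fig_19_3, 2))  # 0.22, as printed in Fig. 19.3
```

The second case shows how a single inactivating modulator can cancel almost all of the potency contributed by the toxicophore.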
Another concern relates to the effect of the ratio of active to inactive chemicals in the data set. Some SAR models are most predictive when that ratio is unity (3, 22). Hence, for a model that will be widely used for hazard identification and risk assessment purposes, it would be of importance to determine whether its performance is optimal. Thus, if the number of inactives exceeds the number of actives, the number of inactives can be decreased by randomly removing the appropriate number of inactives and determining the performance of the resulting SAR model. The random deletion of inactives and the model derivation should be repeated several times to ascertain that a robust model has been derived. We found that because the nature of the toxicophores is determined primarily by the actives and because the "quality" of the toxicophores is a function of the size of the database (22, 34, 35), it follows that if the number of actives exceeds the number of inactives, removal of actives to achieve a ratio of unity is not the optimal solution. Rather, we have found that supplementing the database with randomly selected chemicals from a "pool" of normal physiological chemicals (amino acids, sugar, lipids, purines, pyrimidines, etc., but excluding hormones, prostaglandins, and vitamins), assuming these chemicals to be inactive, is a viable alternative (36, 37). This is based on the recognition that the biological and/or toxicological phenomena being modeled occur in a milieu that is rich in these physiological chemicals.
Finally, the "informational content" of an SAR model determines its coverage. Thus, if a test molecule contains a substructure unknown to the model, this introduces a measure of uncertainty into the SAR prediction. In the MULTICASE SAR program, such an "unknown" moiety is flagged (Fig. 19.4). We have found that a satisfactory approach to determining informational content is to challenge an SAR model with a panel of 10,000 chemicals representative of the "universe of chemicals" and to determine the frequency with which the SAR predictions are accompanied by a "warning" of the presence of "unknown" substructures. An enumeration of the frequency with which the individual unknown moieties are present will allow a determination of their importance and thereby identify chemicals that should be tested and the results included in the model to improve the predictive performance. This is based on the observation that the greater the informational content (i.e., the fewer warnings of "unknown" moieties), the greater the model's predictivity (22, 34, 35).

1.4 Model Validation

In its application to toxicology, SAR can serve two functions: (1) to predict a specific toxicological effect based on the identification of substructures significantly associated with that activity and (2) to gain insight into the mechanistic basis of that effect.
To be useful in its predictive mode, the performance of a model does not need to be perfect, but it must be known. The predictivity of an SAR model is defined by the concordance between the predictions of chemicals external to the SAR model and the experimentally determined toxicities. The predictivity is governed by the sensitivity (number of correct positive predictions/total number of positive chemicals) and the specificity (number of correct negative predictions/total number of negative chemicals) (22). Moreover, because the basic function of SAR applied to toxicological phenomena is the prevention, reduction, or elimination of harmful chemicals from the home, the environment, and the workplace, risk-averse prediction models are preferred. That is achieved by the development of SAR models that yield a low frequency of false negative predictions, i.e., high sensitivity. Obviously, ideally the model should have high sensitivity as well as high specificity (38).
834 Structural Concepts in the Prediction of the Toxicity of Therapeutical Agents
The molecule contains the Toxicophore (nr.occ.= 2):
*** 76 out of the known 86 molecules ( 88%) containing such a
    Toxicophore are Contact Allergens with an average activity of 49.
*** QSAR Contribution : Constant is 51.04
 ** The following Modulators are also present:
    ( 1) cH =c -cH -cH -c = c2-OH > Activating 8.33
    Electronegativity = -0.17 ; Its contribution is 4.29
    ------
 ** Total projected QSAR activity 63.65
*** The probability that this molecule is a Contact Allergen is
 ** The projected Allergic Potency is 63.7 CASE units **
Figure 19.2. Prediction of the contact allergenicity of 4-chloro-1,3-benzenediol. In addition to the probability of activity and the basal potency derived from the toxicophore (shown in bold in A), the chemical also contains an activating modulator (shown in bold in B), which further augments the potency. A potency of 64 units indicates a very strong potency (Table 19.3).
The molecule contains the Toxicophore (nr.occ.= 1):
    OH -c =
*** 76 out of the known 86 molecules ( 88%) containing such a
    Toxicophore are Contact Allergens with an average activity of 49.
*** QSAR Contribution : Constant is 51.04
 ** The following Modulators are also present:
    ( 1) CO -CH2-CH2- Inactivating -53.33
    Electronegativity = -0.10 ; Its contribution is 2.51
    ------
 ** Total projected QSAR activity 0.22
 ** The molecule contains the following DEACTIVATING Fragment:
*** The probability that this molecule is a Contact Allergen is 63.6% **
Figure 19.3. Prediction of the lack of contact allergenicity of zingerone. Whereas the presence of the toxicophore (A) is associated with a probability of activity and a potency, the presence of the inactivating modulator (B) abolishes the potency. Moreover, the presence of a deactivating moiety (C), which is present in five chemicals in the database that are devoid of allergenicity (Table 19.2, No. 19), further decreases the likelihood that zingerone is a contact allergen.
*** WARNING *** The following functionalities are UNKNOWN to me;
*** O -C. =C. -
*** CO -O -C. -
 ** The molecule does not contain any known Biophore
    it is therefore presumed to be INACTIVE
Figure 19.4. Prediction of the lack of contact allergenicity of dehydroalantolactone. The chemical contains no toxicophore; therefore, it is presumed to be inactive. However, it contains two structures (shown in bold) that are "unknown" to the model. That introduces an element of uncertainty in the prediction.
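Sensitivity, specificity, and concordance as defined in this section can be computed from paired experimental and predicted calls. The sketch below pools those calls over repeated random hold-outs of 5% of the data set, 20 times, one way of obtaining predictions for chemicals external to each model. It is an illustration only: `build_model` is a hypothetical callable standing in for the SAR program, and the toy records and toy rule are invented for the example.

```python
import random


def predictivity(actual, predicted):
    """Sensitivity, specificity and concordance from paired boolean calls."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    true_neg = sum(1 for a, p in pairs if not a and not p)
    n_pos = sum(1 for a, _ in pairs if a)
    n_neg = len(pairs) - n_pos
    return {
        "sensitivity": true_pos / n_pos if n_pos else None,
        "specificity": true_neg / n_neg if n_neg else None,
        "concordance": (true_pos + true_neg) / len(pairs) if pairs else None,
    }


def cross_validate(records, build_model, rounds=20, fraction=0.05, seed=0):
    """Pool tester-set predictions over repeated random hold-outs.

    records     -- list of (chemical, is_active) pairs
    build_model -- hypothetical callable: training records -> predict(chemical)
    """
    rng = random.Random(seed)
    n_test = max(1, round(fraction * len(records)))
    actual, predicted = [], []
    for _ in range(rounds):
        tester = set(rng.sample(range(len(records)), n_test))
        training = [r for i, r in enumerate(records) if i not in tester]
        predict = build_model(training)  # model never sees the tester set
        for i in sorted(tester):
            chemical, is_active = records[i]
            actual.append(is_active)
            predicted.append(predict(chemical))
    return predictivity(actual, predicted)


# Toy example (invented records; the "model" is a fixed rule that ignores
# its training data and calls anything starting with "SH" active).
toy_records = [("SH-CH2-R%d" % i, True) for i in range(20)] + \
              [("CH3-CH2-R%d" % i, False) for i in range(20)]
toy_builder = lambda training: (lambda chemical: chemical.startswith("SH"))

print(predictivity([True, True, False, False], [True, False, False, False]))
print(cross_validate(toy_records, toy_builder))
```

A risk-averse user would watch the sensitivity figure in particular, since missed actives are false negatives.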
The simplest way to determine predictivity parameters is to remove initially from the data set a random representative sample (e.g., 5%) to be used as a "tester set," to develop the SAR model on the remaining chemicals (i.e., 95%), and then challenge the model with the "tester set" and ascertain the predictivity. However, as has been demonstrated on a number of occasions, the predictivity of an SAR model is determined by the size of the database (22), and because in most instances the size of the available data set is not optimal, further decreasing the size of the learning set by sequestering the "tester set" is not optimal either.
To overcome this limitation, a cross-validation approach has been used (39). In that procedure, a portion of the database (e.g., 5%) is randomly selected and removed, and a model is developed from the remaining 95%. That model is challenged with the "tester set" (5%). That procedure is repeated 20 times, and the cumulative predictivity is determined. The final SAR model includes the complete database (i.e., 100%). Because the predictive performance is a function of the size of the database, the performance of the final model will be better than that based on 95% of the data. When, however, the learning set consists of fewer than 150 chemicals, a more tedious procedure may be required, wherein one to two chemicals (i.e., n-1 or n-2) are removed at a time to serve as the "tester set" and the process is repeated n or n/2 times.

1.5 Applications and Mechanistic Studies

As has been mentioned earlier (Table 19.1), SAR methodologies can be divided into two general non-mutually exclusive approaches: (1) hypothesis driven and (2) knowledge based. The former is rule driven, wherein specific properties or chemical substructures are looked for, e.g., mutagens are electrophiles and hence one would look for electrophilic or proelectrophilic moieties. This approach assumes that mutations are caused solely by covalent binding of electrophiles to DNA. Agents that induce mutations by a nonelectrophilic (i.e., non-DNA damaging) mechanism will not be detected. Thus, agents that mutagenize purely as a result of intercalation between DNA base pairs (e.g., acridine orange, ethidium bromide) will not be identified. Such rules are based on prior knowledge and/or intuition and do not necessarily require adherence to strict statistical criteria.
The approach illustrated herein, exemplified by MULTICASE (3), is knowledge based. The input consists of the structures and toxicological activities of the chemicals in the learning set. The program then identifies structural descriptors (toxicophores) that are significantly associated with activity (see Table 19.3). The human expert participates in setting criteria for the inclusion of experimental results in the database (3) as well as in examining the plausibility of the final model based on exact knowledge of the toxicological phenomenon under investigation. The human expert also determines the acceptability of individual predictions (see below).
Once an SAR model has been developed and validated, it can be applied in a number of fashions. SAR methodologies, such as MULTICASE (3-5), which document predictions (Table 19.2), are obviously preferable to those that operate like a "black box." The latter simply provides a likelihood that a test chemical is active or inactive. When, however, the SAR prediction is accompanied by documentation of the basis of that forecast, the human expert can determine whether it is justified and whether it applies to the specific test chemical.
Thus, the mucolytic agent N-acetyl-L-cysteine is predicted to have a potential to induce allergic contact dermatitis by virtue of the biophore SH-CH2 (Fig. 19.5). Moreover, examination of the chemicals that contribute to that toxicophore reveals that indeed they all have the substructure in an environment that is similar to the one found in N-acetyl-L-cysteine (Table 19.3). On the other hand, the tubulin polymerization perturber (and potential antineoplastic agent) epitholone A (Fig. 19.6) is predicted to be a mouse carcinogen by virtue of the toxicophore units shown in bold. That toxicophore is present in five molecules in the learning set. The presence of that toxicophore is associated with an 89% probability of carcinogenicity and a potency of 63 units, which corresponds to a TD50 value of 0.039 mmol/kg per day (40). However, the program flags the toxicophore because its environment in epitholone A is significantly different from that of the molecules in the learning set (Fig. 19.6). In fact, examination of the structures of the molecules that contribute to the biophore (Fig. 19.7) indicates that indeed the molecules are quite different from epitholone A, and hence, the prediction of carcinogenicity can be disregarded (however, also see below).
Moreover, the molecules that contributed to this toxicophore (Fig. 19.7), even though they contain the S-C=N-C= moiety (Fig. 19.6), also contain functionalities (i.e., "structural alerts") that are associated with carcinogenicity/genotoxicity such as nitro, amino, and hydrazino groups. In fact, these could be responsible for the murine carcinogenicity of these chemicals. Obviously, these latter functionalities are absent in epitholone A.
The molecule contains the Biophore
    SH -CH2
*** 18 out of the known 19 molecules ( 95%) containing such a
    toxicophore are Contact Allergens with an average activity of 42.
    (conf.level=100%)
*** QSAR Contribution : Constant is 52.50
 ** The following Modulators are also present:
    (2D) [CO -] <-- 5.2 A --> [SH-] Inactivating -7.6
    Electronegativity = 0.10 ; Its contribution is -0.67
 ** Total projected QSAR activity
*** The probability that this molecule is a Contact Allergen is 95.0% **
 ** The projected Allergic Contact activity is 44.3 CASE units **
Figure 19.5. Prediction of the contact allergenicity of N-acetyl-L-cysteine. The prediction is based on the presence of the toxicophore (shown in bold), which is present in 19 chemicals in the database (18 allergens and 1 marginal allergen; see Table 19.3). The arrow indicates the 5.2 Å distance described by the inactivating modulator.
The molecule contains the Toxicophore
    S -C
        \\
         N
        /
      C"
*** 5 out of the known 5 molecules (100%) containing such Biophore
    are Mouse carcinogens with an average activity of 62.
    (conf.level=W%)
 ** This Biophore exists in a significantly different environment
    than in the data base (i.e. 5.45); It may not be relevant
*** QSAR Contribution : Constant is 64.00
    ------
*** The probability that this molecule is a Mouse carcinogen is 05.7%
 ** The projected Mouse carcinogenic potency is 64.0 CASE units **
Figure 19.6. Prediction of the carcinogenicity in mice of epitholone A. The structure of epitholone A (toxicophore shown in bold) is given in Fig. 19.7.
Figure 19.7. Structures of epitholone A and of chemicals that contain the toxicophore. The toxicophore (Fig. 19.6) is shown in bold.
Table 19.5 SAR Predictions Related to the Potential Carcinogenicity of Epitholone A

SAR Model                    Prediction   References
Mutagenicity: Salmonella     Negative     22, 47
Error-prone DNA repair       Negative     48
Unscheduled DNA synthesis    Negative     49
Mouse MTD                    Positive     50
Rat LD50                     Positive     SAR model based on RTECS
Cell toxicity                Positive     51
Inhibition GJIC              Negative     52
A positive response indicates a potential for a maximum tolerated dose of less than 0.9 mmol/kg; an LD50 value of less than 7.2 mmol/kg; or a toxicity (IC50) for cultured BALB/3T3 cells of less than 1 µM.
GJIC, gap junctional intercellular communication; RTECS, Registry of Toxic Effects of Chemical Substances.
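The positive/negative calls for the three toxicity endpoints in Table 19.5 follow fixed cutoffs given in the table footnote. A small sketch of that rule is shown below; the dictionary keys and function name are hypothetical, the IC50 unit is read here as micromolar, and the example values passed in are invented for illustration.

```python
# Cutoffs transcribed from the footnote of Table 19.5.
# A predicted value *below* the cutoff counts as a "Positive" response.
CUTOFFS = {
    "mouse_mtd_mmol_per_kg": 0.9,   # maximum tolerated dose
    "rat_ld50_mmol_per_kg": 7.2,    # LD50
    "cell_ic50_micromolar": 1.0,    # BALB/3T3 cell toxicity (unit assumed to be µM)
}


def response(endpoint, predicted_value):
    """Return 'Positive' if the predicted value falls below the cutoff."""
    return "Positive" if predicted_value < CUTOFFS[endpoint] else "Negative"


# Invented example values, for illustration only.
print(response("mouse_mtd_mmol_per_kg", 0.5))
print(response("rat_ld50_mmol_per_kg", 12.0))
```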
Table 19.6 Predicted Toxicological Profile of N-Acetylcysteine

Multicase
SAR Model Probability (%) Potency (units)
Structure alerts 0 0
Salmonella mutagenicity 0 0
SOS chromotest 0 0
umuISOS repair 0 0
Carcinogenicity: rodents-NTP 0 0
Carcinogenicity: mice-NTP 0 0
Carcinogenicity: rats-NTP 0 0
Carcinogenicity: rodent-CPDB 0 0
Carcinogenicity: mice-CPDB 0 0
Carcinogenicity: rats-CPDB 0 0
Inhibition gap junction intercell comm 0 0
Binding to Ah receptor 0 0
Mutations in mouse lymphoma (NTP) 0 0
Mutations in mouse lymphoma (GenTox) 0 0
Sister chromatid exchanges in vitro 0 0
Chromosomal aberrations in vitro 0 0
Unscheduled DNA synthesis in vitro 0 0
Cell transformation 0 0
Drosophila somatic mutations 0 0
Sister chromatid exchanges in vivo 0 0
Induction of micronuclei in vivo 0 0
Yeast malsegregation 0 0
Inhibition of tubulin polymerization 0 0
Sensory irritation 89 72
Eye irritation 72 52
Respiratory hypersensitivity 0 0
Allergic contact dermatitis 95 44
Rat lethality (LD50) 0 0
Mouse MTD 0 0
Rat MTD 0 0
Cellular toxicity (3T3) 0 0
Cellular toxicity (HeLa) 0 0
Nephrotoxicity: male rats (α2µ-globulin) 0 0
Inhibition human cyt. P4502D 0 0
Developmental toxicity: hamster 0 0
Developmental toxicity: human 0 0
Aquatic toxicity (minnows) 0 0
Water solubility: 3.88 log P (Octanol: water): -1.79
Electronegativity: 0.10
NTP and CPDB refer to the U.S. National Toxicology Program carcinogenicity assays (45) and to the Carcinogenic Potency Data Bases (46), respectively.
Based on all of these considerations, the "human expert" would overrule the prediction of rodent carcinogenicity. Additionally, in overriding the computer-based prediction, cognisance was also taken of the understanding that the vast majority of recognized human carcinogens are genotoxicants, i.e., "genotoxic carcinogens" (41-44). Epitholone A, on the other hand, was not predicted to be genotoxic (i.e., a DNA-damaging agent), as evidenced by its lack of potential to induce mutations in Salmonella, error-prone DNA repair, or unscheduled DNA synthesis in rat hepatocytes (Table 19.5). Thus, even if the potential for murine carcinogenicity were accepted, in view of the fact that the vast majority of recognized human carcinogens are mutagens/genotoxicants or are hormones and epitholone A is neither, it would not represent a human risk.
If, based on the above, it were accepted that epitholone A is not genotoxic, and if the human expert examining the documentation wished not to override the prediction of carcinogenicity in mice based on the differences in chemical environments between epitholone A and the molecules responsible for the toxicophore (Figs. 19.6 and 19.7), the expert could examine mechanisms of non-genotoxic carcinogenicity, even though these may not be relevant to humans. One of the mechanisms of non-genotoxic carcinogenicity is inhibition of intercellular communication (53). Epitholone A does not possess such a potential (Table 19.5). Another mechanism for non-genotoxic rodent carcinogenesis may involve systemic or cell toxicity followed by mitogenesis (54-56). This may occur as a consequence of including the maximum tolerated dose (MTD) in the cancer bioassay protocol. When this is done, up to 50% of chemicals tested are found to be rodent carcinogens (54). Obviously, this MTD situation rarely, if ever, applies to humans. Still, epitholone A has the potential for inducing cellular as well as systemic toxicity (Table 19.5), which may explain its potential carcinogenicity in mice, were we to discount the difference in chemical environment.
Obviously, the availability of a number of characterized and validated SAR models allows the development of a putative toxicological profile (Table 19.6). This can be used as a guideline in the product developmental phase to select lead compounds least likely to induce unwanted side effects. However, the SAR approach can also be used to optimize beneficial effects and decrease or eliminate unwanted toxic effects.
Thus, let us examine colchicine (CH), an anti-inflammatory agent that has been in use for several centuries for the treatment of gout. The anti-inflammatory potential of CH is understood to derive from its ability to inhibit tubulin polymerization (iTP) (57). That is also the basis of the anticancer activity of paclitaxel (Taxol) (58-60). The structural basis of that activity derives from the presence in CH of the NH-CH-C.= moiety (Figs. 19.8 and 19.9), which endows the molecule with a 93% probability of activity. However, colchicine also has the potential for inducing sister chromatid exchanges (SCEs) in vivo (Fig. 19.10). This SCE-inducing ability may endow it with genotoxic and developmental toxicity potentials. However, the potential for inducing SCEs in vivo is associated with the methoxy moiety (Figs. 19.9 and 19.10). Removal of that moiety or replacing it with an isopropoxy group abolishes the SCE-inducing ability of CH without affecting its potential for iTP (i.e., the basis of its anti-inflammatory action).
The molecule contains the Biophore (nr.occ.= 1):
*** 38 out of the known 41 molecules (93%) containing such a Biophore
    are perturbers of Tubulin Polymerization
*** QSAR Contribution :
 ** Total projected QSAR activity
 ** The probability that this molecule inhibits Tubulin
    Polymerization is 93% **
 ** The projected Tubulin Polymerization Inhibitory activity is 85.9
    CASE units **
Figure 19.8. Prediction of the ability of colchicine to inhibit tubulin polymerization. The structure of colchicine is shown in Fig. 19.9. The biophore is shown in bold (A) in Fig. 19.9.
Figure 19.9. Structure of colchicine. The biophore A (bold, see Fig. 19.8) is responsible for the therapeutic effectiveness. Toxicophore B (see Fig. 19.10; shown in bold) is responsible for the induction of sister chromatid exchanges (SCE) in vivo. Removal of toxicophore B or its replacement by isopropoxy groups abolishes the induction of SCEs without affecting the therapeutic potential.
The molecule contains the toxicophore (nr.occ.= 3):
*** 8 out of the known 8 molecules (100%) containing such a
    toxicophore are Mouse SCE inducers with an average activity of 57.
*** QSAR Contribution : Constant is 73.17
 ** The following Modulators are also present:
    ( 3) CH3-O -c = Inactivating -7.41
    ( 1) CH3-O -c =cH - Inactivating -7.41
    Log partition coeff.= 3.19 ; LogP contribution is -7.65
    ------
 ** The probability that this molecule induces Mouse SCEs is 90.0% **
 ** The projected Mouse SCE inducing activity is 50.7 CASE units **
Figure 19.10. The potential of colchicine to induce sister chromatid exchanges in vivo. The structure of colchicine and of the toxicophore B is given in Fig. 19.9. One of the inactivating modulators (C) is also shown in bold in Fig. 19.9.
Finally, SAR approaches can also be used to provide a basis for making intelligent risk assessments. Thus, it has been shown that the similarity in biophores/toxicophores present in different SAR models of toxicological phenomena provides a measure of mechanistic similarity (3). The SAR models of mutagenicity in Salmonella and of error-prone DNA repair (SOS Chromotest) show significant overlaps (Table 19.7). This is not unexpected because DNA is the target of both phenomena, and the tester strain used for the Salmonella mutagenicity assays contains a plasmid that codes for error-prone DNA repair (61). In fact there is a substantial (though not complete) overlap among chemicals that cause the two
phenomena (48, 62). On the other hand, there is little overlap between Salmonella mutagenicity and inhibition of gap junctional intercellular communication (Table 19.7), which is considered the epigenetic (non-genotoxic) phenomenon par excellence (53). Nor do the SAR models for Salmonella mutagenicity and inhibition of tubulin polymerization overlap significantly (Table 19.7), which is further support for the fact that genotoxicity and inhibition of tubulin polymerization can be dissociated (see above).
With respect to the in vivo induction of micronuclei (Mnt), a different situation prevails. There is considerable overlap between the toxicophores associated with Mnt and those associated with the induction of mutations in Salmonella (Table 19.7). This is not surprising, because the induction of Mnt is known to involve a genotoxic mechanism (63, 64). Indeed, when attempting to identify potential genotoxic carcinogens, when a chemical is found to induce mutations in Salmonella, this result is frequently followed by a Mnt test to determine whether the chemical is genotoxic in vivo as well (43, 65) and thus represents a risk to humans.
However, it was found that there is also substantial overlap between Mnt and iTP, the latter being a non-genotoxic phenomenon (Table 19.7) (66). This finding suggests that Mnt can occur by genotoxic as well as non-genotoxic mechanisms. Thus, a positive Mnt response by a chemical that does not induce mutations in Salmonella does not necessarily represent a carcinogenic risk to humans.
Discodermolide (Fig. 19.11) is a promising antineoplastic agent, which like paclitaxel, inhibits tubulin polymerization (67), but being considerably more water-soluble than paclitaxel, discodermolide may present certain therapeutic advantages while also being effective against paclitaxel-resistant cells (67). Neither discodermolide nor paclitaxel is mutagenic in Salmonella (and in fact neither is predicted to be a rodent carcinogen). However, both of these agents have a potential (determined by SAR) to induce Mnt in vivo. In fact, for paclitaxel that potential has been determined experimentally. This has led to the suggestion that paclitaxel, because of its ability to induce Mnt, presented a carcinogenic risk (68). However, based on the above findings (Table 19.7), it can be assumed that the ability of discodermolide and of paclitaxel to induce Mnt is independent of genotoxicity, and in fact, derives from iTP. Thus, it does not represent an unreasonable risk to humans who are treated with those antineoplastic agents. In fact, the biophores in discodermolide responsible for the induction of Mnt and iTP are identical (Fig. 19.11).

Table 19.7 Structural Commonalities among SAR Models

SAR Models                                    Percent
Salmonella mutagenicity and SOS chromotest    57
Salmonella mutagenicity and iGJIC             10
Salmonella mutagenicity and iTP                9
Salmonella mutagenicity and Mnt               53
Mnt and iTP                                   71
iGJIC, inhibition of gap junctional intercellular communications; iTP, inhibition of tubulin polymerization; Mnt, induction of bone marrow micronuclei in vivo.
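The text does not spell out how the percentages in Table 19.7 are calculated. Purely as an illustration of the idea of "structural commonality," the sketch below takes it, as an assumption, to be the share of the smaller model's significant fragments that also appear in the other model. The function name, the overlap definition, and the fragment labels are all hypothetical placeholders, not entries from any SAR model.

```python
def percent_shared(model_a_fragments, model_b_fragments):
    """Percent of the smaller model's fragments that also occur in the other.

    This definition is an assumption made for illustration; the source does
    not state the formula behind Table 19.7.
    """
    a, b = set(model_a_fragments), set(model_b_fragments)
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0
    return 100.0 * len(a & b) / smaller


# Placeholder fragment labels (not real toxicophores).
model_x = {"F1", "F2", "F3", "F4"}
model_y = {"F2", "F3", "F9"}

print(round(percent_shared(model_x, model_y), 1))
```

Whatever the exact formula, the interpretation in the text is the same: a high percentage suggests a shared mechanism, a low one suggests the two phenomena can be dissociated.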
Figure 19.11. Structure of discodermolide. The circled biophore is responsible for the inhibition of tubulin polymerization.
2 CONCLUSIONS

SAR methodologies, in their present state, coupled with human expertise, can be used to determine and to understand the potential toxicity of therapeutic agents. In fact, this approach can be used to engineer molecules devoid of the moieties associated with these unwanted side effects. It must be understood, however, that while SAR techniques can be used to accelerate the identification and development of safe therapeutic agents, they are to be used as an adjunct to experimental determinations.

3 ACKNOWLEDGMENTS

The support of the Vira Heinz Endowment is gratefully acknowledged.

REFERENCES
1. National Research Council, Science and Judgment in Risk Assessment, National Academy Press, Washington, DC, 1994.
2. I. D. McKinney, A. Richard, C. Waller, M. C. Newman, and F. Gerberich, Toxicol. Sci., 56, 8-17 (2000).
3. H. S. Rosenkranz, A. R. Cunningham, Y. P. Zhang, H. G. Claycamp, O. T. Macina, N. B. Sussman, S. G. Grant, and G. Klopman, SAR QSAR Environ. Res., 10, 277-298 (1999).
4. G. Klopman and H. S. Rosenkranz, Mutat. Res., 305, 33-46 (1994).
5. G. Klopman and H. S. Rosenkranz, Toxicol. Lett., 79, 145-155 (1995).
6. K. Einslein, V. K. Gombar, and B. W. Blake, Mutat. Res., 305, 47-61 (1994).
7. D. F. V. Lewis, C. Ioannides, and D. V. Parke, Environ. Health Perspect., 104, 1011-1016 (1996).
8. D. M. Sanderson and C. G. Earnshaw, Human Exp. Toxicol., 10, 261-273 (1991).
9. N. Greene, J. Chem. Inf. Comput. Sci., 37, 148-150 (1996).
10. J. E. Ridings, M. D. Barratt, R. Cary, C. G. Earnshaw, C. E. Eggington, M. K. Ellis, P. N. Judson, J. J. Langowski, C. A. Marchant, M. P. Payne, W. P. Watson, and T. D. Yih, Toxicology, 106, 267-279 (1996).
11. Y. T. Woo, D. Y. Lai, M. F. Argus, and J. C. Arcos, Toxicol. Lett., 79, 219-228 (1995).
12. Y. T. Woo, D. Y. Lai, M. F. Argus, and J. C. Arcos, Environ. Carcin. Ecotoxicol. Rev. C., 16, 101-102 (1998).
13. F. Darvas, A. Papp, A. Allerdyce, E. Benfenati, G. Fini, M. Tichy, N. Sobb, and A. Citti, AAAI Spring Symposium on Predictive Toxicology of Chemicals: Experiences and Impact of AI Tools, Technical Report SS-99-01, AAAI Press, Menlo Park, CA, 1999.
14. R. D. King, S. H. Muggleton, A. Srinivasan, and M. J. E. Sternberg, Proc. Natl. Acad. Sci., 93, 438-442 (1996).
15. J. Ashby, Environ. Mutagen., 7, 919-921 (1985).
16. A. M. Richard, Mutat. Res., 305, 73-77 (1994).
17. A. M. Richard, Knowledge Engineer. Rev., 14, 307-317 (1999).
18. R. D. Combes and P. Judson, Pestic. Sci., 45, 179-194 (1995).
19. A. M. Richard, Toxicol. Lett., 102-103, 611-616 (1998).
20. C. Helma, E. Gottmann, and S. Kramer, Stat. Methods. Med. Res., 9, 1 3 0 (2000).
21. A. M. Richard and R. Benigni, SAR QSAR Environ. Res., 13, 1-19 (2002).
22. M. Liu, N. Sussman, G. Klopman, and H. S. Rosenkranz, Mutat. Res., 358, 63-72 (1996).
23. D. M. Maron and B. N. Ames, Mutat. Res., 113, 173-215 (1983).
24. D. Brusick, V. F. Simmon, H. S. Rosenkranz, V. Ray, and R. S. Stafford, Mutat. Res., 76, 169-190 (1980).
25. J. Pet-Edwards, H. S. Rosenkranz, V. Chankong, and Y. Y. Haimes, Mutat. Res., 153, 167-185 (1985).
26. H. S. Rosenkranz, F. K. Ennever, M. Dimayuga, and G. Klopman, Environ. Mol. Mutagen., 16, 149-177 (1990).
27. A. D. Mitchell, A. E. Auletta, D. Clive, P. E. Kirby, M. M. Moore, and B. C. Myhr, Mutat. Res., 394, 177-303 (1997).
28. B. Henry, S. G. Grant, G. Klopman, and H. S. Rosenkranz, Mutat. Res., 397, 313-335 (1998).
29. S. G. Grant, Y. P. Zhang, G. Klopman, and H. S. Rosenkranz, Mutat. Res., 465, 201-229 (2000).
30. M. Ghanooni, D. R. Mattison, Y. P. Zhang, O. T. Macina, H. S. Rosenkranz, and G. Klopman, Am. J. Obstet. Gynecol., 176, 799-806 (1997).
31. J. Gómez, O. T. Macina, D. R. Mattison, Y. P. Zhang, G. Klopman, and H. S. Rosenkranz, Teratology, 60, 190-205 (1999).
32. H. S. Rosenkranz, Y. P. Zhang, and G. Klopman, Altern. Anim. Test., 26, 779-809 (1998).
References 845
33. C. Graham, R. Gealy, 0. T. Macina, M. H. Karol, 52. M. Rosenkranz, H. S. Rosenkranz, and G. Klop-
and H. S. Rosenkranz, Quant. Struct. Activ. Re- man, Mutat. Res., 381, 171-188 (1997).
lat., 15, 224-229 (1996). 53. J. E. Trosko and C. C. Chang in R. W. Hoerger
34. N. Takihi, Y. P. Zhang, G. Klopman, and H. S. and F. D. Hoerger, Eds., Banbury Report 31:
Rosenkranz, Mutagenesis, 8,257-264 (1993). Carcinogen Risk Assessment: New Directions in
35. N. Takihi, Y. P. Zhang, G. Klopman, and H. S. Qualitative and Quantitative Aspects, Cold
Rosenkranz, Qual. Assur. Good Pract. Regul. Spring Harbor Laboratory Press, Cold Spring
Law, 2,255-264 (1993). Harbor, NY, 1988, pp. 139-174.
36. H. S. Rosenkranz and A. R. Cunningham, Mu- 54. B. N. Ames and L. S. Gold, Proc. Natl. h a d . Sci.
tat. Res., 476, 133-137 (2001). USA, 87,7772-7776 (1990).
55. S. M. Cohen and L. B. Ellwein, Science, 249,
37. H. S. Rosenkranz and A. R. Cunningham, SAR
1007-1011 (1990).
QSAR Environ. Res., 12,267-274 (2001).
56. S. Preston-Martin, M. C. Pike, R. K. Ross, P. A.
38. V. Chankong, Y. Y. Haimes, H. S. Rosenkranz,
Jones, and B. E. Henderson, Cancer Res., 50,
and J. Pet-Edwards, Mutat. Res., 153, 135-166
7415-7421 (1990).
(1985).
57. E. ter Haar, H. S. Rosenkranz, E. Hamel, and
39. Y. P. Zhang, N. Sussman, G. Klopman, and H. S. B. W. Day, Bioorg. Med. Chem., 4, 1659-1671
Rosenkranz, Quant. Struct. Activ. Relat., 16, (1996).
290-295 (1997).
58. E. Hamel, Med. Res. Rev., 16,207-231 (1996).
40. A. R. Cunningham, H. S. Rosenkranz, Y. P. 59. P. B. Schiff and S. B. Horwitz, Proc. Natl. h a d .
Zhang, and G. Klopman, Mutat. Res., 398,l-17 Sci. USA, 77,1561-1565 (1980).
(1998).
60. P. B. Schiff, J. Fant, and S. B. Horwitz, Nature
41. F. K. Ennever, T. J. Noonan, and H. S. Rosen- (Lond.),277,665-667 (1979).
kranz, Mutagenesis, 2, 73-78 (1987). 61. J. McCann, N. E. Spingarn, J. Kobori, and B. N.
42. H. Bartsch and C. Malaveille, Cell Biol. Toxicol., Ames, Proc. Natl. Acad. Sci. USA, 72,979-983
5, 115-127 (1989). (1975).
43. J. Ashby and R. S. Morrod, Nature, 352, 185- 62. V. Mersch-Sundermann, U.Schneider, G. Klop-
186 (1991). man, and H. S. Rosenkranz, Mutagenesis, 9,
44. M. D. Shelby, Mutat. Res., 204,3-15 (1988). 205-224. (1994).
45. J. Ashby and R. W. Tennant, Mutat. Res., 257, 63. J. A. Heddle, M. C. Cimino, M. Hayashi, F. Ro-
229-306 (1991). magna, M. D. Shelby, J. D. Tucker, Ph. Van-
parys, and J. T. MacGregor, Environ. Mol. Mu-
46. L. S. Gold, N. B. Manley, T. H. Slone, G. B.
tagen., 18,277-291 (1991).
Garfmkel, L. Rohrbach, and B. N. Ames, Envi-
ron. Health Perspect., 100,65135 (1993). 64. K. H. Mavournin, D. H. Blakey, M. C. Cimino,
M. F. Salamone, and J. A. Heddle, Mutat. Res.,
47. H. S. Rosenkranz and G. Klopman, Mutat. Res., 239,29-80 (1990).
228,51-80 (1990).
65. H. Tinwell and J. Ashby, Environ. Health Per-
48. V. Mersch-Sundermann, G. Klopman, and H. S. spect., 102, 758-762 (1994).
Rosenkranz, Mutat. Res., 340,81-91 (1996). 66. E. ter Haar, B. W. Day, and H. S. Rosenkranz,
49. Y. P. Zhang, A. van Praagh, G. Klopman, and Mutat. Res., 350, 331337 (1996).
H. S. Rosenkranz, Mutagenesis, 9, 141-149 67. E. ter Haar, R. J. Kowalski, E. Hamel, C. M. Lin,
(1994). R. E. Longley, S. P. Gunasekera, H. S. Rosen-
50. H. S. Rosenkranz and G. Klopman, Environ. kranz, and B. W. Day, Biochemistry, 35, 243-
Mol. Mutagen., 21, 193-206 (1993). 250 (1996).
51. H. S. Rosenkranz, E. J. Matthews, and G. Klop- 68. H. Tinwell and J. Ashby, Carcinogenesis, 15,
man, Altern. Anim. Test., 20, 549-562 (1992). 1499-1501 (1994).
CHAPTER TWENTY

Natural Products as Leads for New Pharmaceuticals

A. D. Buss
MerLion Pharmaceuticals
Singapore Science Park, Singapore

B. Cox
Medicinal Chemistry
Respiratory Diseases Therapeutic Area
Novartis Pharma Research Centre
Horsham, United Kingdom

R. D. Waigh
Department of Pharmaceutical Sciences
University of Strathclyde
Glasgow, Scotland
Contents
1 Introduction
2 Drugs Affecting the Central Nervous System
2.1 Morphine Alkaloids
2.2 Conotoxins
2.3 Cannabinoids
2.4 Asperlicin
3 Neuromuscular Blocking Drugs
3.1 Curare, Decamethonium, and Atracurium
4 Anticancer Drugs
4.1 Catharanthus (Vinca) Alkaloids
4.2 Camptothecin
4.3 Paclitaxel and Docetaxel
4.4 Epothilones
4.5 Podophyllotoxin, Etoposide, and Teniposide
4.6 Marine Sources
5 Antibiotics
5.1 β-Lactams
5.2 Erythromycin Macrolides
5.3 Streptogramins
5.4 Echinocandins
6 Cardiovascular Drugs
6.1 Lovastatin, Simvastatin, and Pravastatin
6.2 Teprotide and Captopril
6.3 Adrenaline, Propranolol, and Atenolol
6.4 Dicoumarol and Warfarin
7 Antiasthma Drugs
7.1 Khellin and Sodium Cromoglycate
7.2 Ephedrine, Isoprenaline, and Salbutamol
7.3 Contignasterol
8 Antiparasitic Drugs
8.1 Artemisinin, Artemether, and Arteether
8.2 Quinine, Chloroquine, and Mefloquine
8.3 Avermectins and Milbemycins
9 Conclusion

Burger's Medicinal Chemistry and Drug Discovery
Sixth Edition, Volume 1: Drug Discovery
Edited by Donald J. Abraham
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc.
1 INTRODUCTION

Of the 520 new pharmaceuticals approved between 1983 and 1994, 39% were derived from natural products, the proportion of antibacterials and anticancer agents of which was over 60% (1). Between 1990 and 2000, a total of 41 drugs derived from natural products were launched on the market by major pharmaceutical companies (Table 20.1), including azithromycin, orlistat, paclitaxel, sirolimus (rapamycin), Synercid, tacrolimus, and topotecan. In 2000, one-half of the top-selling pharmaceuticals were derived from natural products, having combined sales of more than US $40 billion. These included the biggest selling anticancer drug paclitaxel, the "statin" family of hypolipidemics, and the immunosuppressant cyclosporin. During 2001 we have seen the launch of caspofungin from Merck and galantamine from Johnson & Johnson, with rosuvastatin, telithromycin, daptomycin, and ecteinascidin-743 due to follow in 2002.

Despite the figures, the popularity of natural products, particularly those from higher plants as leads for new pharmaceuticals, tends to fluctuate. At the time of writing, several of the world's biggest pharmaceutical companies have reined back their natural product drug discovery programs and have placed great faith in combinatorial chemistry, coupled to very high throughput screening. Time will tell whether this is a wise stratagem, or whether the unique features of compounds that are themselves derived from living organisms will once again see renewed acceptance.

The abundance of plant and microbial secondary metabolites and their value in medicine are undisputed, but one question that is only partly answered concerns the reasons for this abundance of complex chemical substances. In the past, the production of what we would now call "bioactive" substances was a mystery. A modern view is that these compounds have a role in protecting the otherwise defenseless, stationary plant from attack by mammals, insects, fungi, bacteria, and viruses. Taking morphine as an example of a secondary metabolite whose value to the plant is not entirely obvious, 14 steps are required from available amino acids, including at least one step that is highly substrate specific (2). The presence of morphine in the tissues of Papaver somniferum must therefore confer a selectional advantage on the plant (3): genetic code is required for each of the enzymes involved in the biosynthesis, valuable amino acids are utilized in forming the enzymes, and a relatively scarce nutrient (nitrogen) is locked up in the compounds produced. If the morphine did not continue to have value for the plant, mutants would have arisen with the advantage of not having a drain on their metabolic resources.

We can only guess at the ecological functions of morphine. Perhaps a mammalian herbivore that consumed too many poppies would become drowsy and itself fall prey to a carnivore. It may be significant that the cannabinoids, produced in greatest abundance in the nutritious growing tips of the plant, also induce mental effects that would compromise a herbivore's ability to escape a predator. Whatever their natural protective functions, natural products are a rich source of biologically active compounds that have arisen as the result of natural selection, over perhaps 300 million years. The challenge to the medicinal chemist is to exploit this unique chemical diversity. The following account illustrates how natural products have been used as what are called lead compounds, or templates for the development of important medicines.
Table 20.1 Drugs Derived from Natural Products (1990-2000)

Name                    Originator              Indication/Use
Acarbose                Bayer                   Diabetes
Artemisinin             Kunming & Guilin        Malaria
Azithromycin            Pliva                   Antibiotic
Carbenin                Sankyo                  Antibiotic
Cefetamet pivoxil       Takeda                  Antibiotic
Cefozopran              Takeda                  Antibiotic
Cefpimizole             Ajinomoto               Antibiotic
Cefsulodin              Takeda                  Antibiotic
Clarithromycin          Taisho                  Antibiotic
Colforsin daropate      Nippon Kayaku           Asthma
Docetaxel               Aventis                 Cancer
Dronabinol              Solvay                  Alzheimer's disease
Galantamine             Intelligen              Alzheimer's disease, arthritis
Gusperimus              Nippon Kayaku           Arthritis
Irinotecan              Yakult Honsha           Cancer
Ivermectin              Merck & Co              Parasiticide
Lentinan                Ajinomoto               Cancer
LW-50020                Sankyo                  Immunomodulation
Masoprocol              Access                  Cancer
Mepartricin             SPA                     Benign prostatic hyperplasia
Miglitol                Bayer                   Diabetes
Mizoribine              Asahi Chemical          Arthritis
Mycophenolate mofetil   Hoffman-LaRoche         Arthritis
Orlistat                Hoffman-LaRoche         Obesity
Paclitaxel              Bristol-Myers Squibb    Cancer
Pentostatin             Warner-Lambert          Leukemia
Podophyllotoxin         Nycomed Pharma          Human papillomavirus
Policosanol             Dalmer                  Hyperlipidaemia
Everolimus              Novartis                Immunomodulation
Sirolimus               American Home Products  Immunomodulation
Sizofilan               Taito                   Cancer, hepatitis-B virus
Subreum                 OM Pharma               Arthritis
Synercid                Novartis                Antibiotic
Tacrolimus              Fujisawa                Immunomodulation
Teicoplanin             Aventis                 Antibiotic
Tirilazad mesylate      Pharmacia & Upjohn      Subarachnoid haemorrhage
Topotecan               GlaxoSmithKline         Cancer
Ukrain                  Nowicky Pharma          Cancer, HIV/AIDS
Vinorelbine             Pierre Fabre            Cancer
Voglibose               Takeda                  Diabetes, obesity
Z-100                   Zeria                   Immunomodulation
2 DRUGS AFFECTING THE CENTRAL NERVOUS SYSTEM

2.1 Morphine Alkaloids

The history of the opium alkaloids is too well known to warrant repetition here, but the analgesics based on morphine (1) are too important to be left out of an account of natural products as leads. Thus we shall summarize the clinically more important developments that have occurred since the isolation of morphine in 1803. Codeine (2) continues to be used widely for the treatment of moderate pain and, although present in the opium poppy (Papaver somniferum), it is normally synthesized in higher yield from morphine (4).

[Structures: (1) morphine, R1 = R2 = H; (2) codeine, R1 = CH3, R2 = H; (3) heroin, R1 = R2 = COCH3; (5) naloxone]

Other than codeine, the earliest significant semisynthetic derivative of morphine is the diacetate heroin (3), which is still widely used in terminal cancer where its addictiveness is irrelevant. Acetylation masks the polar hydroxy groups, so that penetration into the central nervous system (CNS) is enhanced; hydrolysis then occurs to liberate the phenolic hydroxyl, giving an active analgesic, and ultimately regenerates morphine (5). Heroin was thus one of the first prodrugs.

Modifications to the C-ring of morphine are legion, but none of the derivatives is free from addictive liability, though many have been used clinically. N-Demethylation and realkylation yield more interesting analogs, notably N-allylnormorphine, or nalorphine (4), which is a morphine antagonist (6). Further modification leads to naloxone (5), which unlike nalorphine has very little agonist activity (7) and has retained a place in therapy for treatment of opiate-induced respiratory depression. Naloxone will also precipitate withdrawal symptoms in opiate addicts, thereby facilitating diagnosis.

[Structure: (4) nalorphine]

Total synthesis of morphine is difficult, but analogs lacking the dihydrofuran ring are accessible (8) from 1-benzylisoquinolines, in analogy with the biosynthesis of morphine, to give the morphinans (6). The system may be simplified even further (9), to give the benzomorphans (7), although neither these nor the morphinans have provided the long-sought analgesic without addictive properties.

[Structures: (6) morphinan; (7) benzomorphan]

A semisynthetic route to morphine analogs was found (10) from thebaine (8) using Diels-Alder reactions in the C-ring. Adducts such as (9) have the distinction of enormous potency (11), sufficient to immobilize rhinoceroses at moderate dose levels! Unfortunately, the addictive liability runs parallel to the increase in analgesic potency, a tendency that was partly overcome (12) in the analog buprenorphine (10).
[Structures: (8) thebaine; (9) etorphine; (10) buprenorphine]

All this work was carried out in ignorance of the nature of the natural transmitter(s), which subsequently proved to be the peptides known as endorphins and their pentapeptide fragments, the enkephalins (13). It is perhaps significant that vastly improved understanding of the biochemical basis for analgesia and the characterization of a family of related receptors (14), known as δ, κ, and μ, have so far failed to yield any better drugs for the treatment of pain.

A series of analgesics that were discovered initially in an attempt to obtain smooth muscle relaxants based on another natural product, atropine (11), started with the observation that meperidine (pethidine) (12) unexpectedly produced a reaction in mice known as Straub tail, normally characteristic of the morphine series (15). Meperidine itself is still used widely in childbirth in the belief that there is a lower incidence of respiratory depression in the fetus. The realization that 4-phenylpiperidines, which are not obvious structural analogs of morphine, could give rise to useful analgesic effects, led to the synthesis of many thousands of derivatives (16), many with far greater potency than that of meperidine. Unfortunately, as potency increases so do addiction liability and respiratory depression.

[Structures: (11) atropine; (12) pethidine]

2.2 Conotoxins

Elan Pharmaceuticals is developing SNX-111 (Ziconotide), the synthetic equivalent of ω-Conopeptide-MVIIA, found in the venom of the predatory marine snail Conus magus, for the treatment of severe pain and ischemia by the intrathecal or intravenous routes. The peptide has the structure H-¹Cys-Lys-Gly-Lys-Gly-Ala-Lys-⁸Cys-Ser-Arg-Leu-Met-Tyr-Asp-¹⁵Cys-¹⁶Cys-Thr-Gly-Ser-²⁰Cys-Arg-Ser-Gly-Lys-²⁵Cys-NH2 cyclic(1-16),(8-20),(15-25)-tris(disulfide), which does not make it an
easy target for synthesis and gives it poor distribution properties in vivo (17).

SNX-111 blocks N-type calcium channels, which are located throughout the CNS on neuronal somata, dendrites, dendritic spines, and axon terminals, where they play a major role in the regulation of the neurotransmitters associated with pain transmission and stroke. The drive is to discover an orally active, selective, small-molecule modulator of N-type calcium channels to overcome the disadvantages of administration of SNX-111.

High-throughput screening campaigns have resulted in a number of leads being identified, whereas others have chosen to modify known drugs shown to block N-type channels. Workers at Parke-Davis, however, employed a ligand-based approach using the three-dimensional solution structure of the peptide (18). Compounds such as (13) were designed where key binding motifs are attached to an alkylphenyl ether scaffold. The compound had an IC50 value of 3.3 μM in a human N-type channel assay but showed no selectivity over the L-type channel. Structure-activity work on the conotoxins has shown that other regions of the peptide, absent in these synthetic ligands, are responsible for channel family selectivity (17, 18).

[Structure: (13) conotoxin analog]

2.3 Cannabinoids

The plant Cannabis sativa has been used by humans for thousands of years, both for the effects when ingested and for making rope from the fibers in the stem. The major constituent of pharmacological interest is Δ9-tetrahydrocannabinol (14) (THC), which has a multiplicity of actions. In animals the effects include sedation and apparent hallucinations (19), which are similar to the major effects in the CNS in humans. There are also cardiovascular effects, notably tachycardia and postural hypotension, that can be separated from the CNS action, as in the synthetic analog Δ6a,10a-dimethylheptyl THC (15), which has minimal CNS activity (20).

[Structure: (14) THC]

Given the widespread illicit use of C. sativa, it was perhaps inevitable that eventually one
or two cancer patients receiving chemotherapy would dose themselves with their own sedative in the form of marijuana. An unexpected blessing from this uncontrolled combination was a reduction in the nausea experienced during chemotherapy. A variety of anticancer agents cause severe nausea and vomiting, including nitrogen mustard, adriamycin, 5-azacytidine, cyclophosphamide, and methotrexate: the unique situation arose in which the remedy was discovered by the patients themselves (21). Although smoking reefers gives rapid absorption and close control of the effects, smoking is itself carcinogenic and cannot be recommended to those who are unaccustomed to it; thus, when the physicians in charge were made aware of their patients' discovery, they devised a controlled clinical trial in which measured doses of THC were dissolved in sesame oil and administered in gelatin capsules. A placebo was similarly prepared for use in a randomized, double-blind, crossover experiment (21). The results left no doubt that a majority of patients benefited from THC pretreatment, even those who had previously been refractory to the effects of the standard antiemetics such as prochlorperazine.

There remained the problem of tachycardia associated with THC treatment. The multiplicity of effects of THC has led to the synthesis of large numbers of analogs (22), particularly in the hope of finding non-morphine-like analgesics without addictiveness and without the other CNS effects of THC. The analog nabilone (16) had been shown to exert less effect than that of THC on the cardiovascular system, while retaining the mixture of CNS actions, including analgesic, antianxiety, and antipsychotic properties (23). When tested as an antiemetic, nabilone proved to be superior to THC (24) and has been used for this purpose for more than 30 years. The first 10 years of clinical experience was reviewed (25).

[Structure: (16) nabilone]

After the demonstration of THC binding sites in the CNS (26), a search for an endogenous ligand produced the long-chain ethanolamine derivative (17) of arachidonic acid, known as anandamide (27). Subsequently, the glycerol ester of arachidonic acid (18), known as 2-AG, was shown to be a more abundant endogenous ligand in the brain than anandamide (28). Further development has tended to concentrate on analogs of the natural ligands, notably the methyl derivative of anandamide (19), which is resistant to the amide hydrolase that terminates the action of anandamide itself, and the dimethylheptyl analog (20) that is traceable to the earlier modifications to THC (29). Such analogs tend to have activity similar to that of THC.

[Structure: (17) anandamide]

An interesting twist in the tail is provided by the observation that anandamide is also a ligand for the so-called enigmatic vanilloid receptors, previously characterized through their interactions with two other natural products, capsaicin (21) and resiniferatoxin (22) (30) and responsible for the "hot" sensation caused by compounds in, for example, chilies. A functional vanilloid receptor was cloned in 1997 and is activated by heat and acid as well as the chemical ligands (30). A combination of the anandamide structure with a vanilloid motif, as in AM404 (23), enhances the anandamide transport inhibitory properties (29). The situation is complex from the viewpoint of drug design, not least because there are two cannabinoid (CB) receptors, plus a hydrolase and a transport protein, interference with any or all of which might provide new drugs.

[Structures: (21) capsaicin; (22) resiniferatoxin]

The cannabinoid acids, which are devoid of psychotropic activity, are promising anti-inflammatory agents (31) and it is possible that the next useful therapeutic agent will come from this direction, rather than the sought-after analgesic.
2.4 Asperlicin

Cholecystokinin (CCK) is a peptide hormone, present in the gut and CNS; it is one of the most abundant peptides in the brain (32, 33). The whole peptide is composed of 33 amino acids, but the C-terminal octapeptide H-Asp-Tyr(SO3H)-Met-Gly-Trp-Met-Asp-Phe-NH2 possesses the full range of activities, sufficient for it to be classed as a neurotransmitter (34). Specific, high-affinity binding sites have been found on mammalian CNS cell membranes and in other organs such as pancreas, gall bladder, and colon (35). The latter have been classed as CCK-A receptors, but the majority of CNS receptors were classed as CCK-B, based on affinity differences for various agonists and antagonists (36). To confuse the issue slightly, the gastrin receptor in the stomach is closely related to the CCK-B (now known as the CCK2) receptor (37) and is stimulated by the C-terminal tetrapeptide of CCK: in the periphery, gastrin receptors are the same as CCK2 receptors (38).

The effects of CCK on intestinal smooth muscle and pancreas are easy to demonstrate pharmacologically, unlike the role in the CNS, which is a matter for conjecture. It was assumed that the CNS activity must be significant, given the abundance of the peptide in the brain, and that the discovery of antagonists might lead to new drug treatments, as yet unspecified (39).

Fishing in microbial broths, using radioreceptors as bait, produced asperlicin (24), the first potent, competitive and selective CCK-A (CCK1) antagonist, from a culture medium of Aspergillus alliaceus (40).

[Structure: (24) asperlicin]

Asperlicin is moderately potent, poorly soluble in water, and not bioavailable by the oral route (41). When discovered it was also, with morphine, one of the very few nonpeptides with affinity for a peptide receptor (peptoids are discounted in this assessment). It was an interesting target for synthetic modification, particularly viewed as a benzodiazepine derivative with potential CNS activity.

Based on the benzodiazepine nucleus, and an overt mimic of diazepam, one of the first successful synthetic analogs was L-364,286 (25), which had potency on CCK-A receptors similar to that of asperlicin. Better receptor affinity was achieved with 3-amide-substituted benzazepines: the 2-indolyl derivative L-364,718, also known as MK-329 (26), is five orders of magnitude more potent than asperlicin (42) at CCK-A receptors and is a valuable pharmacological tool.

Modification of the 3-amide to give a urea linkage as in (27) led to a reduction in CCK-A receptor affinity. Importantly, discrimination between CCK-A and CCK-B receptors by (27)
is governed by the stereochemistry at C3, the (S)-enantiomer showing greater affinity for CCK-A receptors. The (R)-enantiomer, known as L-365,260, prefers CCK-B receptors, antagonizes gastrin-stimulated acid secretion in animal models, and, among other CNS effects, induces analgesia in primates and displays anxiolytic properties (32).

Further development in this series has very substantially improved receptor affinity: YM-022 (28) has IC50 0.05 nM/kg (38). Clinical trials of compounds in this series have been disappointing because of poor bioavailability, but the general concept of finding a therapeutic agent through antagonism of CCK2 receptors is still viable and it is reported that the number of patents in this area has increased in the last 5 years (43).

3 NEUROMUSCULAR BLOCKING DRUGS

3.1 Curare, Decamethonium, and Atracurium

The development and use of muscle relaxants, to allow a reduction in the level of anesthesia during surgery, follows entirely from studies of South American arrow poisons (44) and particularly from the isolation by King (45) of pure D-tubocurarine (29) in the 1930s, from tube curare. Another of the South American blowpipe poisons, calabash curare, was used for similar purposes and developed (46, 47), to give alcuronium (30) from the alkaloid C-toxiferine I (31). Both types of curare paralyze skeletal muscle by a similar mechanism, antagonizing the effect of acetylcholine at the neuromuscular junction (48).

[Structures: (29) tubocurarine, R = H; (32) metocurine, R = CH3]

The muscle-paralyzing curare alkaloids are quaternary salts that are not absorbed when taken orally. For surgical procedures they must be administered by intravenous injection, which results in onset of paralysis in at most a few minutes: anesthesia is normally induced before administration of the muscle relaxant (44), which is followed by artificial respiration. Although the neuromuscular blocking agents are potentially lethal when administered alone, in the environment of an operating theater they are truly life-saving
drugs that have made a major impact on survival rates during surgery.

[Structures: (31) C-toxiferine I, R = CH3; (30) alcuronium, R = CH2CH=CH2]

At the time of King's work in the 1930s there were no spectroscopic aids to structure elucidation, and it is not surprising that he made a small error in the structure assigned to D-tubocurarine, believing it to have two quaternary nitrogens, a mistake that was not corrected (49) until 1970. The methylation product of D-tubocurarine, known as metocurine (32), is a more potent muscle relaxant. It was known for a long time as dimethyltubocurarine because of the error in the structure allocated to compound (29). King's error, in assigning a bisquaternary structure to a molecule with one quaternary and one protonated tertiary nitrogen, led to a large number of highly active synthetic bisquaternaries. The simplest of these was decamethonium (33), which was nothing more than two trimethylammonium end groups connected with a decamethylene chain. As one of a series with different chain lengths (50), decamethonium became the prototype for many more complex structures with 10 atoms between the quaternary centers, which appeared to be optimal for binding to the acetylcholine receptor at the neuromuscular junction.

[Structure: (33) decamethonium]

Unlike tubocurarine, decamethonium depolarizes the muscle endplate, rendering the membrane insensitive to acetylcholine (48). The action of tubocurarine is competitive and can be overcome with increased concentrations of acetylcholine, brought about by administration of an anticholinesterase: the latter is thus an antidote to tubocurarine, but not to decamethonium. Despite the lack of an antidote, decamethonium was used very widely for over two decades. One of its disadvantages is an overlong duration of action, during which time the patient has to be maintained on artificial respiration, because the muscle of the diaphragm is also susceptible to the actions of the drug. An early and highly successful attempt (51) to shorten the action of decamethonium gave suxamethonium (34), a diester formed between succinic acid and two molecules of choline, which hydrolyzes rapidly in the presence of pseudocholinesterase.

Tubocurarine suffers from cardiovascular side effects induced by direct interactions with ganglionic acetylcholine receptors and from stimulation of histamine release, so analogs have been well worth pursuing. The macrocyclic structure of tubocurarine is a difficult synthetic target, but fortunately ring-opened analogs, such as laudexium (35), have high potency and relatively few side effects (52). The main problem with (35) is the duration of action, which at about 40 min is too long for many operations. Two approaches have been used to shorten the duration of action. The concept of pH-controlled Hofmann elimination was employed successfully (53) in the design of atracurium (36), which in clinical use (54) has the big advantage that the drug disappears at a constant rate, irrespective of liver or kidney function. Some ester hydrolysis contributes to the destruction of atracurium in vivo, as might be expected. A slightly later development (55) centered on an empirical search for structures that would undergo ester hydrolysis more rapidly, resulting in mivacurium (37), which has a slightly shorter duration of action than that of atracurium, the latter being about 15-20 min.
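
The statement that atracurium disappears at a constant rate, irrespective of liver or kidney function, can be illustrated with a simple kinetic sketch. It assumes parallel first-order loss pathways: a chemical pathway (Hofmann elimination plus ester hydrolysis) that depends only on pH and temperature, and an organ-clearance pathway that scales with organ function. The function names and all rate constants below are hypothetical, chosen only for illustration, and are not measured values for any drug.

```python
import math


def half_life_min(k_chemical, k_organ, organ_function=1.0):
    """Half-life in minutes for two parallel first-order loss pathways.

    k_chemical: rate constant (1/min) for non-enzymatic breakdown, assumed
        to depend only on pH and temperature, not on organ function.
    k_organ: rate constant (1/min) for hepatic plus renal clearance in a
        healthy patient.
    organ_function: fraction of normal organ clearance (1.0 = healthy).
    """
    k_total = k_chemical + organ_function * k_organ
    return math.log(2) / k_total


# Hypothetical rate constants, for illustration only.
CHEMICALLY_DEGRADED = {"k_chemical": 0.035, "k_organ": 0.003}
ORGAN_CLEARED = {"k_chemical": 0.0, "k_organ": 0.038}


def prolongation(drug, organ_function):
    """Ratio of impaired to healthy half-life for the given drug."""
    healthy = half_life_min(**drug)
    impaired = half_life_min(**drug, organ_function=organ_function)
    return impaired / healthy


if __name__ == "__main__":
    for name, drug in [("chemically degraded", CHEMICALLY_DEGRADED),
                       ("organ cleared", ORGAN_CLEARED)]:
        print(name, round(prolongation(drug, organ_function=0.2), 2))
```

Under these assumed constants, reducing organ function to one-fifth barely changes the half-life of the chemically degraded drug, whereas the half-life of the organ-cleared drug lengthens in proportion to the loss of clearance. This is the qualitative behavior the text attributes to atracurium.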
[Scheme: decomposition of suxamethonium (34) by pseudocholinesterase]
[Structure: (35) laudexium]

4 ANTICANCER DRUGS

4.1 Catharanthus (Vinca) Alkaloids

In 1949 Canadian researchers at the University of Western Ontario began investigating the medicinal properties of the rosy periwinkle (Catharanthus roseus), a plant that had been used for many years to treat diabetes mellitus in the West Indies. Despite finding that the plant extract when given orally had no effect on blood sugar levels in rats or rabbits, the researchers noted that when given intravenously, the extract caused the animals to succumb to bacterial infection and die. This curious observation prompted further studies, which showed that the plant extract reduced levels of white blood cells, causing granulocytopenia and bone marrow damage, toxic effects that are encountered with many antitumor drugs (56). These findings led the Canadian group to isolate an alkaloid fraction with potent cytotoxic activity. The active principle was eventually purified and became known as vinblastine (38), a dimeric indole-dihydroindole alkaloid.

Concurrently, researchers at the Lilly Research Laboratories had been investigating extracts of C. roseus and they too had detected cytotoxic activity, specifically against acute lymphocytic leukemia (57, 58). The U.S. group isolated several alkaloids, including vinblastine and another closely related alkaloid, vincristine (39).

Although many other alkaloids have been isolated from C. roseus, only vinblastine and vincristine have been developed for clinical use. The antiproliferative activity of the two compounds is related to their specific interaction with tubulin, thus preventing assembly of tubulin into microtubules and arresting cell division (59). However, despite this apparently identical mechanism of action and their clear chemical similarities, vinblastine and vincristine display very different clinical effects. Vinblastine, for example, is used to treat Hodgkin's disease and metastatic testicular tumors, whereas vincristine is used mainly in combination with other anticancer drugs for the treatment of acute lymphocytic leukemia in children. Toxicity profiles are also different, in that vinblastine causes bone-marrow depression, whereas peripheral neuropathy often proves to be dose-limiting in vincristine therapy.
[Structures: (36) atracurium (pH 7.4); (37) mivacurium]
Lilly introduced vinblastine and vincristine into the clinic in 1960 and 1963, respectively, but this did not preclude the search for improved derivatives. A chemical modification program aimed at improving antitumor activity and reducing toxicity was initiated in 1972 (60). Concern about the neurotoxicity displayed by vincristine, its chemical instability, and low natural abundance (0.03 g/kg dried plant material) led to vinblastine's being chosen as a template for semisynthetic modification. Selective ammonolysis of the ester function at C-3 and hydrolysis of the adjacent acetyl group yielded the desacetyl vinblastine amide, vindesine (40). Better yields of vindesine were obtained from the hydrazide (41) on treatment with nitrous acid and reacting the resultant azide (42) with ammonia. The azide (42) proved to be a useful intermediate for the preparation of a range of substituted amides, although vindesine proved to be the derivative of choice, with significant differences in the spectrum of antitumor activity and toxicity compared to that of the naturally occurring alkaloids. Phase I clinical trials commenced in 1977 and vindesine has been used for the treatment of non-small cell lung cancer, lymphoblastic leukemia, and non-Hodgkin's lymphomas. In combination with cisplatin, vindesine ranks among the foremost treatments for non-small cell lung cancer with respect to response rate and survival (61). Back in the 1950s, the U.S. researchers could not have guessed that 30 years on, the demand for Catharanthus alkaloids would necessitate the processing of around 8000 kg of plant material per year (62)!
[Structures: (38) vinblastine, R = CH3; (39) vincristine, R = CHO]

4.2 Camptothecin
Camptothecin (43) was first isolated by Monroe Wall and Mansukh Wani in 1966, after ethanolic extracts of Camptotheca acuminata, a tree native to China, showed unusual and potent antitumor activity (63). Starting with 19 kg of dried wood and bark, Wall and Wani painstakingly purified the principal active component with a combination of hot solvent extraction, an 11-stage Craig countercurrent partition process, silica gel chromatography, and crystallization. Camptothecin was characterized as a novel pentacyclic alkaloid, present as just 0.01% w/w of the stem bark of C. acuminata. Of particular note was the unusual activity that camptothecin displayed in L1210 and P388 mouse leukemia life-prolongation assays. The compound also inhibited the growth of solid tumors in vivo and the water-soluble sodium salt was progressed to phase II clinical trials before being withdrawn because of severe bladder toxicity.
[Structures: (43) camptothecin, R1 = R2 = R3 = H; (44) 10-hydroxycamptothecin, R1 = R2 = H, R3 = OH; (46) topotecan, R1 = H, R2 = CH2-N(CH3)2, R3 = OH; (47) irinotecan, R1 = C2H5, R2 = H, R3 = OC(O)-[4-(1-piperidino)piperidino]; (48) 7-ethyl-10-hydroxycamptothecin, R1 = C2H5, R2 = H, R3 = OH]
Interest in camptothecin gained new impetus in 1985, when it was discovered that the compound exerts its antitumor activity through a novel mechanism of action (64). Camptothecin binds to the covalent complex formed by topoisomerase I and DNA, which initiates DNA replication and thus stabilizes the enzyme-DNA complex and prevents cell proliferation. The elucidation of the mechanism of action provided a means of evaluating camptothecin analogs as topoisomerase inhibitors in vitro and efforts then focused on synthesizing water-soluble analogs with broad-spectrum antitumor activities. The α-hydroxy lactone (ring E) and, in particular, the 20(S)-form proved essential for maintaining biological activity, but the 10-hydroxy analog (44) showed greater activity than that of (43) (65). Wall and Wani successfully deployed the Friedlander reaction between substituted 2-aminobenzaldehydes and the tricyclic intermediate (45), to synthesize a variety of ring-A-substituted analogs. These studies may have prompted SmithKline Beecham (now GlaxoSmithKline) to synthesize the water-soluble 10-hydroxycamptothecin analog topotecan (46) that was first approved in 1996 for the treatment of recurrent ovarian cancer and, 2 years later, for small cell lung cancer (66). Irinotecan (47), developed by Daiichi and Yakult Honsha in Japan and marketed by Pharmacia, was also approved in 1996 for the treatment of advanced colorectal cancer. Irinotecan is inactive as a topoisomerase I inhibitor, but acts as a prodrug of the active 7-ethyl-10-hydroxycamptothecin (48) (67).

4.3 Paclitaxel and Docetaxel
Regarded as the tree of death by the Greeks and used to prepare arrow poison by the Celts, the yew tree has been associated with death and poisoning for centuries (68, 69). The English yew, Taxus baccata, was used to make funeral wreaths and it was believed that one could die by merely standing beneath the boughs of the tree.
Yew certainly contains highly toxic metabolites and their potency and fast duration of action has often made extracts of yew the poison of choice for numerous murders and suicide attempts. It is thus ironic that extracts from the Pacific yew, T. brevifolia, after being
tested in the National Cancer Institute's (NCI) screening program during the 1960s, yielded what was described (70) as the most exciting anticancer compound discovered in the previous 20 years; that is, paclitaxel (49) (originally given the name taxol by Wall and Wani).
The initial isolation and characterization of paclitaxel proved particularly difficult because of (1) its very low natural abundance in T. brevifolia bark (although this was the best known source, the isolated yield was only 0.02% w/w, equivalent to 650 mg per tree), (2) the poor analytical data obtained from the purified compound, and (3) the failure of paclitaxel to give crystals that were suitable for X-ray analysis (71). The structure of paclitaxel was published in 1971 (72), but further biological testing continued to be troubled by difficulties. The compound showed only modest in vivo activity in various leukemia assays, which was no better than that displayed by a number of other new compounds at the time. In addition to the limited supplies of paclitaxel (the complexity of the molecule precluded chemical synthesis), the compound was very poorly soluble in water, which made formulation difficult. However, various new assays were developed in the 1970s, including the murine B16 melanoma model, in which paclitaxel showed very good activity, and another boost came when Horwitz et al. (73) discovered that the compound prevented cell division by a unique mode of action. In contrast to the antimitotic vinblastine and podophyllotoxin analogs (q.v.), which prevent microtubule assembly, paclitaxel inhibits cell division by promoting assembly of stable microtubule bundles, which leads to cell death.
Phase I clinical trials were initiated in 1983, but these were to proceed at a slow and tortuous pace and proved all but disastrous when the high levels of oil-based adjuvant used to formulate paclitaxel caused severe allergic reactions in many volunteers. Undaunted by the formulation problem and spurred on by paclitaxel's novel mechanism of action, clinicians were able eventually to minimize the allergic events and demonstrate useful activity. Phase II clinical trials began in 1985 despite continuing supply problems, and 4 years later the program received a significant boost when McGuire et al. (74) reported good responses from patients suffering from refractory ovarian cancer, a disease that kills some 12,500 women a year in the United States alone.
In many ways, the development of paclitaxel mirrored that of the camptothecin analogs, both being dogged for many years by supply issues, poor pharmacokinetics, and toxicity, but the subsequent uncovering of novel mechanisms of action fueled renewed efforts to develop these leads into important new anticancer agents (75).
In 1991 Bristol-Myers Squibb in conjunction with the NCI agreed to manage the supplies of paclitaxel and were granted a licence to further develop the compound. The following year the U.S. Food and Drug Administration approved paclitaxel for the treatment of ovarian cancer in patients unresponsive to standard treatments, and in December 1993 approval was given for the treatment of metastatic breast cancer.
The sourcing of paclitaxel from T. brevifolia was a major problem (76) because to treat just the groups of patients suffering ovarian cancer in the United States would require about 25 kg of compound per year, necessitating the felling of some 38,000 trees (70)! Although the Pacific yew is not a rare tree, it is extremely slow growing and such harvesting could not be sustained indefinitely. It has been estimated that there were enough trees available to maintain a supply of paclitaxel for only 2-7 years (77). The isolation of paclitaxel from other Taxus species has been investigated at length and reasonable quantities have been obtained from the needles of several species including T. baccata. Using the needles has
alleviated the supply problem because they can be harvested without damaging the tree. However, the needles contain much higher quantities of several biosynthetic precursors of paclitaxel and two of these, baccatin III (50) and 10-desacetylbaccatin III (51), have been used to prepare paclitaxel semisynthetically. One approach, developed by Potier et al. (78), involved acylation of the sterically hindered C-13 position of baccatin III with cinnamic acid and subsequent double-bond functionalization through hydroxyamination, to give paclitaxel together with various regio- and stereoisomers. A better approach involved protection of 10-desacetylbaccatin III as the triethylsilyl ether, followed by direct acylation with the phenylisoserine derivative (52), giving paclitaxel in 38% overall yield (79). Further improvements were made using less sterically demanding acylating reagents; for example, acylation with the β-lactam (53) gave paclitaxel in up to 90% yield (80) and this may be the preferred method for commercial production in the future.
These semisynthetic approaches also provide access to analogs with potential advantages over paclitaxel itself. Structure-activity studies have shown that, although the oxetane ring appears to be essential for activity, wide variation in the nature and stereochemistry of the C-13 ester side-chain can be tolerated. Thus, the N-(t-butoxycarbonyl) derivative, docetaxel (54), which appears to be more potent than paclitaxel (81) and has better solubility characteristics, has been developed and launched by Aventis for the treatment of ovarian, breast, and lung cancers.
Various "protaxols," designed to release paclitaxel in situ under physiological conditions, have been prepared by acylating the C-2' hydroxyl group. Nicolaou et al. (82) reported the synthesis of the sulfone (55), which is soluble and stable in aqueous media, but is able to release paclitaxel rapidly in human blood plasma.
Plant tissue culture (70), microbial fermentation (83), and total synthesis (84, 85) provide other possibilities for the production of paclitaxel and its derivatives, although it is far from certain whether any of them will be commercially viable.
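The supply figures quoted earlier in this section for paclitaxel from T. brevifolia (an annual requirement of about 25 kg, and an isolated yield of 650 mg per tree, equivalent to 0.02% w/w of bark) can be cross-checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that the quoted per-tree yield applies uniformly to every tree felled.

```python
import math


def trees_needed(demand_kg_per_year: float, yield_mg_per_tree: float) -> int:
    """Whole trees required per year at a fixed isolated yield per tree."""
    demand_mg = demand_kg_per_year * 1_000_000  # 1 kg = 1,000,000 mg
    return math.ceil(demand_mg / yield_mg_per_tree)


def implied_bark_kg_per_tree(yield_mg_per_tree: float, w_w_fraction: float) -> float:
    """Bark mass per tree implied by the per-tree yield and the w/w fraction.

    This is a derived quantity (an assumption of this sketch), not a figure
    stated in the text.
    """
    yield_kg = yield_mg_per_tree / 1_000_000
    return yield_kg / w_w_fraction


if __name__ == "__main__":
    # 25 kg per year at 650 mg per tree (figures from the text).
    print(trees_needed(25, 650))
    # 650 mg per tree at 0.02% w/w, i.e. a fraction of 0.0002.
    print(round(implied_bark_kg_per_tree(650, 0.0002), 2))
```

On these inputs the tree count comes to 38,462, in line with the "some 38,000 trees" cited, and the two yield figures together imply roughly 3.25 kg of bark per tree.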
4.4 Epothilones
Epothilones A (56) and B (57), 16-membered macrocyclic polyketide lactones, were first isolated from the cellulose-degrading myxobacterium Sorangium cellulosum by Hoefle, Reichenbach, and coworkers (86) as narrow-spectrum antifungal and cytotoxic metabolites. The compounds were then tested by the National Cancer Institute in the United States and found to be highly active against breast and colon cancer cell lines (87). Subsequently, Bollag et al. (88) at the Merck Research Laboratories discovered that the epothilones stabilize microtubule assembly and thus inhibit cell division by the same mechanism as that of paclitaxel (see above). This observation, together with their less complex chemical structure, increased water solubility, more rapid action in vitro, and effectiveness against multidrug-resistant tumor cells, has prompted significant interest in the epothilones as anticancer agents.
[Structures: (56) epothilone A, X = O, R = H; (57) epothilone B, X = O, R = CH3; (59) BMS 247550, X = NH, R = CH3]
On learning the absolute stereochemistry of (56) and (57), three academic research groups embarked on the total synthesis of the epothilones. Nicolaou, Danishefsky, and Schinzer independently adopted successful, elegant synthetic approaches involving olefin metathesis, macrolactonization, Suzuki coupling, or ester-enolate-aldehyde condensation (89). Within 3 years of the disclosure of their absolute stereochemistry, 17 different total syntheses of the natural products were reported. These syntheses paved the way for the generation of a large number of epothilone
analogs for biological evaluation, including the use of solid-phase combinatorial approaches. The academic groups focused on modifications around the core macrocyclic lactone, establishing important structure-activity relationships, but not improving on the in vitro biological activity of the most active natural product, epothilone B (57). In vivo biological data were comparatively scarce and, although one group reported that epothilones B (57) and D (58) showed activity in murine tumor models, researchers at Bristol-Myers Squibb have shown that (58) lacks in vivo activity as a result of rapid metabolic inactivation (90). It was postulated that esterase-mediated hydrolysis of the macrocyclic lactone formed an inactive ring-opened species and, therefore, efforts were focused on replacing the lactone with a more stable macrocyclic lactam moiety. Several macrocyclic lactam derivatives were synthesized from (57) and (58). Of note was the preparation of BMS-247550 (59) in a three-step synthesis from epothilone B (57), utilizing a novel Pd(0)-catalyzed ring-opening reaction followed by reduction and macrolactamization. BMS-247550 (59), which is in phase I clinical trials, retains its activity against human cancer cells that are naturally insensitive to paclitaxel or that have developed resistance to paclitaxel, both in vitro and in vivo (91).
[Structure: (58) epothilone D]

4.5 Podophyllotoxin, Etoposide, and Teniposide
The development of the natural constituents of Podophyllum Resin into effective semisynthetic and, ultimately, totally synthetic compounds for the treatment of various kinds of cancer provides one of the most sustained and intriguing stories of drug discovery (92, 93). The story has all the classic ingredients, starting with observation and reasoning, extending through chance into new areas, and characterized throughout by persistence and determination, particularly when biological activity had to be traced to very minor constituents in the crude plant extract.
Podophyllum peltatum (may apple, or American mandrake) and P. emodi are, respectively, American and Himalayan plants, widely separated geographically but used in both places as cathartics in folk medicine (94). An alcoholic extract of the rhizome known as podophyllin was included in many pharmacopoeias for its gastrointestinal effects; it was included in the U.S.P., for example, from 1820 to 1942. At about this time the beneficial effect of podophyllin, applied topically to benign tumors known as condylomata acuminata, was demonstrated clinically (95). This usage was not inspirational, given that there are records of topical application in the treatment of cancer by the Penobscot Indians of Maine and, subsequently, by various medical practitioners in the United States from the 19th century (96). The crude resinous podophyllin is an irritant and unpleasant mixture unsuited to systemic administration.
The first chemical constituent was isolated from podophyllin in 1880 and named podophyllotoxin (97). A structure was proposed in 1932 and after some fine-tuning (98) was shown to be the lignan (60). As might be expected, the crude resin contains a variety of chemical types, including the flavonols quercetin and kaempferol (99). Although these other constituents undoubtedly have biological activity, it is the lignans that have received most attention and to which we shall devote the remainder of this section.
[Structure: (60) podophyllotoxin]
Chemists at Sandoz in the early 1950s reasoned that crude podophyllin might contain lignan glycosides with anticancer activity, which might be more water soluble and less toxic than podophyllotoxin (92). The reasoning for the latter is not entirely clear, but in the event they proved to be correct in both respects. Careful isolation gave podophyllotoxin β-D-glucopyranoside (61), its 4'-desmethyl analog (62), and some less important lignans lacking the B-ring hydroxy group (100-102). Unfortunately, the sugar derivatives were less active as inhibitors of cell proliferation than were the aglycones, as well as less toxic; however, as expected, they were much more water soluble (92). While continuing work to isolate more natural lignans, a substantial program of structural modification of the known compounds was undertaken, with a view to protecting the glucosides from hydrolytic enzymes and also to improve cellular uptake. Most of these changes were ineffective: the per-acylated derivatives, for example, were insoluble in water and had inferior cytostatic effects (103).
Condensation of the glucosides with a variety of aldehydes was more useful, in that not all the hydroxy groups were blocked. Despite this, water solubility was a problem with the podophyllotoxin derivatives (63). Gastrointestinal absorption was greatly improved, however, as was chemical stability (104), and positive effects were observed in a few cancer patients with the benzylidene derivative (64).
[Structures: (63) R1 = H, CH3; R2 = various alkyl, aryl; (64) R1 = CH3, R2 = C6H5]
It was at this point that luck played a hand, backed up by a good deal of determination. A crude podophyllin fraction, which was simpler and cheaper to prepare than pure podophyllin glucoside, was also treated with benzaldehyde to give a mixture of benzylidene derivatives, about 80% of which was compound (64). The crude product was found to be more potent than compound (64) and subsequently to possess a different mode of action from that of the lead compounds: rather than arresting cells in metaphase, cells were prevented from entering mitosis altogether (105). The crude mixture was marketed for cancer treatment as Proresid.
Improved biological assay methods (106) indicated the presence of an unknown, highly active constituent of Proresid. For example, Proresid prolonged the life of mice inoculated with L1210 leukemia cells (93), an effect that was not observed with the known major constituent. In the early 1960s chromatographic and spectroscopic techniques were not as highly developed as they are now and more
than 2 years' work was required to isolate and identify the unknown component of the mixture, which proved to be the 4'-desmethoxy-1-epi analog (65) of the podophyllotoxin glucoside adduct (92). Because it was present only in very small amounts in the derivatized extract, it was necessary to devise a synthesis from readily available materials. It was fortunate that the desired 1β configuration was readily secured from 1α-hydroxy-4'-desmethylpodophyllotoxin, itself obtained by selective demethylation of podophyllotoxin: the remainder of the synthesis would now be considered fairly routine (107).
Given a large supply of the key intermediate (66), it was straightforward to prepare a number of aldehyde derivatives, resulting in analogs with up to a 1000-fold increase in potency (108). The selected adducts were those prepared from thiophen-2-aldehyde, giving teniposide (67), and from acetaldehyde, giving etoposide (68). Both drugs are of value, etoposide in the treatment of small-cell lung cancer and testicular cancer, teniposide in the treatment of lymphomas and leukemias. The thiophene derivative is also of use in the treatment of brain tumors (93).
The natural products, podophyllotoxin and its congeners, are "spindle poisons" that inhibit cell proliferation by binding to tubulin and preventing formation of microtubules (105). Presumably this effect is sufficient to account for the success of podophyllin in the treatment of condylomata acuminata, although the crude extract contains many other candidates for a contribution to the biological activity. As has been described, a very minor component of the natural mixture, missing the 4' hydroxy group, having the 1β- instead of the 1α-hydroxy configuration and with this hydroxy group conjugated with β-D-glucose, must be treated with an aldehyde to produce the highly active and most important derivatives. These derivatives do not bind to tubulin, but have been shown to be inhibitors of topoisomerase II, which may account for most of the observed biological effects, including DNA strand breaks, that lead to anticancer activity (109).

4.6 Marine Sources
Cytosine arabinoside (69), a synthetic analog of the C-nucleosides spongouridine (70) and spongothymidine (71) from the sea sponge Cryptotethya crypta, was the first and, so far, the only marine-derived compound used routinely as an anticancer agent (110). However, a number of chemically diverse natural products from marine sources have been progressed to clinical trials. The three most advanced compounds are in phase II trials: ecteinascidin-743 (72), a tetrahydroisoquinoline alkaloid isolated from the mangrove ascidian Ecteinascidia turbinata, bryostatin-1 (73), a macrolide isolated from the bryozoan Bugula neritina, and dolastatin-10 (74), a linear peptide from the sea mollusk Dolabella auricularia. Ecteinascidin-743 (72) was first isolated by Rinehart's group at the University of Illinois (111) and has been licensed to PharmaMar (Zeltia), which plans, together with Ortho Biotech, to launch the compound in late 2002 initially for the treatment of soft tissue sarcoma (112).
Yields of marine-derived natural products are invariably low and supply problems have delayed their development as useful pharmaceutical agents. For example, over 3000 kg of the sea squirt E. turbinata is required to produce 3 g of ecteinascidin-743, sufficient for just one cycle of treatment (113) and 1000 kg of B. neritina yields 1.5 g of bryostatin-1 (114). Fortunately, supplies of ecteinascidin-743 (72) and also bryostatin-1 (73) have been met by aquaculture techniques (115), and more viable synthetic routes are now available for (73) (116) and dolastatin-10 (74) (117).
[Structures: (69) cytosine arabinoside; (70) spongouridine, R = H; (71) spongothymidine, R = CH3; (72) ecteinascidin 743]

5 ANTIBIOTICS
In 1929 Alexander Fleming published the results of his chance finding that a Penicillium mold caused lysis of staphylococcal colonies on an agar plate (118). He also showed that the culture filtrate, named penicillin, possessed activity against important pathogens including Gram-positive bacteria and Gram-negative cocci. However, it was not until 1940 that the true therapeutic efficacy of penicillin was revealed, when Chain et al. (119) successfully tested the material in mice that had been previously infected with a lethal dose of streptococci. Several years later the precise chemical structure of the main active component, benzylpenicillin (75), was determined and efforts to synthesize the compound were initiated (120). Benzylpenicillin proved to be an elusive target because of the instability of the β-lactam ring: it was unstable under acid conditions and was deactivated by β-lactamase enzymes produced by various Gram-positive and Gram-negative bacteria.
[Structures: (73) bryostatin 1; (74) dolastatin 10]
The discovery that the fused β-lactam nucleus, 6-aminopenicillanic acid (6-APA) (76), could be obtained from cultures of Penicillium chrysogenum led to the preparation of new, semisynthetic derivatives with improved stability to gastric acid and β-lactamases, and with activity against a wider range of pathogenic organisms (121). Sheehan (122) showed that compound (76) would react readily with acid chlorides to form new penicillin derivatives with novel substituents at the 6-position. Methicillin (77), with a sterically demanding 2,6-dimethoxybenzamide side-chain, was the first semisynthetic penicillin to show resistance to staphylococcal β-lactamases, although the compound was still acid labile. Ampicillin (78) has an α-aminophenylacetamido side-chain and displays good activity against Gram-negative organisms; it is stable to acid and thus can be administered orally, although it is susceptible to degradation by β-lactamases. Amoxycillin (79) differs from ampicillin by the addition of a single hydroxy group, but the compound is better absorbed by the gastrointestinal tract.
[Structure: (78) ampicillin, R = COCH(NH2)Ph]
Clavulanic acid (80), isolated from Streptomyces clavuligerus, is similar in structure to the penicillins, except oxygen replaces sulfur in the five-membered ring (123). Clavulanic acid has weak antibacterial activity, but is a potent inhibitor of β-lactamases (124). A mixture of clavulanic acid and the β-lactamase-sensitive amoxycillin was introduced in 1981 as Augmentin and has proved to be an effective combination to combat β-lactamase-producing bacteria (125). In 2001, 20 years after its launch, Augmentin is the best-selling antibacterial worldwide.
The clinical introduction of the penicillin group of antibiotics prompted an intensive search for novel antibiotic-producing organisms and Selman Waksman demonstrated the value of actinomycetes in this role, discovering the aminoglycoside streptomycin (81) from Streptomyces griseus in 1943 (126). Pharmaceutical companies also embarked on large programs of screening soil samples for antibiotic-producing microorganisms (127). Chloramphenicol (82) was isolated from Streptomyces venezuelae in 1948 and other clinically important antibiotics followed: chlortetracycline (83), neomycin (84), oxytetracycline (85), erythromycin (86), oleandomycin (87), kanamycin (88), and rifamycin (89).
In 1948 Giuseppe Brotzu isolated the fungus Cephalosporium acremonium from a water sample collected off the coast of Sardinia. The culture showed significant antimicrobial activity, but Brotzu could not interest the Italian authorities in his discovery. He then turned to a friend in England for help, who arranged for Howard Florey at Oxford to receive a sample of the producing culture. Eventually, an antibacterial substance was isolated and named cephalosporin C (90) (128). The compound, which had a structure similar to that of the penicillins, except it had a dihydrothiazine ring fused to the β-lactam core,
showed good resistance to β-lactamases and was less toxic than benzylpenicillin. However, plans to market the compound were terminated with the introduction of methicillin (see above).
The discovery that the basic structural building block of cephalosporin C, that is, 7-aminocephalosporanic acid (7-ACA) (91), could be synthesized led to the preparation of numerous cephalosporin derivatives in a similar way to the synthesis of penicillins from 6-aminopenicillanic acid (129, 130). Modification of the substituent at the 7-position, while retaining the 3-acetoxymethyl group, gave cephalothin (92), cephacetrile (93), and cephapirin (94), so-called first-generation cephalosporins with good activity against Gram-positive bacteria, although the acetyl ester was susceptible to degradation by esterases and thus limited the duration of action. Replacement of the acetoxy group by other substituents rendered the products less prone to esterase attack. For example, the pyridinium derivative, cephaloridine (95), has a longer duration of action than that of cephalothin.
The first orally active cephalosporin was cephaloglycin (96), which possessed a phenylglycine substituent in the C-7 side-chain, although the labile 3-acetoxymethyl group was retained. Replacing the acetoxy group with a proton or chlorine, for example, cephalexin
(97), cefadroxil (98), cephradine (99), and cefaclor (100), extended the duration of action of these orally active products. Cefaclor has been classified as a second-generation cephalosporin because it has a wider spectrum of activity, which includes Gram-negative bacteria such as Haemophilus influenzae. Cephamandole (101) and cefuroxime (102) are parenterally administered cephalosporins with similar activities against clinically important Gram-negative bacteria and are also resistant to many types of β-lactamases.
[Structure: (89) rifamycin]
The newer third-generation cephalosporins, including ceftazidime (103), ceftizoxime (104), and ceftriaxone (105), which all contain an α-aminothiazolyl group in the C-7 side-chain, have been developed for treating specific pathogens such as Pseudomonas aeruginosa. Thienamycin (106), isolated from Streptomyces cattleya in 1976, represented a new class of β-lactam antibiotics produced by bacteria where the sulfur of the penicillin nucleus was replaced by a methylene group (131). An N-formylimidoyl derivative, imipenem (107), was the first example from this
new class of carbapenem antibiotics to become available for clinical use (132). Imipenem has a very broad spectrum of activity against most Gram-positive and Gram-negative aerobic and anaerobic bacteria.
Screening bacteria such as Pseudomonas acidophila and Chromobacterium violaceum for production of β-lactam antibiotics resulted in the discovery of naturally occurring monobactams, which had moderate antimicrobial activity (133-135). Side-chain variations, as developed for the penicillins and cephalosporins, led to compounds with improved activity against both Gram-positive and Gram-negative bacteria. A derivative containing the α-aminothiazolyl group, a side-chain component common to the third-generation cephalosporins (see above), showed specific activity against Gram-negative aerobic bacteria, including Pseudomonas spp., and was stable to most types of β-lactamases. The compound aztreonam (108) became the first
commercially available monobactam and showed a mode of action similar to that of the other β-lactam antibiotics by blocking bacterial cell wall synthesis (136).
[Structures (90)-(94) and (108)]
5.2 Erythromycin Macrolides
Erythromycin (109) was isolated, in 1952, from a strain of Saccharopolyspora erythraea (formerly Streptomyces erythreus). As a broad-spectrum antibiotic erythromycin has proved invaluable for the treatment of bacterial infections in patients with β-lactam hypersensitivity and is also the drug of choice in the treatment of infections caused by species of Legionella, Mycoplasma, Campylobacter, and Bordetella (137).
(109) Erythromycin A, R = H
(114) Clarithromycin, R = CH3
Although safe and effective, erythromycin is not a perfect antibacterial. The presence of hydroxy groups suitably disposed with respect to the keto function at C-9 leads to the formation of a tautomeric mixture of hemiketals (138). The 6,9-hemiketal (110) may be dehydrated in stomach acid to give the inactive Δ8 analog (111), which may undergo further ring closure to give the 9,12-tetrahydrofuran (112) that is also inactive (139). The Δ8 derivative
(111) may be responsible for some gastrointestinal disturbance (140). To avoid these problems by increasing the stability to acid, the 2'-stearate, estolate, and ethylsuccinate esters have been prepared (141), but even when the tablets are enteric-coated the bioavailability is erratic and relatively frequent dosing is required (137).
An understanding of the acid-catalyzed decomposition of erythromycin has led to a variety of semisynthetic derivatives with improved oral bioavailability (142). Reductive amination of the 9-keto function gives erythromycylamine, which reacts with (2-methoxyethoxy)acetaldehyde (143) to give dirithromycin. Beckmann rearrangement of the 9-oxime followed by reduction and methylation (144) gives azithromycin (113), which shows good activity against Gram-negative bacteria, including Haemophilus influenzae. An alternative for prevention of cyclization between the 9-keto and 6-hydroxy is to mask the 6-hydroxy group. If the 6-hydroxy is methylated (145), the result is clarithromycin (114), which, like (113), has an improved pharmacokinetic profile compared with that of the parent molecule.
(113) Azithromycin
Both azithromycin and clarithromycin have been used for various bacterial infections for a number of years. Within the last decade, resistance has emerged to a range of antibacterials, including the macrolides, arising from methylation of an adenine in the 23S ribosomal RNA target site, which prevents binding (146). The invention of the ketolides [e.g., telithromycin (115)] overcomes MLSB resistance by removing the L-cladinose moiety at position 3: the exposed hydroxyl is also oxidised to a ketone (147). The loss of potency that would ensue is compensated by two further modifications, which improve binding: formation of a carbamate at positions 11/12, and extension with a heterocycle-substituted side-chain. In ABT 773 a similar side-chain is placed at position 6, with comparable results (147).
(115) Telithromycin
5.3 Streptogramins
The streptogramins are produced by Streptomyces species and have been classified into two groups: Group A are polyunsaturated macrocyclic lactones and Group B are cyclic hexadepsipeptides. Both groups bind bacterial ribosomes and inhibit protein synthesis at the elongation step and they act synergistically against many Gram-positive microorganisms. However, the naturally occurring streptogramins are poorly soluble in water and this, until recently, has limited their use to treat bacterial infections. New, water-soluble derivatives have been developed and the semisynthetic dalfopristin (116) and quinupristin (117) mixture (Synercid) has been approved for the treatment of Gram-positive infections, including multidrug-resistant strains of Enterococcus faecium, Staphylococcus aureus, and S. pneumoniae (148).
(116) Dalfopristin
(117) Quinupristin
5.4 Echinocandins
The fungal metabolite echinocandin B (118) is one of the lipopeptides, in which a cyclic hexapeptide is combined with a long-chain fatty acid. Echinocandin B inhibits β-1,3-glucan synthesis and as a result has anti-Candida and anti-Pneumocystis carinii activity (149). As a group, the echinocandins are not orally bioavailable, are haemolytic, and are not very water soluble (150), despite the hydrogen-bonding ability of the polyhydroxylated hexapeptide.
(118) echinocandin B, R = linoleyl
Synthesis of the cyclic hexapeptide is unattractive for the purpose of securing analogs with improved biological activity because of the unusual nature of the amino acids used
and the complex stereochemistry generated by the high degree of hydroxylation. However, echinocandin B can be produced efficiently by fermentation of a culture of Aspergillus nidulans and then deacylated by fermentation with Actinoplanes utahensis (151). The free amino group thus exposed can be derivatized with a number of active esters. Synthesis of the amide from 4-octylbenzoic acid gives cilofungin (119), which has specifically high potency against Candida albicans and some other Candida species (151).
(119) cilofungin
For systemic use cilofungin had to be given intravenously and unfortunately ran into problems associated with the cosolvent (PEG) (152). A better derivative, LY-303366 (120), is now in clinical trial and has the major advantage of oral bioavailability (153). Many other antifungal peptides are under investigation (152). The member of this series that is furthest advanced is caspofungin (MK-991, L-743,872) (121), following its approval by the FDA, early in 2001, for the treatment of aspergillosis. The two analogs, LY-303366 and caspofungin, have been compared against clinical fungal isolates in vitro (154) and the latter has been evaluated in immunosuppressed mice (155).
(121) caspofungin
6 CARDIOVASCULAR DRUGS
6.1 Lovastatin, Simvastatin, and Pravastatin
One of the most significant natural product discoveries in the last 25 years has been a fungal secondary metabolite called lovastatin (122). Heralded as a major breakthrough in
the treatment of coronary heart disease (156), lovastatin was introduced onto the market by Merck in 1987 for the treatment of hypercholesterolemia, a condition marked by elevated levels of cholesterol in the blood.
Lovastatin works by inhibiting 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase, a key rate-limiting enzyme in the cholesterol biosynthetic pathway. However, the first specific inhibitors of this enzyme were discovered several years earlier by Endo et al. at Sankyo (157). The compounds, which are structurally related to lovastatin, were isolated from Penicillium citrinum and shown to block cholesterol synthesis in rats and lower cholesterol levels in the blood. Development of the most active compound, designated ML-236B (123), is believed to have been curtailed because of toxicity problems (158).
Brown et al. at Beechams also reported the isolation of (123), but as a metabolite from Penicillium brevicompactum (159). The group, naming the compound compactin, reported its antifungal activity but failed to reveal its mode of action as an inhibitor of HMG-CoA reductase. The search for naturally occurring inhibitors of HMG-CoA reductase gained pace and, after spending several years developing appropriate screens, Merck found during only the second week of testing a culture of Aspergillus terreus that displayed interesting inhibitory activity (160). In February 1979 the active component, lovastatin (mevinolin), was isolated and characterized (161), and in November the following year Merck was granted patent protection in the United States. Although lovastatin proved to be identical to monacolin K, a metabolite isolated earlier from Monascus ruber (162), the chemical structure of the latter compound had not been reported, whereas Merck filed for patent protection giving complete structural details for lovastatin.
The discovery of compactin and lovastatin prompted efforts to develop derivatives with improved biological properties (163, 164). Modification of the methylbutyryl side chain of lovastatin led to a series of new ester derivatives with varying potency and, in particular, introduction of an additional methyl group α to the carbonyl gave a compound with 2.5 times the intrinsic enzyme activity of lovastatin (165). The new derivative, named simvastatin (124), was the second HMG-CoA reductase inhibitor to be marketed by Merck. Both lovastatin and simvastatin are prodrugs and are hydrolyzed to their active open-chain dihydroxy acid forms in the liver (166). A third compound, pravastatin (125), launched by Sankyo and Squibb in 1989, is the open hydroxyacid form of compactin that was first identified as a urinary metabolite in dogs. Pravastatin is produced by microbial biotransformation of compactin.
The HMG-CoA reductase inhibitors described above bind to two active sites on the enzyme: the hydroxymethylglutaryl binding domain and an adjacent hydrophobic pocket to which the decalin moiety binds (167). The recognition that the ring-opened hydroxy acids resemble mevalonic acid and that the decalin moiety could be replaced by 4-fluorophenyl-substituted heterocycles led to the launch of several new products including fluvastatin
(126), the ill-fated cerivastatin (127), and the so-called turbostatin atorvastatin (128). Although cerivastatin was withdrawn from the market in 2001 because of fatal adverse drug-drug interactions, the "statins" remain one of the fastest growing segments of the pharmaceutical industry. The latest member of this group of cholesterol-lowering drugs, AstraZeneca's rosuvastatin (129), is due to be launched in 2002 and is forecast to achieve sales of US $2.8 billion by 2005 (168).
(126) Fluvastatin
(127) Cerivastatin
(128) Atorvastatin
(129) Rosuvastatin
6.2 Teprotide and Captopril
While studying the physiological effects of snake poisoning, Ferreira (169) discovered that specific components in the venom of the pit viper Bothrops jararaca inhibited degradation of the peptide bradykinin and potentiated its hypotensive action. The "potentiating factors" proved to be a family of peptides that worked by inhibiting the dipeptidyl carboxypeptidase, angiotensin-converting enzyme (ACE) (170, 171). In addition to catalyzing the degradation of bradykinin, ACE also catalyzes the conversion of the human prohormone, angiotensin I, to the potent vasoconstrictor octapeptide, angiotensin II. However, the significance of ACE in the pathogenesis of hypertension was not fully appreciated until the 1970s after Ondetti et al. (172) had first isolated and then synthesized the naturally occurring nonapeptide, teprotide (130). The compound proved to be a specific potent inhibitor of ACE and showed excellent antihypertensive properties in clinical trials, although its use was limited by the lack of oral activity.
Pyr-Trp-Pro-Arg-Pro-Gln-Ile-Pro-Pro
The discovery of teprotide led to a search for new, specific, orally active ACE inhibitors. Ondetti et al. (172) proposed a hypothetical model of the active site of ACE, based on analogy with pancreatic carboxypeptidase A, and used it to predict and design compounds that would occupy the carboxy-terminal binding site of the enzyme. Carboxyalkanoyl and mercaptoalkanoyl derivatives of proline were found to act as potent, specific inhibitors of ACE and 2-D-methyl-3-mercaptopropanoyl-L-proline (131) (captopril) was developed and launched in 1981 as an orally active treatment for patients with severe or advanced hypertension. Captopril, modeled on the biologically active peptides found in the venom of the pit viper, made an important contribution to the understanding of hypertension and paved the way for other ACE inhibitors, such as enalapril (132) and lisinopril, which have had a major impact on the treatment of cardiovascular disease (173).
6.3 Adrenaline, Propranolol, and Atenolol
The true clinical potential of β-adrenoceptor blocking agents for treating angina, atrial fibrillation, and tachycardias was first recognized by James Black and colleagues at ICI (174). Black noted a report from Neil Moran of Emory University in 1958, showing that dichloroisoprenaline antagonized the effects of adrenaline on heart rate and muscle tension. The first effective β-adrenoceptor blocker, pronethalol (133), was synthesized 2 years later by the ICI group and marketed for limited use in 1963. Toxicity problems soon led pronethalol to be replaced by the 1-naphthyl analog, propranolol (134), which became the first β-adrenoceptor antagonist approved for general use, being more potent and yet devoid of the partial agonist or intrinsic sympathomimetic activity shown by many other analogs. Compounds with improved selectivity for the β-adrenoceptor of cardiac muscle (β1-adrenoceptor blockers) were to follow, including atenolol (135), which became the most frequently prescribed β-blocker and one of the best-selling drugs of the time.
(133) Pronethalol
(134) Propranolol
(135) Atenolol
6.4 Dicoumarol and Warfarin
Sweet clover has a long history of medicinal use, often as an anti-inflammatory or analgesic preparation in the form of ointments and poultices. Melilotus officinalis (yellow sweet clover, or ribbed melilot) was reputed to have been a favorite herbal treatment used by King Henry VIII of England and the plant is still referred to as King's Clover in some publications (175).
The plant flourishes in poor soil and was cultivated extensively in Europe for cattle fodder and for soil improvement. In the early 1920s M. officinalis was planted on the prairies of North Dakota and Alberta, Canada, but with disastrous consequences. Soon cattle and sheep throughout these regions began literally bleeding to death. The mysterious hemorrhagic disease was traced to clover fodder that had not been stored properly and had become "spoiled," or moldy. However, the insolubility of the anticoagulant component and the difficulty of assaying extracts for biological activity made the task of isolating the active principal component intractable (176). It took almost 20 years before the compound was identified as 3,3'-methylenebis(4-hydroxycoumarin) (136), an oxidative degradation metabolite of coumarin (137), itself a common component of Melilotus sp. Soon after the compound had been identified, trials were initiated that confirmed the oral anticoagulant activity in humans and in 1942 it was marketed under the name dicoumarol (177). The compound had a slow, erratic onset of action and efforts were initiated to prepare synthetic analogs that acted faster and had longer duration of action. A 4-hydroxycoumarin residue, substituted at the 3-position, proved essential for biological activity and in 1948, after synthesizing over 150 compounds, a 4-hydroxycoumarin derivative that was longer acting and more potent than dicoumarol was selected not for clinical use, but as a rodenticide for development by the Wisconsin Alumni Research Foundation! The compound (138), named warfarin (an acronym derived from the name of the institute coupled with "arin" from coumarin), became a household name for rat poison. Concern over the use of oral anticoagulants and the inherent risk of hemorrhage inhibited the development of warfarin as a therapeutic agent. However, in 1951, a
U.S. Army cadet unsuccessfully attempted to commit suicide by taking massive doses of the compound. The incident prompted further clinical trials that resulted in warfarin being used as the anticoagulant of choice for prevention of thromboembolic disease (177). The mode of action of the coumarin anticoagulants involves blocking the regeneration of reduced vitamin K and induces a state of functional vitamin K deficiency, thus interfering with the blood-clotting mechanism (178).
7 ANTIASTHMA DRUGS
7.1 Khellin and Sodium Cromoglycate
The toothpick plant, Ammi visnaga, had been used for centuries in Egypt as an antispasmodic agent to treat renal colic and ureteral spasm. In 1879 one of the plant's main constituents was isolated, crystallized, and named khellin (139) (179). Subsequently, the pure compound was shown to relax smooth muscle and in 1938 the chemical structure was characterized as a chromone derivative (180). In 1945 a medical technician took khellin to treat renal colic and found instead that it acted as a potent coronary vasodilator and relieved his angina (181). This chance discovery, together with earlier observations, led to khellin being used as a coronary artery vasodilator and for treating bronchial asthma (182). However, its clinical use was severely limited by some unpleasant gastrointestinal side effects.
Five years later, a small British pharmaceutical company, called Benger Laboratories, initiated a program to synthesize khellin analogs as potential bronchodilators for treating asthma, and had prepared a series of compounds that relaxed guinea pig bronchial smooth muscle and protected the animals against allergen-induced bronchospasm (183). A clinical pharmacologist on Benger's staff, who suffered from chronic asthma, questioned the validity of the animal model and decided instead to test the compounds on himself. He then prepared a "soup" of guinea pig fur, inhaled the vapors to induce a reproducible asthma attack, and assessed the effects of the synthesized khellin derivatives. Many of the compounds first prepared were insoluble in water and caused nausea and other unpleasant side effects when taken orally. This led to the test compounds being formulated as aerosol sprays and in 1958, an aerosol preparation of a chromone-2-carboxylic acid derivative (140) was found to exert a protectant effect, albeit short lived, against bronchial allergen challenge without showing the bronchodilator activity seen with other compounds. The compound was completely inactive in the guinea pig asthma model and afforded its protectant effect in humans only when inhaled as an aerosol.
About two new compounds were tested each week and in 1965, after synthesizing some 670 analogs, a bischromone was prepared that gave good protection, even when inhaled up to 6 h before bronchial allergen challenge (184). The compound sodium cromoglycate (141) was obtained by condensing diethyl oxalate with the bis(hydroxyacetophenone) (142) and cyclizing the resultant bis(2,4-dioxobutyric acid) ester (143) under acidic conditions (185). The essential chemical features required for activity appeared to be the coplanarity of the chromone nuclei, the flexible dioxyalkyl link, and the carboxyl groups in the 2-positions. It is believed to act by stabilizing tissue mast cells against degranulation, thereby preventing release of inflammatory mediators (186).
Sodium cromoglycate entered clinical trials in 1967 and emerged to become a first-line prophylactic treatment for bronchial asthma.
The coronary dilator properties of khellin have not been ignored and at least one successful program was initiated to prepare analogs for testing as potential antiangina drugs (187, 188). Benziodarone (144) was the first useful compound to emerge from the Labaz laboratories in Belgium based on the benzofuran ring system. However, the compound caused hepatotoxicity in man and was soon superseded by amiodarone (145), a more potent coronary dilator for treating angina. In 1970 the first report of antiarrhythmic activity in the clinic was published (189) and amiodarone became established for prophylactic control of supraventricular and ventricular arrhythmias during the 1980s (188).
(144) benziodarone
(145) amiodarone
7.2 Ephedrine, Isoprenaline, and Salbutamol
The Chinese have been using a plant extract known as ma huang to treat asthma and hay fever for thousands of years. The extract is prepared from several species of Ephedra, a small leafless shrub found in China. Following experiments at the Peking Union Medical College and then at the University of Pennsylvania and the Mayo Clinic in the United States, the active ingredient, ephedrine (146), was introduced into Western medicine in 1926 as an orally active bronchodilator for the treatment of acute asthma (190, 191).
Ephedrine is related to another natural product that has been used to treat asthma, that is, the adrenal hormone adrenaline (147) (epinephrine). Adrenaline is a potent agonist of both α- and β-adrenoceptors and thus produces arterial hypertension as an undesirable side effect. In 1951 a synthetic alternative, isoprenaline (148), was introduced and for almost 20 years it was considered the drug of choice for treating bronchospasm associated with acute asthmatic attack (191). Isoprenaline is a specific β-adrenoceptor agonist and, although it has no vasoconstrictor activity, the compound does have marked cardiac stimulant properties and a short duration of action. Ahlquist's concept (192) of two types of adrenoceptor was developed further by Lands et al. (193), who established the existence of β1- and β2-adrenoceptor subtypes. Clear structure-activity relationships emerged with the preparation of compounds related to adrenaline and ephedrine; the basic requirement for β-adrenoceptor agonist activity was an aromatic ring substituted by an ethanolamine side-chain. The branched methyl substituent on the side-chain was associated with prolonged duration of action (i.e., ephedrine), whereas aromatic hydroxylation (in isoprenaline) prevented penetration across the blood-brain barrier and thus prevented stimulation of the CNS (191). However, 1,2-dihydroxy substituents were found to promote enzymic degradation, and replacement of the 3-hydroxy group by a hydroxymethyl substituent was required to extend the duration of action. In 1969 salbutamol (149) was launched by Glaxo as a longer-lasting, selective β2-adrenoceptor agonist for the treatment of bronchial asthma (194) and, recently, a lipophilic ether analog, salmeterol (150), was introduced with an even longer duration of action that has potential advantage in the prevention of nocturnal asthma.
Despite the many chemical alterations that have been carried out on the phenylethanolamine "template," the key chemical features associated with modern β-agonists can be seen
to have originated from the naturally occurring compounds, adrenaline and ephedrine.
7.3 Contignasterol
The use of inhaled corticosteroids such as fluticasone propionate to treat asthma and rhinitis has been well documented and will not be repeated here. Less well known is an unusual, highly oxygenated marine-derived steroid isolated from the sponge Petrosia contignata that possesses a unique cyclic hemiacetal side-chain (151). The compound was isolated by Andersen and coworkers (195) at the University of British Columbia and found to possess anti-inflammatory properties in vivo. Contignasterol is being developed by Inflazyme, in collaboration with Aventis, for the treatment of asthma and other inflammatory diseases and has progressed to phase II clinical trials.
(151) Contignasterol
8 ANTIPARASITIC DRUGS
8.1 Artemisinin, Artemether, and Arteether
Artemisia annua (sweet wormwood, qing hao) has been used in Chinese medicine for well over 1000 years. The earliest recommendation is for the treatment of hemorrhoids, but there is a written record of use in fevers dated 340 A.D. Modern development dates from the isolation of a highly active antimalarial, artemisinin (qinghaosu), in 1972, and has been carried out almost entirely in China. Much of the original literature is therefore in Chinese, but there is an excellent review on qinghaosu by Trigg (196) and an account of the uses of A. annua (197). This section is largely a summary of these two articles.
(152) artemisinin
(153)
Artemisinin (152) is a sesquiterpene lactone with an unusual peroxide bridge. One of the earliest modifications involved catalytic reduction of the peroxide, resulting in loss of one oxygen and total loss of antimalarial activity (196) in the adduct (153). The role of the peroxide bridge in producing antimalarial effects was not fully understood, but it appeared essential for activity, so much of the early work on analogs conserved this structural feature as an empirical finding. The mechanism
of action of artemisinin has since been elucidated (198, 199), although it is not without controversy (200, 201). The drug has a high affinity for hemozoin, a storage form of hemin that is retained by the parasite after digestion of hemoglobin, leading to a highly selective accumulation of the drug by the parasite. Artemisinin then decomposes in the presence of iron, probably from the hemozoin, and releases free radicals, which kill the parasite. The peroxide bridge is therefore a crucial part of the drug molecule, as was suspected from structure-activity studies. Elucidation of the mechanism of action has led to the synthesis of a range of simple analogs capable of iron-catalyzed decomposition, some of which have good antimalarial activity (202).
In retrospect, it is not surprising that the peroxide-bridged compound (154), isolated from Artabotrys uncinatus, also has antimalarial activity (197). Because peroxides of this kind are likely to be formed from a variety of precursors in dried plant material (see below), there may well be many more antimalarials of this kind to be found.
Artemisinin is an excellent antimalarial, approximately equal in potency to chloroquine, with a good therapeutic index except on the fetus. The preparation of semisynthetic derivatives has been stimulated primarily by a requirement for improved solubility because artemisinin is relatively insoluble in both water and oil.
Reduction of (152) with sodium borohydride occurs at the lactone carbonyl, leaving the peroxide intact (196, 197). The resulting cyclic hemiacetal, dihydroartemisinin (155), which is a more potent antimalarial than the parent compound, shows typical acetal reactivity. In the presence of acid, a highly reactive carbocation intermediate allows SN1-type substitution with a variety of nucleophiles. For example, boron trifluoride catalyzes reactions with methanol and ethanol to give artemether (156) and arteether (157), respectively, two of the most important derivatives (196). Both are more potent than the parent compound and have improved solubility in oil. Artemether has been chosen for development in the West under the name Paluther.
(155) R = H
(156) R = CH3, artemether
(157) R = CH2CH3, arteether
(158) R = COCH2CH2COONa, sodium artesunate
Water solubility can be greatly improved by the standard ploy of esterification with succinic acid and conversion to the sodium salt. Applied to compound (155), this technique gives sodium artesunate (158), a water-soluble prodrug that may be given intravenously (196). It may be assumed that hydrolysis occurs in vivo to give back (155) as the active antimalarial because (156) has been shown to be unstable in aqueous solution and because analogous carboxylic acids with a nonhydrolyzable ether link are relatively inactive.
There are two reasons for the great interest being shown in artemisinin and its derivatives. First, there is little cross resistance with Plasmodium falciparum between the members of this series and the quinoline-based antimalarials like chloroquine (203). On the contrary, significant potentiation of effect is observed in combination with chloroquine analogs such as mefloquine (204). Second, the high lipid solubility of, for example, artemether ensures rapid penetration into the CNS, so these sesquiterpene lactones are first-line drugs for the treatment of cerebral malaria caused by P. falciparum (197), which is otherwise fatal.
It seems highly likely (205) that most of the artemisinin found in dried plant material is formed by autoxidation after the death of the plant. From the medicinal chemist's point of view this is unimportant, but some plant biochemists might have doubts about the description of artemisinin as a "natural product." In our view, air drying in sunlight is a natural, although not a botanical, process. It is probable that many other plant-derived peroxides are formed in a similar way.
Whole plant extracts often show promising activity that may not be traceable to single components. This is obviously not true of Artemisia annua extracts, but it is interesting to note that other constituents, notably methoxylated flavones, have potentiating effects on the antimalarial activity of artemisinin (206).
The reported effect of artemisinin on systemic lupus erythematosus (196) is intriguing, given the history of use of quinine-type antimalarials in this disease.
8.2 Quinine, Chloroquine, and Mefloquine
The use of Cinchona bark (e.g., Cinchona succirubra) by South American Indians to treat fevers and the subsequent importation of the bark into Europe by Jesuit priests in the 17th century is well known (207). At that time malaria was widespread, even as far north as eastern Scotland, and there was no effective treatment for "the ague." Although quinine (159) is not very potent or long acting, a good sample of Cinchona bark contains about 5% of the alkaloid (208). This high concentration permitted genuinely therapeutic doses of bark to be given and allowed the pure alkaloid to be isolated (209) as early as 1820. During the next 100 years quinine was the only effective treatment for malaria known to Europeans. Without quinine, life in the tropics was impossible for those without natural immunity to malaria. "One thing that was compulsory was the taking of five grains of quinine a day. . . . And if you didn't take it and got ill your salary was liable to be stopped" (210). Supplies of quinine to Europe were threatened during World War I, stimulating a major program of research into synthetic analogs.
(159) quinine
The chemical techniques available to chemists in the period 1820-1920, although improving rapidly, did not allow a structure to be proposed for quinine with any confidence: the first completely correct proposal (211) came in 1922 and was finally confirmed by total synthesis (212) as late as 1945. However, part structures were known, such as the 6-methoxyquinoline moiety, from long before, and were sufficient to allow the synthesis of mimics. The first clinically successful mimics were the 8-aminoquinolines.
In the early years of the 20th century, synthetic organic chemistry was a young discipline, largely governed by empirical rules. Progress toward synthetic analogs of complex natural structures was governed as much by synthetic feasibility as by a desire for close mimicry. The first quinine analogs were, therefore, a combination of the accessible 6-methoxyquinoline part of the quinine structure, with elements of the first successful antimicrobial agents, such as 9-aminoacridine. Nitration followed by reduction could be used to generate a number of new molecules from a variety of parent heterocycles. It is recorded (213) that 4-, 6-, and 8-aminoquinolines have antimalarial properties and, quite extraordinarily, two of these chemical classes are still used today, have quite different uses as antimalarials, and quite possibly have different modes of action.
The first of the 8-aminoquinolines to be introduced into medicine was pamaquine (160), not long after World War I (214). Despite greater toxicity than that of quinine, this class
of drugs was found to have radical curative ability against the relapsing malarias. Several hundred analogs were tested during World War II and of these, primaquine (161) survives to the present day for short-term use as a radical curative (215).

(160) pamaquine

(161) primaquine

Quinacrine (162) is an obvious embodiment of the principle outlined above; as a derivative of both quinine and 9-aminoacridine it combined a known antimalarial with a known antimicrobial. The result was a useful, relatively nontoxic antimalarial, although it stained the skin and eyeballs yellow (216). Despite this side effect and a high incidence of gastrointestinal disturbance, quinacrine was widely used during World War II by European troops in East Asia. The availability of the results of medicinal chemistry research to both sides in wartime is a curious feature of antimalarial development, highlighted below.

(162) quinacrine (mepacrine)

As has been explained, the major stimulus for research into synthetic antimalarials was not so much the therapeutic inadequacy of quinine as the potential lack of availability in times of social upheaval. During World War II, the United States encouraged the planting of Cinchona in Costa Rica, Peru, and Ecuador (216). The total synthesis of quinine was too difficult in the 1940s and is unlikely to become economically viable even in the new millennium. This problem was partly overcome with quinacrine, which was used widely in World War II, although quinacrine has the defects described above. The conceptual derivation of chloroquine (163) from quinacrine is obvious and apparently happened twice, in Germany and the United States, the latter about 10 years after the Germans had discarded the drug as being too toxic! The story of the rediscovery of chloroquine is fascinating, as an account of human muddle and misjudgment, finally leading to an extraordinarily valuable drug (216).

(163) chloroquine

Over decades of sublethal exposure the resistance of all types of malaria has increased to a point where chloroquine no longer offers certain protection (217). With the partial exception of quinine and dihydroquinine (218), resistance to antimalarials had reached the stage at the time of the Vietnam war where more research was required. The development of mefloquine (164) was a continuation of the World War II effort, with a gap of about 20 years. Resistance to chloroquine had developed widely during that period, but surprisingly less so to quinine, given the obvious similarities in structure. This observation stimulated a reappraisal of quinolines, known as quinoline methanols, which bear a hydroxy group on the α-carbon of a substituent attached to the 4-position (219). Up to 1944, a total of 177 quinoline methanols had been synthesized and tested, resulting in one compound (165) with activity superior to that of quinine. In human volunteers there was a high incidence of phototoxicity associated with (165), so research on quinoline methanols in 1944 had ceased in favor of the 4-amino series, which included chloroquine. Reappraisal of about 100 of the World War II compounds confirmed the high activity and phototoxicity of (165) and also showed the high potency of an analog (166), which had reduced phototoxicity (219). These data, together with results from about 200 newer compounds, fostered the belief that phototoxicity was separable from antimalarial activity. Extensive evaluation of (166) in humans with chloroquine-resistant Plasmodium falciparum infections showed promise, but with a significant incidence of toxic reactions; the dose required was also inconveniently large.

Two hypotheses concerning the effect of the 2-phenyl substituent were proposed. One was that metabolic oxidation was blocked at this position, so that duration of action was prolonged, which was considered desirable. Second, the UV chromophore was enlarged, which would increase the likelihood of drug-induced photosensitivity. The phenyl substituent was thus replaced by trifluoromethyl in the 2-position (220). Before the first such derivatives were tested, further analogs were prepared with an additional trifluoromethyl group on the benzene ring. This was serendipitous because the first series of 2-trifluoromethyl analogs had low potency and were also photosensitizing. The series with two trifluoromethyl groups, one at position 2 and another in the 6-, 7-, or 8-position were all potent and free from phototoxicity (221). The most potent was mefloquine (164), a very successful drug but one that produces unacceptable CNS effects in a small proportion of users (222); parasite resistance has also been observed in parts of Southeast Asia (217). There is now a serious attempt by the World Health Organization to find new antimalarials.

(164) mefloquine

Physicians are pragmatic when choosing therapy for patients whose suffering is not alleviated by accepted methods. A drug that has been shown to be toxicologically safe may be utilized in a new area for the flimsiest of reasons. Thus Page (223) described his use of quinacrine in two cases of lupus erythematosus as being based on "[a] chance observation . . . ," although he did not describe the observation that led to his decision. He did, however, record that quinine had been tried previously and "prevented extension of the lesions," so this may have been the basis for his rationale. In any event, the beneficial effects of quinacrine were remarkable and appeared to be related to the degree of yellowing of the skin that, as described earlier, is a common side effect of the use of quinacrine in malaria.

Among Page's group of patients with lupus erythematosus were two with rheumatoid arthritis, whose symptoms also responded to treatment with quinacrine. The following year, other physicians (224) conducted a trial of quinacrine on a larger group of patients with rheumatoid arthritis; the results encouraged Haydu (225) to test chloroquine on similar patients, again with positive results. A year later, two more physicians (226) compared quinacrine with chloroquine and found the latter to be better tolerated, the majority of patients gaining some benefit. Both quinacrine and chloroquine caused gastrointestinal disturbances, which led to a trial (227) of hydroxychloroquine (167), an unsuccessful antimalarial but with less effect on the gut, thus allowing larger doses to be given. Hydroxychloroquine has remained part of the standard drug therapy for rheumatoid arthritis ever since.

(167) hydroxychloroquine

So far, the choice of quinine-like drugs to treat rheumatoid arthritis has been based on preliminary selection as antimalarials. Because the two types of action are presumably unconnected, there might be some value in a screening program aimed directly at rheumatoid disease.

8.3 Avermectins and Milbemycins

There is no major distinction between the avermectins and milbemycins, which are based on the same complex polyketide macrocycle (168): the avermectins are oxygenated at C-13 and bear a disaccharide on this oxygen. They have been isolated from cultures of a number of Streptomyces species, obtained from all over the world (228).

The avermectins, particularly, have been the subject of intense commercial interest because they possess potent activity against both nematode and arthropod parasites of livestock (229). A full discussion of structure-activity relationships would be out of place here, not least because the data are voluminous, so we shall concentrate on the development of ivermectin, which has been a major success.

Structural designation of avermectins is quaintly based on three series: A, B; a, b; and 1, 2. These are illustrated diagrammatically. Greater activity resides in the B series, with a free OH at position 5. There is little difference in potency between the a and b series. In the more potent B series there are important differences between the 1 series and the 2 series; B1 is the more active orally, whereas B2 is the more potent by injection. There are also differences in their spectrum of activity (230). The spectrum of activity was kept as broad as possible by hydrogenation of a mixture of avermectins B1a and B1b to give ivermectin (169), which contains at least 80% of 22,23-dihydroavermectin B1a and not more than 20% of 22,23-dihydroavermectin B1b.

milbemycins, R = H
In the avermectins the series are designated as follows (Y = CH3):
A, Z = CH3
B, Z = H
a, X = CH(CH3)CH2CH3
b, X = CH(CH3)2
1, V-W = CH=CH
2, V-W = CH2CH(OH)
For further details of these descriptors, in the milbemycins, see Ref. 228.
In ivermectin (169), V-W = CH2CH2, X = CH(CH3)CH2CH3 (major) or CH(CH3)2 (minor), Y = CH3 and Z = H

Ivermectin was developed for, and has been highly successful in, the treatment and control of parasites in cattle, horses, sheep, pigs, and dogs. Following studies in humans with river blindness (onchocerciasis) (231-233), the developers of ivermectin (Mectizan) have participated in a major program aimed at eradication of the disease. The sufferers inhabit some of the poorest parts of Africa and cannot pay for their treatment, so the drug has been donated by Merck and Co. Since 1996 more than 20 million treatments have been given (234). The drug does not kill the adult worms that cause onchocerciasis (235), but is useful in interrupting the life cycle (236). Ivermectin is also of value in treatment of scabies (237). A great deal of information on the biological aspects of the use of ivermectin has recently been summarized (238).

(169) X = CH(CH3)CH2CH3 (major) or CH(CH3)2 (minor)

9 CONCLUSION

Natural product research has been the single most successful strategy for discovering new pharmaceuticals and has contributed dramatically to extending human life and improving clinical practice. As long as Nature continues to yield novel, diverse chemical entities possessing selective biological activities, natural products will play an important role as leads for new pharmaceuticals. An interesting recent example is the alkaloid galantamine (Nivalin, Reminyl) (170), originally isolated from the bulbs of the Amaryllidaceae family (snowdrops, daffodils, etc.), which has found use in the symptomatic treatment of Alzheimer's Disease (239). It is a reversible and competitive inhibitor of acetylcholinesterase that also interacts allosterically with nicotinic acetylcholine receptors to potentiate the action of agonists. By acting to enhance the reduced central cholinergic function associated with this disease, significant improvements in cognition and behavioral symptoms have been observed in patients. In this case it is the alkaloid itself that is used as the active compound and it will be interesting to see whether development leads to better drugs. There are as yet relatively few publications in this area, although Sanochemia is interested (240,241).

Over 90% of bacterial, fungal, and plant species are still waiting to be investigated (242). High throughput screening methods will allow even greater numbers of samples to be tested against more biological targets (243, 244), although this approach sometimes produces more data than can be conveniently integrated into a research program. An alternative view is that the elucidation of the biological effects of chosen compounds, in some detail, will yield insight into biological processes that may open avenues for medicinal chemistry research that is not based on pure chance. This view is based on the recognition that secondary metabolites have been produced and ruthlessly selected, by evolution, over a long period of time. Either way, the medicinal chemist has a wonderful opportunity to continue utilizing the rich chemical diversity offered by nature, as is shown in two recent reviews that explore this topic in some detail (245,246).

The best approach for the identification of natural product leads is a matter of debate. Some very inventive techniques have been used in the bioassay-guided method; for example, by spraying TLC plates with reactive media that respond by producing a color change in the presence of an active compound. An alternative is to use an ethnobotanical or ethnopharmacological technique, whereby the accumulated wisdom of many generations of native plant users may be harnessed in the
search for better medicines for all. These two techniques may be combined, so that the native people describe the uses to which they put the plant and the researchers devise a bioassay that is used to find the active components. The problem with any bioassay-guided technique, however, is that the inactive constituents are not identified. This represents a considerable waste, given that the plant has had to be collected, preserved, and identified. An alternative view is that it is best to extract all the constituents, with a view to screening in whichever way is appropriate, at that time or in the future. With modern high-performance liquid chromatography facilities it is possible to reduce a plant to its secondary metabolites, as single compounds, in a few days: the products are then able to be screened in a high throughput manner in an equally short time and the compounds can be reevaluated when new screens become available. One thing is certain: the variety of natural product structures, after perhaps 300 million years of natural selection, far exceeds the bounds of human imagination, unlike the typical output from combinatorial chemistry!

REFERENCES

1. G. M. Cragg, D. J. Newman, and K. M. Snader, J. Nat. Prod., 60, 52 (1997).
2. R. Gerardy and M. H. Zenk, Phytochemistry, 32, 79-86 (1993).
3. M. J. Stone and D. H. Williams, Mol. Microbiol., 6, 29-34 (1992).
4. R. J. Bryant, Chem. Ind., 146-153 (1988).
5. C. E. Inturissi, M. Schultz, S. Shin, J. G. Umans, L. Angel, and E. J. Simm, Life Sci., 33 (Suppl. 1), 773 (1983).
6. W. Sneader, Drug Discovery: The Evolution of Modern Medicine, John Wiley & Sons, Inc., New York, 1985, pp. 78-80 summarizes the confusion surrounding the early work.
7. A. F. Casy and R. T. Parfitt, Opioid Analgesics, Plenum, New York, 1986, p. 407.
8. R. Grewe and A. Mondon, Chem. Ber., 81, 279 (1948).
9. See Ref. 7, p. 153.
10. K. W. Bentley and D. G. Hardy, Proc. Chem. Soc., 220 (1963).
11. G. F. Blane, A. L. A. Boura, A. E. Fitzgerald, and R. E. Lister, Br. J. Pharmacol., 30, 11 (1967).
12. J. W. Lewis, Adv. Biochem. Psychopharmacol., 8, 123 (1974).
13. J. Hughes, T. W. Smith, H. W. Kosterlitz, L. A. Fothergill, B. A. Morgan, and H. R. Morris, Nature, 258, 577-579 (1975).
14. J. A. H. Lord, A. A. Wakerfield, J. Hughes, and H. W. Kosterlitz, Nature, 267, 495-499 (1977).
15. O. Schaumann, Arch. Exp. Pathol. Pharmacol., 196, 109-136 (1940).
16. See Ref. 7, pp. 209-301.
17. B. Cox, Curr. Rev. Pain, 4, 448-498 (2000).
18. B. Cox and J. C. Denyer, Expert Opin. Ther. Pat., 8, 1237-1250 (1998).
19. A. G. Gilman, T. W. Rall, A. S. Nies, and P. Taylor, Goodman and Gilman's The Pharmacological Basis of Therapeutics, 8th ed., Pergamon Press, New York, 1990, p. 550.
20. L. Lemberger, Clin. Pharmacol. Therap., 39, 1-4 (1986).
21. S. E. Sallan, N. E. Zinberg, and E. Frei, N. Engl. J. Med., 293, 795-797 (1975).
22. R. K. Razdan, in P. Krogsgaard-Larsen, S. Brogger Christensen, and H. Kofod, Eds., Natural Products and Drug Development, Munksgaard, Copenhagen, 1984, pp. 486-499.
23. L. Lemberger and H. Rowe, Clin. Pharmacol. Ther., 18, 720-726 (1976).
24. T. S. Herman, L. E. Einhorn, S. E. Jones, C. Nagy, A. B. Chester, J. C. Dean, B. Furnas, S. D. Williams, S. A. Leigh, R. T. Dorr, and T. E. Moon, N. Engl. J. Med., 300, 1295 (1979).
25. A. Ward and B. Holmes, Drugs, 30, 127-144 (1985).
26. W. A. Devane, F. A. Dysarz, R. M. Johnson, L. S. Melvin, and A. C. Howlett, Mol. Pharmacol., 34, 605-613 (1988).
27. W. A. Devane, L. Hanus, A. Breuer, R. G. Pertwee, L. A. Stevenson, G. Griffin, D. Gibson, A. Mandelbaum, A. Etinger, and R. Mechoulam, Science, 258, 1946-1949 (1992).
28. N. Stella, P. Schweitzer, and D. Piomelli, Nature, 388, 773-778 (1997).
29. A. D. Khanolkar and A. Makryannis, Life Sci., 65, 607-616 (1999).
30. A. Szallasi and V. Di Marzo, Trends Neurosci., 23, 491-497 (2000).
31. S. H. Burstein, Pharmacol. Ther., 82, 87-96 (1999).
32. M. G. Bock, Drugs of the Future, 16, 631-640 (1991) provides a succinct summary.
33. R. S. L. Chang, V. J. Lotti, R. L. Monaghan, J. Birnbaum, E. O. Stapley, M. A. Goetz, G. Albers-Schonberg, A. A. Patchett, J. M. Liesch, O. D. Hensens, and J. P. Springer, Science, 230, 177-179 (1985).
34. P. R. Dodd, J. A. Edwardson, and G. J. Dockray, Regul. Pept., 1, 17 (1980).
35. R. B. Innis and S. H. Snyder, Proc. Natl. Acad. Sci. USA, 77, 6917-6921 (1980).
36. D. R. Hill, N. J. Campbell, T. M. Shaw, and G. N. Woodruff, J. Neurosci., 7, 2967-2976 (1987).
37. R. A. Gregory, Bioorg. Chem., 8, 497-511 (1979).
38. J. Dunlop, Gen. Pharmacol., 31, 519-524 (1998).
39. P. S. Anderson, R. M. Freidinger, B. E. Evans, M. G. Bock, K. E. Rittle, R. M. Dipardo, W. L. Whitter, D. F. Veber, R. S. L. Chang, and V. J. Lotti, Int. Cong. Ser. Excerpta Med., 766 (Gastrin Cholecystokinin), 235-242 (1987).
40. M. A. Goetz, M. Lopez, R. L. Monaghan, R. S. L. Chang, V. J. Lotti, and T. B. Chen, J. Antibiot., 38, 1633-1637 (1985).
41. B. E. Evans, Z. Gastroenterol. Verh., 26, 269-271 (1991).
42. B. E. Evans, K. E. Rittle, M. G. Bock, R. M. Dipardo, R. M. Freidinger, W. L. Whitter, G. F. Lundell, D. F. Veber, and P. S. Anderson, J. Med. Chem., 31, 2235-2246 (1988).
43. I. M. McDonald, Expert Opin. Therap. Pat., 11, 445-462 (2001).
44. I. G. Marshall and R. D. Waigh in A. L. Harvey, Ed., Drugs from Natural Products, Ellis Horwood, Chichester, UK, 1993, pp. 131-151.
45. H. King, J. Chem. Soc., 1381 (1935).
46. P. Karrer in D. Bovet, F. Bovet-Nitti, and G. B. Marini-Bettolo, Eds., Curare and Curare-like Agents, Elsevier, Amsterdam, 1959, pp. 125-136.
47. P. G. Waser, Helv. Physiol. Pharmacol. Acta, 11 (Suppl. VIII) (1953).
48. W. C. Bowman, Pharmacology of Neuromuscular Function, J. Wright, Bristol, 1990.
49. A. J. Everett, L. A. Lowe, and S. Wilkinson, J. Chem. Soc. Chem. Commun., 1020-1021 (1970).
50. R. B. Barlow and H. R. Ing, Br. J. Pharmacol. Chemother., 3, 298 (1948).
51. D. Bovet, F. Bovet-Nitti, S. Guarini, V. Longo, and R. Fusco, Arch. Intern. Pharmacol. Ther., 88, 1-50 (1951).
52. E. P. Taylor and H. O. J. Collier, Nature, 167, 692 (1951).
53. J. B. Stenlake, R. D. Waigh, G. H. Dewar, R. Hughes, D. J. Chapple, and G. G. Coker, Eur. J. Med. Chem., 16, 515-524 (1981).
54. R. Hughes in J. Norman, Ed., Clinics in Anaesthesiology, Vol. 3, W. B. Saunders, London, 1985, pp. 331-345.
55. J. E. Caldwell, T. Heier, J. B. Kitts, D. P. Lynam, M. R. Fahey, and R. D. Miller, Br. J. Anaesth., 63, 393-399 (1989).
56. R. L. Noble, C. T. Beer, and J. H. Cutts, Ann. N. Y. Acad. Sci., 882-894 (1958).
57. I. S. Johnson, H. F. Wright, and G. H. Svoboda, J. Lab. Clin. Med., 54, 830 (1959).
58. G. H. Svoboda, Lloydia, 24, 173 (1961).
59. A. C. Sartorelli and W. A. Creasey, Annu. Rev. Pharmacol., 9, 51 (1969).
60. K. Gerzon, "Dimeric Catharanthus Alkaloids," in J. M. Cassady and J. D. Douros, Eds., Anticancer Agents Based on Natural Product Models, Academic Press, New York, 1980, pp. 271-317.
61. J. B. Sorensen and H. H. Hansen, Investigational New Drugs, 11, 103-133 (1993).
62. J. Mann, Murder Magic and Medicine, Oxford University Press, Oxford, 1992, pp. 213-214.
63. M. E. Wall, M. C. Wani, C. E. Cook, K. H. Palmer, H. T. McPhail, and G. A. Sim, J. Am. Chem. Soc., 88, 3888 (1966).
64. Y.-H. Hsiang, R. Hertzberg, S. Hecht, and L. Lui, J. Biol. Chem., 260, 14873-14878 (1985).
65. M. E. Wall and M. C. Wani, "Camptothecin and Analogues" in Human Medicinal Agents from Plants, ACS Symposium Series 534, American Chemical Society, Washington, DC, 1993, pp. 149-169 and references therein.
66. W. D. Kingsbury, J. C. Boehm, D. R. Jakas, K. G. Holden, S. M. Hecht, G. Gallagher, M. J. Caranfa, F. L. McCabe, L. F. Faucette, R. K. Johnson, and R. P. Hertzberg, J. Med. Chem., 34, 98-107 (1991).
67. S. Sawada, S. Okajima, R. Aiyama, K. Nokata, T. Furuta, T. Yokokura, E. Sugino, K. Yamaguchi, and T. Miyasaka, Chem. Pharm. Bull., 39, 1446-1450 (1991).
68. J. Caesar, The Battle for Gaul, Book 6, Section 31, A. Wiseman and P. Wiseman translators, Chatto and Windus, London, 1980, p. 126.
69. T. Bryan-Brown, Q. J. Pharm. Pharmacol., 5, 205-219 (1932).
70. D. G. I. Kingston, "Taxol and Other Anticancer Agents from Plants" in J. D. Coombes, Ed., New Drugs from Natural Sources, IBC Technical Services, 1992, pp. 101-108.
71. M. Suffness, "Taxol: From Discovery to Therapeutic Use" in J. A. Bristol, Ed., Annual Report of Medicinal Chemistry, Vol. 28, Academic Press, New York, 1993, pp. 305-314, provides a good review of the discovery and development of taxol and related derivatives.
72. M. C. Wani, H. L. Taylor, M. E. Wall, P. Coggon, and A. T. McPhail, J. Am. Chem. Soc., 93, 2325-2327 (1971).
73. P. B. Schiff, J. Fant, and S. B. Horwitz, Nature, 277, 665-667 (1979).
74. W. P. McGuire, E. K. Rowinsky, N. B. Rosenhein, F. C. Grunbine, D. S. Ettinger, D. K. Armstrong, and R. C. Donehower, Ann. Intern. Med., 111, 273-279 (1989).
75. M. E. Wall and M. C. Wani, Cancer Res., 55, 753-760 (1995).
76. G. M. Cragg, S. A. Schepartz, M. Suffness, and M. R. Grever, J. Nat. Prod., 56, 1657-1668 (1993).
77. D. G. I. Kingston, Pharmacol. Ther., 52, 1-34 (1991).
78. L. Mangatal, M.-T. Adeline, D. Guenard, F. Gueritte-Voegelein, and P. Potier, Tetrahedron, 45, 4177-4190 (1989).
79. J. N. Denis, A. E. Greene, D. Guenard, F. Gueritte-Voegelein, L. Mangatal, and P. Potier, J. Am. Chem. Soc., 110, 5917-5919 (1988).
80. C. Palomo, A. Arrieta, F. Cossio, J. M. Aizpurua, A. Mielgo, and N. Aurrekoetxea, Tetrahedron Lett., 31, 6429-6432 (1990).
81. F. Gueritte-Voegelein, D. Guenard, F. Lavelle, M.-T. Le Goff, L. Mangatal, and P. Potier, J. Med. Chem., 34, 992-998 (1991).
82. K. C. Nicolaou, C. Riemer, M. A. Kerr, D. Rideout, and W. Wrasidlo, Nature, 364, 464-466 (1993).
83. A. Stierle, G. Strobel, and D. Stierle, Science, 260, 214-217 (1993).
84. K. C. Nicolaou, Z. Yang, J. J. Liu, H. Ueno, P. G. Nantermet, R. K. Guy, C. F. Claiborne, J. Renaud, E. A. Couladouros, K. Paulvannan, and E. J. Sorensen, Nature, 367, 630-634 (1994).
85. R. A. Holton, H. B. Kim, C. Somoza, F. Liang, R. J. Biediger, P. D. Boatman, M. Shindo, C. C. Smith, and S. Kim, J. Am. Chem. Soc., 116, 1597-1600 (1994).
86. G. Hofle, N. Bedorf, K. Gerth, and H. Reichenbach, Ger. Pat. DE 91-4138042 (1993); Chem. Abstr., 120, 52841 (1993).
87. M. R. Grever, S. A. Schepartz, and B. A. Chabner, Semin. Oncol., 19, 622-638 (1992).
88. D. M. Bollag, P. A. McQueney, J. Zhu, O. Hensens, L. Koupal, J. Liesch, M. Goetz, E. Lazarides, and C. M. Woods, Cancer Res., 55, 2325-2333 (1995).
89. For an excellent review of the "Chemical Biology of Epothilones," see K. C. Nicolaou, F. Roschangar, and V. Dionisios, Angew. Chem. Int. Ed. Engl., 37, 2014-2045 (1998) and references therein.
90. R. M. Borzilleri, X. Zheng, R. J. Schmidt, J. A. Johnson, S.-H. Kim, J. D. DiMarco, C. R. Fairchild, J. Z. Gougoutas, F. Y. F. Lee, B. H. Long, and G. D. Vite, J. Am. Chem. Soc., 122, 8890-8897 (2000).
91. F. Y. F. Lee, R. Borzilleri, C. R. Fairchild, S.-H. Kim, B. H. Long, C. Reventos-Suarez, G. D. Vite, W. C. Rose, and R. A. Kramer, Clin. Cancer Res., 7, 1429-1437 (2001).
92. H. Stahelin and A. von Wartburg in E. Jucker, Ed., Progress in Drug Research, Birkhauser-Verlag, Basel, Vol. 33, 1989, pp. 169-266.
93. H. Stahelin and A. von Wartburg, Cancer Res., 51, 5-15 (1991) present a shorter and more readable account.
94. M. G. Kelly and J. L. Hartwell, J. Natl. Cancer Inst., 14, 967-986 (1954).
95. I. W. Kaplan, New Orleans Med. Surg. J., 94, 388 (1942).
96. J. L. Hartwell and A. W. Schrecker in L. Zechmeister, Ed., Progress in the Chemistry of Organic Natural Products, 1958, pp. 83-166 provide a detailed review of the earlier developments and background.
97. V. Podwyssotzki, Arch. Exp. Pathol. Pharmacol., 13, 29 (1880).
98. J. L. Hartwell and A. W. Schrecker, J. Am. Chem. Soc., 73, 2909-2916 (1951).
99. K. S. Pankajamani and T. R. Seshadri, Proc. Ind. Acad. Sci., 36A, 157 (1952) through Chem. Abstr., 48, 2702 (1954). See Ref. 77 for a wider discussion.
100. A. Stoll, J. Renz, and A. von Wartburg, J. Am. Chem. Soc., 76, 3103-3104 (1954).
101. A. Stoll, A. von Wartburg, E. Angliker, and J. Renz, J. Am. Chem. Soc., 76, 6413-6414 (1954).
102. A. von Wartburg, E. Angliker, and J. Renz, Helv. Chim. Acta, 40, 1331-1357 (1957).
103. I. Jardine in J. M. Cassady and J. D. Douros, Eds., Anticancer Agents Based on Natural Product Models, Academic Press, New York, Vol. 16, 1980, pp. 319-351 provides a useful review of the middle years.
104. H. Emmenegger, H. Stahelin, J. Rutschmann, J. Renz, and A. von Wartburg, Drug Res., 11, 327-333, 459-469 (1961).
105. H. Stahelin, Planta Med., 22, 336-347 (1972).
106. H. Stahelin, Med. Exp. (Basel), 7, 92-102 (1962).
107. M. Kuhn and A. von Wartburg, Helv. Chim. Acta, 52, 948-955 (1969).
108. C. Keller-Juslen, M. Kuhn, A. von Wartburg, and H. Stahelin, J. Med. Chem., 14, 936-940 (1971).
109. B. H. Long and A. Minocha, Proc. Am. Assoc. Cancer Res., 24, 321 (1983).
110. O. J. McConnell, R. E. Longley, and F. E. Koehn, in V. P. Gullo, Ed., The Discovery of Natural Products with Therapeutic Potential, Butterworth-Heinemann, Boston, p. 109 (1994).
111. K. L. Rinehart, T. G. Holt, N. L. Fregeau, J. G. Stroh, P. A. Keifer, F. Sun, L. H. Li, and D. G. Martin, J. Org. Chem., 55, 4512-4515 (1990).
112. Reuters News Service, 11 July (2001).
113. J. Adams and P. J. Elliott, Oncogene, 19, 6687-6692 (2000).
114. G. R. Pettit, F. Gao, D. Sengupta, J. C. Coll, C. L. Herald, D. L. Doubek, J. M. Schmidt, J. R. Camp, J. J. Rudloe, and R. A. Nieman, Tetrahedron, 47, 3601-3610 (1991).
115. M. Jaspars, Chem. Ind., 51-55 (1999).
116. E. J. Corey, D. Y. Gin, and R. Kania, J. Am. Chem. Soc., 118, 9202-9203 (1996).
117. Y. Hamada, K. Hayashi, and T. Shiori, Tetrahedron Lett., 32, 931-934 (1991).
118. A. Fleming, Br. J. Exp. Med., 10, 226-236 (1929).
119. E. Chain, H. W. Florey, A. D. Gardner, N. G. Heatley, M. A. Jennings, J. Orr-Ewing, and A. G. Sanders, Lancet, 2, 226-228 (1940).
120. Ref. 6, pp. 298-315, provides a good review of the discovery and development of penicillin antibiotics.
121. Ref. 19, pp. 1065-1085, summarizes the pharmacological properties of the more important commercial penicillins.
122. J. C. Sheehan, "Molecular Modification in Drug Design," in Advances in Chemistry Series, No. 45, American Chemical Society, Washington, DC, 1964, pp. 15-24.
123. T. F. Howarth, A. G. Brown, and T. J. King, J. Chem. Soc. Chem. Commun., 266-267 (1976).
124. C. Reading and P. Hepburn, Biochem. J., 179, 67-76 (1979).
125. A. P. Ball, A. M. Geddes, P. G. Davey, I. D. Farrell, and G. R. Brookes, Lancet, 1, 620-623 (1980).
126. Ref. 6, pp. 321-324, provides an interesting account of this discovery.
127. See Ref. 6, pp. 324-300.
128. G. G. F. Newton and E. P. Abraham, Nature, 175, 548 (1955).
129. H. J. Smith, "Design of Antimicrobial Chemotherapeutic Agents," in Smith and William's Introduction to the Principles of Drug Design, 2nd ed., Wright, London, 1988, pp. 285-288.
130. E. H. Flynn, Ed., Cephalosporins and Penicillins Chemistry and Biology, Academic Press, New York, 1972.
131. G. Albers-Schonberg et al., J. Am. Chem. Soc., 100, 6491-6499 (1978).
132. W. J. Leanza, K. J. Wildonger, T. W. Miller, and B. G. Christensen, J. Med. Chem., 22, 1435-1436 (1979).
133. A. Imada, K. Kitano, K. Kintaka, M. Muroi, and M. Asai, Nature, 289, 590-591 (1981).
134. R. B. Sykes, C. M. Cimarusti, D. P. Bonner, K. Bush, D. M. Floyd, N. H. Georgopapadakou, W. H. Koster, W. C. Liu, W. L. Parker, P. A. Principe, M. L. Rathnum, W. A. Slusarchyk, W. H. Trejo, and J. S. Wells, Nature, 291, 489-491 (1981).
135. R. B. Sykes, D. P. Bonner, K. Bush, N. H. Georgopapadakou, and J. S. Wells, J. Antimicrob. Chemother., 8 (Suppl. E), 1-16 (1981).
136. R. B. Sykes and D. P. Bonner, "Monobactam Antibiotics: History and Development," in J. D. Williams and P. Woods, Eds., Aztreonam, The Antibiotic Discovery for Gram-negative Infections, Royal Society Medicine International Congress Symposium Series No. 89, Royal Society Medicine, London, 1985, pp. 3-24.
137. N. Bahal and M. C. Nahata, Ann. Pharmacother., 26, 46-55 (1992).
138. J. Barber, J. I. Gyi, G. A. Morris, D. A. Pye, and J. K. Sutherland, J. Chem. Soc. Chem. Commun., 1040-1041 (1990).
139. P. Kurath, P. H. Jones, R. S. Egan, and T. J. Perun, Experientia, 27, 362 (1971).
140. S. Omura, K. Tsuzuki, T. Sunazuka, S. Marui, H. Toyoda, N. Inatomi, and Z. Itoh, J. Med. Chem., 30, 1941-1943 (1987).
141. L. D. Bechtol, V. C. Stephens, C. T. Pugh, M. B. Perkal, and P. A. Coletta, Curr. Ther. Res., 20, 610 (1976).
142. H. A. Kirst and G. D. Sides, Antimicrob. Agents Chemother., 33, 1413-1418 (1989), provide a useful, brief review.
143. P. Luger and R. Maier, J. Cryst. Mol. Struct., 9, 329 (1979).
144. G. M. Bright, A. A. Nagel, J. Bordner, K. A. Desai, J. N. Dibrino, J. Nowakowska, L. Vincent, R. M. Watrous, F. C. Sciavolino, A. R. English, J. A. Retsema, M. R. Anderson, L. A. Brennan, R. J. Borovoy, C. R. Cimochowski, J. A. Faiella, A. E. Girard, D. Girard, C. Herbert, M. Manousos, and R. Mason, J. Antibiot., 41, 1029-1047 (1988).
145. S. Moromoto, Y. Takahashi, Y. Watanabe, and S. Omura, J. Antibiot., 37, 187-189 (1984).
146. For an overview, see R. Leclercq, J. Antimicrob. Chemother., 48, 9-23 (2001).
147. S. Douthwaite and W. S. Champney, J. Antimicrob. Chemother., 48, 1-8 (2001).
148. G. Bonfiglio and P. M. Furneri, Expert Opin. Invest. Drugs, 10, 185-198 (2001).
149. J. S. Tkacz in J. Sutcliffe and N. H. Georgopapadakou, Eds., Emerging Targets in Antibacterial and Antifungal Chemotherapy, Chapman and Hall, New York, 1992, pp. 504-508.
150. J. M. Balkovec, R. M. Black, M. L. Hammond, J. V. Heck, R. A. Zambias, G. Abruzzo, K. Bartizal, H. Kropp, C. Trainor, R. E. Schwartz, D. C. McFadden, K. H. Nollstadt, L. A. Pittarelli, M. A. Powles, and D. M. Schatz, J. Med. Chem., 35, 194-198 (1992).
151. R. Gordee and M. Debono, Drugs of the Future, 14, 939 (1989).
152. A. J. De Lucca, Expert Opin. Invest. Drugs, 9, 273-299 (2000).
153. S. Y. Ablordeppey, P. Fan, J. H. Ablordeppey, and L. Mardenborough, Curr. Med. Chem., 6, 1151-1195 (1999).
154. M. A. Pfaller, F. Marco, S. A. Messer, and R. N. Jones, Diagn. Microbiol. Infect. Dis., 30, 251-255 (1998).
155. G. K. Abruzzo, C. J. Gill, A. M. Flattery, L. Kong, C. Leighton, J. G. Smith, V. B. Pikounis, K. Bartizal, and H. Rosen, Antimicrob. Agents Chemother., 44, 2310-2318 (2000).
156. E. E. Slater and J. S. McDonald, Drugs, Suppl. 3, 72-82 (1988).
157. A. Endo, M. Kuroda, and Y. Tsujita, J. Antibiot., 29, 1346-1348 (1976).
158. D. J. Gordon and B. M. Rifkind, Ann. Int. Med., 107, 759-761 (1987).
159. A. G. Brown, T. C. Smale, T. J. King, R. Hasenkamp, and R. H. Thompson, J. Chem. Soc. Perkin Trans. 1, 1165-1170 (1976).
160. P. R. Vagelos, Science, 252, 1080-1084 (1991), gives a brief, chronological account of the discovery of lovastatin.
161. A. W. Alberts et al., Proc. Natl. Acad. Sci. USA, 77, 3957-3961 (1980).
162. A. Endo, J. Antibiot., 32, 852-854 (1979).
163. A. W. Alberts, Am. J. Cardiol., 62, 105-155 (1988), and references therein.
164. S. M. Grundy, "HMG Co A Reductase Inhibitors: Clinical Applications and Therapeutic Potential," in B. M. Rifkind, Ed., Drug Treatment of Hyperlipidemia, Marcel Dekker, New York, 1991, pp. 139-167.
165. W. F. Hoffmann, A. W. Alberts, P. S. Anderson, J. S. Chen, R. L. Smith, and A. K. Willard, J. Med. Chem., 29, 849-852 (1986).
166. E. E. Slater and J. S. MacDonald, Drugs, 36 (Suppl. 3), 72-82 (1988).
167. C. E. Nakamura and R. H. Abeles, Biochemistry, 24, 1364-1376 (1985).
168. Reuters News Service, 24 Sept (2001).
169. S. H. Ferreira, Br. J. Pharmacol., 24, 163-169 (1965).
170. S. H. Ferreira, L. J. Greene, V. A. Alabaster, Y. S. Bakhle, and J. R. Vane, Nature, 225, 379-380 (1970).
171. S. H. Ferreira, D. C. Bartelt, and L. J. Greene, Biochemistry, 9, 2583-2593 (1970).
172. M. A. Ondetti, B. Rubin, and D. W. Cushman, Science, 196, 441-444 (1977).
173. R. A. Maxwell and S. B. Eckhardt, Drug Discovery: A Casebook and Analysis, Humana Press, Clifton, NJ, 1990, pp. 19-34.
174. Ref. 6, pp. 105-114, chronicles the development of the β-adrenoceptor blocking agents.
175. D. Potterton, Ed., Culpeper's Colour Herbal, Foulsham, London, 1983, p. 123.
176. K. P. Link, Harvey Lectures, Series 39, 1944, pp. 162-216.
177. K. P. Link, Circulation, 19, 97-107 (1959).
178. See Ref. 19, pp. 1317-1322.
179. Mustafa, C. R. Acad. Sci. Paris, 89, 442 (1879).
180. E. Spath and W. Gruber, Ber. Dtsch. Chem. Ges., 71, 106 (1938).
181. G. V. Anrep and G. Misrahy, Gaz. Fac. Med. Cairo, 13, 33 (1945).
182. G. V. Anrep, G. S. Barsoum, M. R. Kenawy, and G. Misrahy, Lancet, 557-558 (1947).
183. G. B. Kauffman, Educ. Chem., 21, 42-45 (1984).
184. See Ref. 62, p. 192.
185. H. Cairns, C. Fitzmaurice, D. Hunter, P. B. Johnson, J. King, T. B. Lee, G. H. Lord, R. Minshull, and J. S. G. Cox, J. Med. Chem., 15, 583-589 (1972).
186. See Ref. 19, pp. 630-632.
187. B. N. Singh, Am. Heart J., 106, 788-797 (1983).
188. B. N. Singh, N. Venkatesh, K. Nademanee, M. A. Josephson, and R. Karman, Prog. Cardiovasc. Dis., 31, 249-280 (1989).
189. J. van Schepdael and H. Solvay, Presse Med., 78, 1849-1855 (1970).
190. See Ref. 62, pp. 189-191.
191. See Ref. 6, pp. 98-105.
192. R. P. Ahlquist, Am. J. Physiol., 153, 586-600 (1948).
193. A. M. Lands, F. P. Luduena, and H. J. Buzzo, Life Sci., 6, 2241-2249 (1967).
194. See Ref. 173, pp. 333-348.
195. D. L. Burgoyne, R. J. Anderson, and T. M. Allen, J. Org. Chem., 57, 525-528 (1992).
196. P. I. Trigg, in H. Wagner, H. Hikino, and N. R. Farnsworth, Eds., Economic and Medicinal Plant Research, Academic Press, London, Vol. 3, 1989, pp. 19-55.
197. W. Tang and G. Eisenbrand, Eds., Chinese Drugs of Plant Origin, Springer-Verlag, Berlin, 1992, pp. 161-175.
198. S. R. Meshnick, A. Thomas, A. Ram, C.-M. Xu, and H.-Z. Pan, Mol. Biochem. Parasitol., 49, 181-190 (1991).
199. S. R. Meshnick, Y.-Z. Yang, V. Lima, F. Kuypers, S. Kamchonwongpaisan, and Y. Yuthavong, Antimicrob. Agents Chemother., 37, 1108-1114 (1993).
200. P. L. Olliaro et al., Trends Parasitol., 17, 122-126 (2001).
201. G. H. Posner and S. R. Meshnick, Trends Parasitol., 17, 266-267 (2001).
202. For example: J. Cazelles et al., J. Chem. Soc. Perkin Trans. 1, 1265-1270 (2000); M. D. Bachi et al., Bioorg. Med. Chem. Lett., 8, 903-908 (1998).
203. J. Karbwang, K. N. Bangchang, A. Thanavibul, D. Bunnag, T. Chongsuphajaisiddhi, and T. Harinasuta, Lancet, 340, 1245 (1992), report some clinical experience to support the data in Refs. 196 and 197.
204. A. N. Chawira, D. C. Warhurst, B. L. Robinson, and W. Peters, Trans. R. Soc. Trop. Med. Hyg., 81, 554-558 (1987).
205. G. D. Brown, personal communication.
206. B. C. Elford, M. F. Roberts, J. D. Phillipson, and R. J. M. Wilson, Trans. R. Soc. Trop. Med. Hyg., 81, 434-436 (1987).
207. A. I. White in C. O. Wilson, O. Gisvold, and R. F. Doerge, Eds., Textbook of Organic, Medicinal and Pharmaceutical Chemistry, 7th ed., J. B. Lippincott, Philadelphia, 1977, pp. 247-268.
208. F. A. Fluckiger and D. Hanbury, Pharmacographia, A History of the Principal Drugs of Vegetable Origin, Met With in Great Britain and British India, Macmillan, London, 1879, pp. 361-362.
209. J. Pelletier and J. Caventou, Ann. Chim. Phys., XV, 292 (1820).
210. Anonymous, quoted by C. Allen, Tales from the Dark Continent, Warner, London, 1992, p. 30.
211. P. Rabe, Berichte, 55, 522 (1922).
212. R. B. Woodward and W. E. Doering, J. Am. Chem. Soc., 67, 860 (1945).
213. F. Schonhofer et al., Z. Physiol. Chem., 274, 1 (1942).
214. P. Mühlens, Naturwissenschaften, 14, 1162-1166 (1926).
215. See Ref. 19, pp. 988-991.
216. G. R. Coatney, Am. J. Trop. Med. Hyg., 12, 121-128 (1963).
217. P. Winstanley and P. Olliaro, Expert Opin. Invest. Drugs, 7, 261-271 (1998).
218. Anonymous, Bull. World Health Org., 61, 169-178 (1983).
219. L. H. Schmidt, R. Crosby, J. Rasco, and D. Vaughan, Antimicrob. Agents Chemother., 13, 1011-1030 (1978).
220. R. M. Pinder and A. Burger, J. Med. Chem., 11, 267 (1968).
221. C. J. Ohnmacht, A. R. Patel, and R. E. Lutz, J. Med. Chem., 14, 926 (1971).
222. J. E. Rosenblatt, Mayo Clin. Proc., 74, 1161-1175 (1999).
223. F. Page, Lancet, 755 (1951).
224. A. Freedman and F. Bach, Lancet, 321 (1952).
225. G. 0. Haydu, Am. J. Med. Sci., 225, 71 (1953).
226. J. Forestier and A. Certonciny, Rev. Rhum. Mal. Osteoartic., 21, 395 (1954).
227. A. L. Scherbel, S. L. Schuchter, and J. W. Harrison, Cleve. Clin. Q., 24, 98 (1957); see also A. L. Scherbel, Am. J. Med., 75, 1 (1983).
228. H. G. Davies and R. H. Green, Chem. Soc. Rev., 20, 211-269 (1991), provide structural details of a large number of analogs.
229. H. G. Davies and R. H. Green, Nat. Prod. Rep., 3, 87-121 (1986).
230. W. C. Campbell, M. H. Fisher, E. O. Stapley, G. Albers-Schonberg, and T. A. Jacob, Science, 221, 823-828 (1983).
231. K. Awadzi, K. Y. Dadzie, H. Schulzkey, D. R. W. Haddock, H. M. Gillies, and M. A. Aziz, Ann. Trop. Med. Parasitol., 79, 63 (1985).
232. B. M. Greene, H. R. Taylor, E. W. Cupp, R. P. Murphy, A. T. White, M. A. Aziz, H. Schulzkey, S. A. Danna, H. S. Newland, L. P. Goldschmidt, C. Auer, A. P. Hanson, S. V. Freeman, E. W. Reber, and P. N. Williams, N. Engl. J. Med., 313, 133-138 (1985).
233. F. A. Drobniewski, Microbiology Europe, 24-28 (1993).
234. F. O. Richards, E. S. Miri, M. Katabanva, A. Eyamba, M. Sauerbrey, G. Zea-Flores, K. Korve, W. Mathai, M. A. Homeida, I. Mueller, E. Hilyer, and D. R. Hopkins, Am. J. Trop. Med. Hyg., 66, 108-114 (2001).
235. K. Awadzi, S. K. Attah, E. T. Addy, N. O. Opoku, and B. T. Quartey, Trans. R. Soc. Trop. Med. Hyg., 93, 189-194 (1999).
236. B. A. Boatin, J. M. Hougard, E. S. Alley, L. K. B. Akpoboua, L. Yameogo, N. Dembele, A. Seketeli, and K. Y. Dadzie, Ann. Trop. Med. Parasitol., 92, S47-S60 (1998).
237. B. Leppard and A. E. Naburi, Br. J. Dermatol., 143, 520-523 (2000).
238. C. N. Burkhart, Vet. Hum. Toxicol., 42, 30-35 (2000).
239. L. J. Scott and K. L. Goa, Drugs, 60, 1095-1122 (2000).
240. M. A. H. Mucke, J. Froehlich, and U. Jordis, WO 0032199 (2000).
241. U. Jordis, J. Froehlich, M. Treu, M. Hirnschall, L. Czollner, B. Kaelz, and S. Welzig, WO 0174820 (2001).
242. J. D. Coombes, Ed., New Drugs from Natural Sources, IBC Technical Services, London, 1992, pp. 59-62, 93-100.
243. G. G. Yarbrough, D. P. Taylor, R. T. Rowlands, M. S. Crawford, and L. Lasure, J. Antibiot., 46, 535-544 (1993).
244. W. H. Moos, G. D. Green, and M. R. Pavia, "Recent Advances in the Generation of Molecular Diversity," in J. A. Bristol, Ed., Annual Report of Medicinal Chemistry, Vol. 28, Academic Press, New York, 1993, pp. 315-324.
245. Y.-Z. Shu, J. Nat. Prod., 61, 1053-1071 (1998).
246. D. J. Newman, G. M. Cragg, and K. M. Snader, Nat. Prod. Rep., 17, 215-234 (2000).
Index
Terms that begin with numbers transition state analogs, ADAM

are indexed as if the number were 650-651 geometric/combinatorial
spelled out; e.g., "3D models" is Acetic acid search, 295
located as if it were spelled CML representation, 372 ADAMs, 652
"ThreeD models." Acetylcholine, 772 ADAPT, 53
cation-π bonding, 724-725 Adenosine deaminase inhibitors,
conformationally restricted 717
structure-based design, analogs, 697-698 mass-spectrometric screening
437-438 muscarinic agonist analogs, for ligands to, 604
A-80987 143-144 transition-state analogs,
structure-based design, 438, Acetylcholinesterase inhibitors, 750-752
439 718 X-ray crystallographic studies,
A-306552,675 CoMFA study, 58-59 482
Absolv program, 389 pseudoirreversible, 772-774 S-Adenosyl-L-homocysteine,
Absorption, distribution, metab- substrates from acetylcholine 733,740
olism, and excretion analogs, 697-698 S-Adenosyl-L-methionine, 733
(ADME) target of structure-based drug ADEPT suite, 225,226,237
as bottleneck in drug discov- design, 449-450 ADME studies, See Absorption,
ery, 592 volume mapping, 140 distribution, metabolism,
estimation systems, 389-390 X-ray crystallographic studies, and excretion (ADME)
molecular modeling, 154-155 482 ADMET studies, See Absorption,
Absorption, distribution, metab- N-Acetylcysteine distribution, metabolism,
olism, excretion, and toxic- toxicological profile prediction, excretion, and toxicity
ity (ADMET),389 838,840 (ADMET)
and druglikeness screening, 245 Acetylsalicylic acid (aspirin), ADP analogs, 763-764
and virtual screening filter 762-763 Adrenaline, 885,886
cascade, 267 Acquire database, 386 β-Adrenoreceptor antagonists,
ABT-418, 808-810 Actinomycin D See β-Blockers
ABT-627,811-812 thermodynamics of binding to Afferent, 387
ABT-773,876 DNA, 183 Affinity calculation, 118-122
Academic databases, 387-388 Actinoplanes utahensis, 878 Affinity capillary electrophore-
Acarbose, 849 Active Analog Approach, 58,60, sis-mass spectrometry,
Accelrys databases, 384-385 639 599-600
Accord, 377,385 flow of information in, 146 Affinity chromatography-mass
Accord Database Explorer, 385 and molecular modeling, 134, spectrometry, 598-599
Acebutol 151 Affinity grids, 292-293
renal clearance, 38 and systematic search, Affinity labels, 756-759,
ACE inhibitors, 718,881 144-145 760-764
asymmetric synthesis, 807,809 Active-site directed, irreversible Affinity NMR, 571
comparative molecular field inhibitors, 755 2-AG, 853
analysis, 151-153 Activity AG31
conformationally restricted binding affinity contrasted, structure-based design,
peptidomimetics, 640- 641 131-135 428-429
molecular modeling, 131, Activity-guided fractionation, AG85
132-133, 145 597 structure-based design, 428
multisubstrate analogs, Activity similarity, 255 AG331
746-748 Acyclovir, 717, 719, 756 structure-based design,
receptor-relevant subspace, 204 Acyl halides 428-429
structure-based design, filtering from virtual screens, AG337
432-433 246 structure-based design, 428
Alprenolol 6-Aminopenicillanic acid
structure-based design, renal clearance, 38 (6-APA), 869, 870
431-432 Altracurium Aminopeptidases
AG2037 lead for drugs, 856-858 transition state analog inhibi-
structure-based design, AM1 wave function, 15, 102 tors, 652
431-432 AM404,854 8-Aminoquinolines, 888-889
Agenerase Amaryllidaceae (snowdrops, daf- Amiodarone, 884, 885
structure-based design, 440, fodils, etc.), 892 Amitriptyline, 692
441 Amastatin, 738 Ammi visnaga (toothpick plant),
Agent, 398 AMBER energy function, 264, 883
Aggregate concept, in molecular 307-308 Amoxycillin, 869,870
modeling, 90-91 performance in structure pre- AMP analogs, 764
AIDS, See HIV protease inhibi- diction, 315
Ampicillin, 869, 870
tors seeding experiments, 319
Amprenavir, 648,659
AIDS database, 386 AMBER/OPLS force field, 80,
structure-based design, 440,
AIMB, 255 81,103
ALADDIN, 259,363 in molecular modeling, 118 441
in molecular modeling, 111, American mandrake, drugs de- α-Amylase
113 rived from, 865 X-ray crystallographic studies,
Alaninates Amides 482
binding to chymotrypsin, 35 enzyme-mediated asymmetric β-Amyloid
Alanine racemase inhibitors, 717 bond formation, 804-805 X-ray crystallographic studies,
Alcohol dehydrogenase exchange rates/temperature 482
QSAR studies, 5 coefficients, in NMR, 512 Analog design, 687-689
Alcohols pharmacophore points, 249 alkyl chain homologation, 689,
pharmacophore points, 249 Amines 699-704
QSAR studies, 27-29 pharmacophore points, 249, bioisosteric replacement and
Alcuronium, 856,857 250 nonisosteric bioanalogs,
Aldehydes Aminidine 689 - 694
filtering from virtual screens, pharmacophore points, 249 chain branching alteration,
246 Amino acid mimetics, 640 689
Aldose reductase inhibitors Amino acids fragments of lead molecule,
novel lead identification, 321 chemical modification re- 689,707-710
target of structure-based drug agents, 755 interatomic distances varia-
design, 447-449 chirality, 784 tion, 689, 710-712
Aldosterone, 746 classical resolution, 797 limitations of, 532
Alkyl amines conglomerate racemates, rigid or semirigid analogs,
polarization energy, 173 802-803 689,694-699
protonation, 179-180 Aminoacyl-tRNA synthetases ring-position isomers, 689
Alkyl chain homologation ana- binding of alkyl groups to, 8 ring size changes, 689
logs, 699-704 y -Aminobutyric acid amino- stereochemistry alteration and
Alkyl halides transferase (GABA-T) in- design of stereoisomers/geo-
filtering from virtual screens, hibitors, 488, 718, 766-767 metric isomers, 689,
246 y -Aminobutyric acid (GABA), 704-707
Allinger force field, 80 766 substitution of aromatic ring
Allosteric effectors analogs, 690 for saturated, or the con-
of hemoglobin, 421-424 geometric isomer analogs, verse, 689
and lock-and-key hypothesis, 5 705-706 Anandamide, 853
N-Allylmorphine, 850 molecular modeling, 149 Anchor and grow algorithm,
Almond program, 202 7-Aminocephalosporanic acid 294,296
Alogp, 390 (7-ACA),871,874 AND logical operator, 406
Alpha-amylase, 482 2-Amino-3-chlorobutanoic acid Androgen receptor
N-Alpha-(2-naphthylsulfonylgly- nonclassical resolution, 803 X-ray crystallographic studies,
cyl)-4-amidinophenylal[NA- Aminoglutethimide, 717 482
PAP] piperidide chromatographic separation, Angiotensin-converting enzyme
structure-based design, 442, 792 (ACE),650,881. See also
444 classical resolution, 798, 799 ACE inhibitors
active site molecular model- Arabinose binding protein Aspergillus alliaceus, 855
ing, 131, 132-133 genetic algorithm study of Aspergillus nidulans, 878
target of structure-based drug docking, 88-89 Aspergillus terreus, 879
design, 432-433 Arachidonic acid, 762, 763 Asperlicin
Angiotensin I, 432-433,746,881 Arecoline analogs, 693-694 fragment analogs, 708
Angiotensin II, 432, 746, 881 Argatroban lead for drugs, 855-856
non-peptide antagonists, structure-baseddesign, 442,444 Aspirin, 762-763,764
668-669,670 Arginase inhibitors, 736-737 Assay Explorer, 387
Anhydrides Arginine Association thermodynamics
filtering from virtual screens, chemical modification re- drug-target binding, 170-171,
246 agents, 755 177-179
Anomalous Patterson maps, 477 Aromatase inhibitors, 717, 770 Asymmetric centers, 784,785
Antiasthma drugs Aromatic-aromatic interactions, Asymmetric synthesis, 784,
natural products as leads, 286 804-820
883-886 Aromatics enzyme-mediated, 804-807
Antibacterial enzyme inhibitors, analogs based on substitution Asymmetric transformation
717 of aromatic for saturated crystallization-induced,
Antibiotic drugs ring; or the converse, 798-799
natural products as leads, 699-704 Atenolol, 882
868-878 growth inhibition by, 38 renal clearance, 38
Antibiotic resistant pathogens, molecular comparisons, 139 Atom-atom mapping, 380,398
770 ArrayExpress, 345 Atom-centered point charges,
Anticancer drugs Artabotrys uncinatus, 887 101-102
enzyme inhibitors, 717, 718 Arteether, 887 Atomic counts, 54
molecular modeling, 151 Artemether, 887-888 Atom list, 398
natural products as leads, Artemisia annua (sweet worm- Atom-pair interaction potentials,
858-868 wood), 886,888 120
Anticoagulant protein C Artemisinin, 849,886-888 Atom stereochemistry, 365,398
X-ray crystallographic studies, Artificial intelligence, 398 Atom-type E-State index, 26
482 Artificial neural networks Atorvastatin, 744, 880
Anticoagulants, 882-883 for druglikeness screening, ATP analogs, 763-764,765
Antifolate targets 247-248,250 H+,K+-ATPase inhibitors, 718
structure-based design, in molecular modeling, 126 Na+,K+-ATPase inhibitors, 718
425-432 in QSAR, 53,62,67 Atracurium, 857,859
Antifungal enzyme inhibitors, for structural genomics study, Atrasentan (ABT-627), 811-812
717 353 Atrial natriuretic factor, 650
Antiparasitic drugs Arylsulfonamidophenethano- Atrolactate, 762
natural products as leads, lamine analogs, 703 Atropine
886-891 p-Arylthio cinnamide antago- lead for drugs, 851
Antiprotozoal enzyme inhibi- nists, 566-567 Augmentin, 869
tors, 717 ASCII (American Standard Code Aura-Mol, 388
Antisickling agents, 419-421 for Information Interchange), AUSPYX, 387
Antiviral enzyme inhibitors, 717 398 AutoDock
Apamin Ascomycin affinity grids, 293
molecular modeling, 124 binding to FKBP, 552, explicit water molecules, 303
AP descriptors, 55, 56 553-554 flexible ligands, 263
Apex, 256,387 Asinex catalog, 385 Lamarckian genetic algo-
Apex-3D, 60 Aspartate transcarbamoylase rithm, 299
Application tier, 392,398,406 (ATCase)inhibitors, Monte Carlo simulated an-
Aquaporin 1 743-744 nealing, 297
X-ray crystallographic studies, Aspartic acid protein flexibility, 301
482 chemical modification re- Automap, 398
Aqueous solubility agents, 755 Available Chemicals Directory
and structure-based design, Aspartic peptidase inhibitors (ACD), 385,386
408 transition state analogs, virtual screening application,
Arabidopsis thaliana 647-649 254
genome sequencing, 344 virtual screening, 315 Avermectins, 891,892
Azathioprine, 717 Bezafibrate, 422 BIRB-796

Azithromycin, 848,849, BIBP 3226,670,673 structure-based design,
875-876 BIBR 953 458-459
Aztreonam, 873-874 structure-based design, 443, Bisubstrate analog enzyme in-
448 hibitors, 741-742
Babel, 372 BIBR 1048 Bitset, 399
Baccatin 11,863 structure-based design, 442, BLAST (Basic Local Alignment
Backtracking, 382,399 443 Search Tool), 347
in virtual screening, 67 BIBU52,213 BLEEP potential, 312
Bacterial luminescence Bifenthrin, 41, 42 performance in structure pre-
inhibition by ROH, 27 Binary data, 373,399 diction, 314
Bacterial natural products, 848, Binary QSAR, 4 BLOB (Binary Large Object)
893 Binding affinity, 286 data type, 399
Bacteriorhodopsin activity contrasted, 131-135 Blood substitutes
electron cryomicroscopy, 612 calculation, 118-122 structure-based design, 424
homology with GPCRs, 123, Binding constant, 286 BMS-193884,674,676
150 Binding property classes, 192 BMS-247550,864,865
Barnard Chemical Information, Binding site models, 130 Bohm scoring function, 264
388 Bioactivity databases, 386 Boltzmann law, 93, 310-311
Bcl-xL Bioaffinity screening Boltzmann probability, 94
target of spin-label NMR by electrospray FTICR MS, Boltzmann weighted average,
screening, 573-574 601-603 97,98
BCUT descriptors Bioavailability, 716 Bond stereochemistry, 365,
defined, 399 BioCatalysis database, 384 399-400
estimation systems, 388 Biochemical force fields, 175-176 Born-Oppenheimer approxima-
for molecular similarity/diver- Bioinformatics, 333-337 tion, 79
sity methods, 193-194, databases, tools, and applica- Born-Oppenheimer surface, 85
203-204 tions, 345-349 Bothrops jararaca (pit viper), 881
with pharmacophore finger- defined, 399 Bovine liver DHFR, QSAR inhi-
prints, 223-224 and functional genomics, bition studies, 34
for target class-focused ap- 338-340 Bovine pancreatic trypsin inhibi-
proaches, 228-229,232-233 future developments, 354 tor
BCX 1812 and sequences, 364 long range electrostatic ef-
structure-based design, 452 standardization, 337 fects, 177
BDE index, 11 and structural genomics, Bradykinin, 746, 881
Beilstein Database, 362,385 352-354 targeted library design, 69, 70
Beilstein Online, 385 for target discovery, 335, BRIDGE, 113
Benign prostatic hypertrophy 338-345 BRN (Beilstein Registry Num-
expression probabilities, 343 Bioinformatics knowledge ber), 378, 400
Benzamidines model, 349-352 Bromoaspirin, 763
inhibitors of trypsin, 120 Bioisosteric replacement ana- Bromobutide, 42
Benzene logs, 689-694 Bromperidol
electron density, 135 Biological data, 399 HIV protease inhibitor, 111,
intermolecular interactions, Biological evaluation 112
174 in structure-based design, 419 Bryostatin-1,868,869
Benziodarone, 884 BioScreen NPISC, 385 Bufuralol
Benzomorphans, 850 BIOSTER database, 202,384 renal clearance, 38
Benzopyran Biosterism, 689-690 Bugula neritina, 868
enzyme-mediated oxidation, Biotin Building block hypothesis, 88
806 genetic algorithm study of Bupivacaine hydrochloride, 805,
Benzylpenicillin,868,870 docking to streptavidin, 89 806
(R)S-Benzylsuccinic acid, 746 interaction with streptavidin, classical resolution, 795, 796
Bestatin, 652, 654, 728 181-183 Buprenorphine, 850-851
β-Blockers, 881-882 Biotransformations database, 384 Business rules, 378,400,403
enantiomers, 786 Biphenyls BW12C
nonrenal clearance, 39 privileged structures, 252 antisickling agent, 419-420
renal clearance, 38 Biphenyl tetrazole, 231 BW1476U89,742-743
CACTVS, 254 Carboxypeptidase A Cathepsin D

Calabash curare, 856 similarity of active site to combinatorial docking studies,
Calcineurin A ACE, 433,746-747 318
X-ray crystallographic studies, Carboxypeptidase inhibitors non-peptide inhibitors, 227
483 conformational changes on Cathepsin K
Cambridge Crystallographic Da- binding, 261 transition state analog inhibi-
tabase, 110 genetic algorithm study of ac- tors, 654-655, 656
Cambridge Structural Database, tive site, 89 Cation-π interactions, 286, 313
354,387 molecular modeling, 116 enzyme inhibitors, 724-725
X-ray crystallography applica- Cardiovascular drugs CATS descriptors, 192
tion, 479
natural products as leads, CAVEAT, 111,113
cAMP
878-883 CAVITY, 106-107,108
molecular property visualiza-
CART, 67 CBS reduction, 814
tion, 137
D-Camphor Cascade clustering cDNA clone libraries, 341-342
connection table, 367 with molecular similarity/di- cDNA microarray chips,
tabular molecular file formats, versity methods, 205 344-345
370-371 CASETox, 246 Cefaclor, 872
cAMP phosphodiesterase II in- CAS Numbers, 378,400 Cefadroxil, 872
hibitors CAS ONLINE, 361 Cefetamet pivoxil, 849
molecular modeling, 130 similarity searching, 383 Cefozopran, 849
Camptotheca acuminata, 860 Caspase-1inhibitors Cefpimizole, 849
Camptothecin target of structure-based drug Cefsulodin, 849
drugs derived from, 860-861 design, 443 Ceftazidime, 872,873
Candoxatril, 815,818 transition state analogs, 655 Ceftizoxime, 872,873
Cannabinoids Caspase-3 inhibitors Ceftriaxone, 872,873
lead for drugs, 852-854 transition state analogs, 655 Cefuroxime, 872,873
Cannabis sativa, 852 Caspase-7 inhibitors Cell-based partitioning methods,
Canonical numbering, 378,400 transition state analogs, 655 203
Capillary electrophoresis CASP experiment (Critical As- CellCept
for enantiomer separation, 787 sessment of techniques for structure-based design,
Capsaicin, 854 446-447
protein Structure Predic-
Captopril, 646,650,746-747, Central Library, 378,387
tion), 123,353
881 Central nervous system (CNS)
Caspofungin, 848,878
asymmetric synthesis, 807, drugs, See CNS drugs
809 CASREACT, 385
Cephacetrile, 871,874
structure-based design, CAS Registry, 363,385
Cephalexin, 871-872
432-433,434 Catalyst (program),259
Cephaloglycin, 871-872
Carbamates 3D shape-based searching, Cephaloridine, 871,872
pharmacophore points, 249 199,201 Cephalosporin C, 870-871, 874
Carbenin, 849 for novel lead identification, Cephalosporins, 717,870-871
Carbo index, 202 321 preventing bacterial degrada-
Carbonic anhydrase inhibitors, Catalyst/Hip/Hop, 60 tion, 718
718 Catalyst/Hypo, 60 Cephalosporium acremonium,
4-fluorobenzenesulfonamide Catechol methyltransferase 870
binding to, 538 X-ray crystallographic studies, Cephalothin, 871,874
molecular modeling, 120 484 Cephamandole, 872,873
virtual screening for, 316 CATH, 353 Cephapirin, 871, 874
Carbonic anhydrase II inhibi- X-ray crystallography applica- Cephradine, 872
tors, 718 tion, 494 Cerivastatin, 880
novel lead identification by Catharanthus roseus, vinca alka- Cetirizine, 783
virtual screening, 320 loids from, 858 Cetirizine dihydrochloride
X-ray crystallographic studies, Catharanthus (vinca)alkaloids, chromatographic separation,
483-484 858-860 790-791
Carboxylic acids Cathepsin B cGMP
pharmacophore points, 249 transition state analog inhibi- molecular property visualiza-
privileged structures, 252 tors, 654 tion, 137
CGS 27023 Chemical libraries, See Libraries Chiral catalysts, 814-820

structure-based design, 444, Chemical Products Index, Chiral centers, 783-785
446 391-392 Chiral derivatizing agents, 788
Chain branching alteration ana- Chemical property estimation Chiral flags, 365,366
logs, 699-704 systems, 388-390 Chirality, 781-787,820-821
Chapman databases, 387 Chemical reactions, 366 asymmetric synthesis,
Charge-charge interactions, 82 searching, 379-384 804-820
Charge-coupled devices (CCDs) Chemical representation, chromatographic separations,
for electron cryomicroscopy, 363-373 787-793
623 Chemical shift, in NMR, 511, classical resolution, 793-799
for X-ray crystallography, 474 512 enzyme-mediated asymmetric
Charge-dipole interactions, 82 changes on binding, 536-537 synthesis, 804-807
Charge parameterization, perturbations as aid in NMR nonclassical resolution,
101-102 screening, 562-568
799-804
Charge state determination Chemical-shift mapping, in
Chiral pool, 807-810
NMR spectroscopy for, 526 NMR, 543-545
Charge transfer energy, 173 Chemical similarity, 382 Chiral reagent, 813-814
CHARMM, 298,299,307-308 Chemical space, 244, 383,400 Chiral stationary phase,
in molecular modeling, 118, exploring with molecular simi- 787-788,790-791
126 larity/diversity methods, Chlorpromazine, 692
CheD, 387 188, 191 Chloramphenicol, 870
ChemBase, 362 reduction by virtual screening, molecular modeling, 150
CHEMCATS, 385 244-245 4-Chloro-1,3-benzenediol
CHEMDBS3D, 260,363 Chemical Structure Association, allergenicity prediction, 834
ChemDraw, 362 360 Chloromethyl ketones
ChemEnlighten, 387 Chemical structures protease inhibitors, 761-762
ChemExplorer, 384 file conversion, 372-373 p-Chlorophenylalanine
ChemFinder for Word, 384,388 searching, 379 -384 classical resolution by crystal-
ChemFolder, 388 Chemical suppliers searching, lization, 798-799, 800
Chemical Abstracts (CAS) regis- 384 Chloroquine, 889,890
try file, 50 CHEM-INFO, 360 Chlortetracycline, 870
Chemical Abstracts Service da- ChemInform, 386 Cholchicine
tabases, 254,361,385 Cheminformatics, 359,400 toxicological profile prediction,
Chemical business rules, 378, Cheminformatics Glossary, 360 841,842
403 ChemPort program, 385 Cholecystokinin, 855
Chemical information comput- Chemscape, 387 X-ray crystallographic studies,
ing systems, 357-363 ChemScore 484
chemical property estimation consensus scoring, 266 Chorismate mutase inhibitors
systems, 388-390 empirical scoring, 310 transition state analogs,
chemical representation, ChemSpace, 199 753-754
363-373 ChemText, 362 Chromatographic separation
databases, 384-388 ChemWindow, 388 of chiral molecules, 787-793
data warehouses and data Chem-X, 60, 111 Chromobacterium violaceum,
marts, 390-393, 402-403 CheflChemDiverse 873
future developments, 393-397 3D pharmacophores, 195-196, Chromosomes
glossary of terms used, 206 in genetic algorithms, 87
397-412 optimization approach, 217 Chymotrypsin inhibitors
registering chemical informa- and property-based design, affinity labels, 761, 762
tion, 377-379 234 molecular modeling, 118
searching chemical structures/ Cherry picking, combinatorial QSAR studies, 5, 35-36
reactions, 379-384 libraries, 216-217,237 CICLOPS, 223
storing chemical information, Chesire, 378,387 Cilazapril
373-377 Chicken liver DHFR, QSAR in- asymmetric synthesis, 807,
Chemical information manage- hibition studies, 31-32 809
ment databases, 384 Chilies, capsaicin in, 854 Cilofungin, 877
Chemical information manage- Chime, 369,371,387 Cinchona bark, quinine from,
ment systems, 384 Chiral auxiliary, 810-813 888
CIP (Cahn-Ingold-Prelog) stere- Coagulation factor 10 Comparative binding energy

ochemistry, 365,400 X-ray crystallographic studies, analysis (COMBINE), 53
Cisplatin 484 and docking methods,
vindesine with, 860 COBRA, 255 304-305
cisltrans stereochemistry, 399 R-Cocaine Comparative molecular field
Clarithromycin, 849,874, dopamine transporter inhibi- analysis (CoMFA), 53-54
875-876 tor, 268 assessment of predictability,
Classical resolution, of chiral Codeine, 849,850
molecules, 793-799 Coformycin, 750-752 3D, 58-60
Clavulanic acid, 718,869,870 Cognex and docking methods, 304
Cleaning and transforming data, structure-based design, 449 field mapping, 107
400 Colforsin daropate, 849 molecular field descriptors,
Clenbuterol Collagenase 56-57
chromatographic separation, NMR binding studies, 555,
and molecular modeling, 138,
787,788 556
147
Client-server architecture, target of structure-based drug
400-401 design, 443 Comparative quantitative struc-
Clipping, 378,401 CombiBUILD, 227 ture-activity relationships
Clique search techniques, 262 CombiChem Package, 386 -
database development, 39
CLOB (Character Large Object) CombiDOCK, 217,227 database mining for models,
data type, 401 combinatorial docking, 318 39-41
Clofibric acid CombiLibMaker, 378,387 Competitive inhibitors, 728-729
antisickling agent, 421, 422 Combinatorial chemistry, 283, Complementarity, 134
CLOGP, 18,389 358,591-592 Comprehensive Medicinal
Clog P, 17-18,36 defined, 401 Chemistry database, 379,
Cloning, 127 and molecular modeling, 155 386
cDNA clone libraries, 341-342 and natural product screen- Computational Chemistry List,
Clotrimazole, 717 ing, 848 360
Clustering methods, 379,401 Combinatorial chemistry data- Computational protein-ligand
for combinatorial library de- bases, 387 docking techniques,
sign, 220 Combinatorial docking, 317-318 262-264
in molecular modeling, 90-91 Combinatorial libraries, 214 Computing technologies,
with molecular similarity/di- comparisons, 221-223 334-335, 337. See also
versity methods, 205 design for molecular similarity Chemical information com-
CML (Chemical Markup Lan- methods, 190,214-228 puting systems
guage), 371-372, 401, 405, encoding and identification COMSiA, 53,60
412 with mass spectrometry, CONCORD, 363,366,387,401
CNS drugs 596-597 3D coordinate generation, 267
complementarity, 134 integration, 224-225 3D descriptors, 55, 110
natural products as leads, LC-MS purification, 592-594 virtual screening application,
849-856 optimization, 217-221 254
pharmacophore point filters, peptidomimetics, 657 Concordance, 390,401
250 screening for ligands to two Conformational analysis
polar surface area, 245 receptors simultaneously, in molecular modeling, 87,
CNS program, 478 601-602 93-94
Coagulation factor 2 structure-based design, NMR spectroscopy for,
X-ray crystallographic studies, 225-228 525-526
484-485 structure/purity confirmation and systematic search, 89-93
Coagulation factor 7 with mass spectrometry, Conformational clustering,
X-ray crystallographic studies, 594-596 92-93
485 with virtual screening, 317 Conformational flexibility, 288
Coagulation factor 7a S,S-Combretadioxolane, 816, Conformationally restricted ana-
X-ray crystallographic studies, 819 logs, 694-699
485-486 Combretastatin A-4,816,819 Conformationally restricted pep-
Coagulation factor 9 CoMFA, See Comparative molec- tides, 636-643
X-ray crystallographic studies, ular field analysis Conformational mimicry,
486 Compactin, 744,879 140-142
Conformational mimicry index, Cox-1 inhibitors, 718 Cyclosporin, 848

142 X-ray crystallographic studies, molecular modeling, 106
Conglomerate racemates, 486 NMR spectroscopic binding
799-800,801,802-803 COX-2 inhibitors, 718 studies, 539
Connection tables, 365-368,371, mass-spectrometric binding Cyclosporin A
401 assay screening, 604 binding to FKBP, 552-553
file conversion, 372-373 seeding experiments, 319 γ-Cystathionase inhibitors,
ISIS database, 376 X-ray crystallographic studies, 719-720
Connectivity, See Molecular con- 486 Cysteine
nectivity CP-96,345, 670,672 chemical modification re-
ω-Conotoxins C-QSAR database, 39 agents, 755
lead for drugs, 851-852 Cysteine peptidase inhibitors
Crambin
NMR spectroscopy, 518-523 transition state analogs,
molecular modeling, 124
ConQuest search program, 387 652-655
Crixivan
Conscore constraint score, 218 Cysteine protease inhibitors
structure-based design, affinity labels, 762
Consensus scoring, 265-266,
438-439 Cytochrome P450
291,319-320
and molecular modeling, CROSS-BOW, 361 homology modeling, 123
117-118 Crossfire Beilstein, 385 Cytochrome P450 reductase
Consistent force fields, 102 Cross-linked enzyme crystals, X-ray crystallographic studies,
Constrained minimization, 804 486
143-144 Crosslinking agents, 424-425 Cytosine arabinoside, 717,
Contact matrix, 125-127 Cross validation, 57, 64 867-868
Contignasterol, 886 Cryoprobes
CONTRAST, 361 in NMR screening, 577 D2163,804,806
Conus magus, conotoxins from, in NMR spectroscopy, 515 Daemon, 392,402
851 Cryptotheca cripta, 867 Daffodils, drugs derived from,
Convertases Crystallization 892
homology modeling, 123 for asymmetric transforma- Dalfopristin, 876-877
CONVERTER, 366,402 tion of enantiomers, Spiro-DAMP, 696
CoQSAR, See Comparative 798-799 4-DAMP
quantitative structure-activ- for enhancing chromato- semirigid analogs, 695-696
ity relationships graphic separation of enan- Daptomycin, 848
CORINA, 366,402 tiomers, 792-793 DARWIN, 299
3D coordinate generation, 267 in nonclassical resolution, explicit water molecules, 303
3D descriptors, 55 799-804 Databases
virtual screening application, CScore, 117 for bioinformatics, 345-349
254 Curare cDNA microarray chips, 345
Cosine coefficient, 68 lead for drugs, 856-858 chemical information manage-
COSMIC force field, 80 Cyclic lactams
Coulomb's law, 80,82,285 conformationally restricted commercial systems for drug-
and dielectric problem, 83 peptidomimetics, 640-642 sized molecules, 384-387
Coumarin, 882 Cyclic protease inhibitors, 636 comparative QSAR, 39-41
Counting schemes Cyclin-dependent kinase 2 comparing expressed sequence
in druglikeness screening, (CDK2) tags with, 342
245-246 H717 inhibitor pharmaco- history of, 360-363
Coupling constants, in NMR, phore, 253 knowledge discovery in,
511, 512 Cyclo(Gly6) 393-395
changes on binding, 536-537 genetic algorithm exploration natural products, 387,597
for conformational analysis, 525 of conformational space, 88 for pharmacophore screening,
COUSIN, 361,373,387 Cycloheptadecane 254-255
and combinatorial library in- potential smoothing study, 86 proprietary and academic,
tegration, 224 Cyclooxygenase 1/2 inhibitors, 387-388
Covalent bonds, 6,170 See COX-1 inhibitors; sequence and 3D structure,
Covalently binding enzyme in- COX-2 inhibitors 387
hibitors, 720, 754-756 Cyclophilin, 552 storing chemical information
inactivation of, 756-760 D-Cycloserine, 717, 719 in, 373-377
for X-ray crystallography, Diastereomers, 784 X-ray crystallographic studies,

478-479 chromatographic separation, 486
Database tier, 393,403,407 788 Dihydropteroate synthetase in-
Data cartridges, 395,402 Dice coefficient, 68 hibitors, 717
Data compression, 402 Dicoumarol, 882 X-ray crystallographic studies,
Data dictionary table, 375 Dictionary of Natural Products, 486
Data marts, 391-393, 402 597 1,4-Dihydropyridines
Data mining, 402,410,411,412 Dideoxyinosine, 717 chromatographic separation,
future prospects, 394-395 Dielectric problem, 83-84 788,789
with QSAR, 66-67 Dienestrol, 706-707 Dihydroquinine, 889
Data warehouses, 390-393, Diethylstilbestrol Dihydrotestosterone, 36, 768
402-403 stereoisomer analogs, 706-707 Diller-Merz rapid docking ap-
Dative bonds, 170,365 Diffusion-filtered NMR screen- proach, 292,295
Daunomycin ing, 570-571 assessment, 303
thermodynamics of binding to a-Difluoromethylornithine, 717 combinatorial docking, 317
DNA, 183 Digital Northern, 342 Diltiazem
DayCart, 386 Dihydroartemisinin, 887 nonclassical resolution, 803,
DayCGI, 386 Dihydrofolate reductase inhibi- 805
Daylight Chemical Information tors, 545, 717 Dimension tables, 390,403
Systems databases chemical-shift mapping of N,N-Dimethyldopamine
descriptors, 192 binding, 545 alkyl chain homologation ana-
in virtual screening, 254 comparative molecular field logs, 701
10-Deacetylbaccatin III, 803 genetic algorithm study of ac- semirigid analogs, 695
Decamethonium, 58 genetic algorithm study of ac- semirigid analogs, 695
fragment analogs, 708-710 docking, 88-89 Δ6a,10a-Dimethylheptyl THC, 852
lead for drugs, 856-858 genetic algorithm study of Dimethyl sulfoxide (DMSO)
Deceptive fitness functions, 88 docking, 88-89 force field models for, 176
Decision support systems, 403 interaction with methotrex- Dimethyltubocurarine, 857
Decision tree approach, 247-248 ate, 120 Diphenylmethane, 231
Deconvolution, 401 interaction with tri- privileged structures, 252
Deduplication, 378,403 methoprim, 151,183 2,3-Diphosphoglycerate
Dehydroalantolactone interaction with trimetrexate, (2,3-DPG),104,421
allergenicity prediction, 836 531, 557-559 2,3-Diphosphoglycerate
Demexiptiline, 692-693 mass-spectrometric binding (2,3-DPG)analogs, 103, 104
DENDRAL, 393 assay screening, 604 Dipolar electrostatic forces, 172
De novo design, 113 molecular modeling, 114, 115, Dipole-dipole interactions, 6,82
(R)-Deoxycoformycin, 750-752 116,147,151 Dipole-induced dipole interac-
(S)-Deoxycoformycin, 751 QSAR studies, 5 tions, 173
DEREK, 246 QSAR studies of inhibition by Directed tweak algorithm, 260
Derwent Information databases, diamino, 5Y, 6-Z-quinazo- Directionality, 140
386 lines, 34-35 DISCO, 58,60,256
Derwent Selection database, 386 and molecular modeling, 147
Derwent World Drug Index diamino-5X-benzyl pyrimi- Discodermolide
(WDI), 379, 386, 387 dines, 39 genotoxicity prediction, 843
Derwent World Patents Index QSAR studies of inhibition by Disintegrins, 652
(WPI), 386 triazines, 31-33 Disoxaril
10-Desacetylbaccatin III, 863 target of structure-based drug structure-based design,
Descriptor pharmacophores, design, 425-426 454-455
60- 63 volume mapping, 140 Dispersive interactions, 82, 174.
Design in Receptor (DiR) ap- X-ray crystallographic studies, See also van der Waals
proach, 236 486 forces
DHFR, See Dihydrofolate reduc- Dihydromuscimol, 690 Dissimilarity approaches,
tase Dihydroorotase 189-190,206-208
Diamino, 5Y, 6-Z-quinazolines transition state analogs, 752 Dissociation constant, 286
QSAR studies of DHFR inhibi- Dihydroorotate dehydrogenase Distamycin
tion, 34-35 inhibitors binding perturbations, 544
Distance geometry methods DNA topoisomerase 1 protein-ligand docking soft-

in molecular modeling, 126, X-ray crystallographic studies, ware, 261
142,147 487 and QSAR, 304-305
in QSAR, 60 Docetaxel, 849,863 searching configuration and
in virtual screening, 263 DOCK conformation space,
Distance geometry QSAR tech- anchor and grow algorithm, 294-300
nique, 53 296 seeding experiments, 318-319
Distance matrix, 135 assessment, 303,304 special aspects, 300-306
Distance measures combinatorial docking, 318 in structure-based virtual
molecular modeling, 135-137 consensus scoring, 266 screening, 260-267
molecular similarity/diversity consensus scoring, 266 screening, 260-267
methods, 201-202 force field-based scoring, 308 266-267
Distance range matrix, 135-136 water's role, 302-303,
force-field scoring, 264
Dithromycin, 875 313-314
geometric/combinatorial
DiverseSolutions, 387 Docking problem, 289
for molecular similarity/diver- search, 295 DockIT, 261
sity methods, 193-194, ligand handling, 293 Dockvision, 261
203-204 molecular modeling, 112, 113, Dolabella auricularia, 868
Diversity analysis, 358 115,116 Dolastatin-10, 868,869
Diversity methods, See Molecu- molecular modeling of small DoMCoSAR approach, 305
lar similarity/diversity cavity, 106, 107 Donepezil
methods penalty terms, 313 structure-based design, 449
Diversity-property derived performance in structure pre- L-Dopa, 785
(DPD) method, 201,203 diction, 314 analogs, 690
Dixon plots, 731-732 protein and receptor model- Dopamine
D,L descriptors, for chiral mole- ing, 267 semirigid analogs, 697
cules, 783 protein flexibility, 301 Dopamine-transporter inhibitors
DMB-323 receptor representation in, pharmacophore model, 256,
NMR binding studies with 291 258
HIV protease, 560-562 rigid docking, 262-263 virtual screening, 267-269,
DMHB sampling/scoring methods 270
structure-based design, 424 used, 261 D-Optimal designs, 65-66
DMP 450,659 seeding experiments, 319 Dose-response curves, 8
structure-based design, with site-based pharmaco- Dothiepin, 692-693
438-439 phores, 236 Doxepin, 692-693
DNA DOCK4.0 DragHome method, 305
molecular modeling, 154 PMF scoring, 265 DRAGON, 388-389
NMR structural determina- weak inhibitors, 319 DREAM++, 318
tion, 535 Dockcrunch project, 317 Drill-down, 391,403
noncovalent bonds in, 170 Docking methods. See also Scor- Dronabinol, 849
supercoiling modeling, 95 ing functions; various dock- Drug databases, 385-386. See
synthesis inhibition by phe- ing programs; Virtual also Databases
nols, 40 screening Drug Data Report, 379,386
DNA-binding drugs assessment, 303-304 Drug development, 509-510
chemical-shift mapping of basic concepts, 289-290 serial design costs, 359
binding to, 544-545 combinatorial, 317-318 Druglikeness screening. See also
molecular modeling, 116 flexible ligands, 293-294,322 Lipinski's "rule of 5"
NMR spectroscopy, 547-552 and homology modeling, molecular similarity/diversity
thermodynamics of binding, 305-306 methods, 191
183 and molecular modeling, similarity searching, 383
DNA gyrase inhibitors 113-118 virtual screening, 245-250
novel lead identification, 321 and molecular size, 312-313 Drug-receptor complexes,
DNA helicase pcra NOE docking in NMR, 170-179
X-ray crystallographic studies, 545-546 low energy state of, 5
487 penalty terms, 313 Drug resistance
DNA polymerase inhibitors, 342, protein flexibility, 300-302, antibiotic resistant pathogens,
717 322 770
essential pathways versus sin- Electronic parameters Encryption, 403

gle enzyme inhibitor, 495 in QSAR, 11-15,50 Endorphins, 634,850-851
Drugscore function, 311,312 Electron lenses, 612 model receptor site, 149
assessment, 303 Electron probability distribu- Endothelin
performance in structure pre- tion, 101 antagonists, 211, 672-674,
diction, 314 Electron-topological matrix of 675,676
seeding experiments, 319 congruence, 147 conformationally restricted
in virtual screening, 315 Electron-withdrawing substitu- peptidomimetics, 637, 639
Drug screening, See Screening ents, 11-15 NMR spectroscopy, 523-524,
Drug-target binding forces, growth inhibition by, 41 526-527
170-171 Electrospray FTICR mass spec- ENERGI approach, 127
association thermodynamics, trometry, 601-603 Energy driven/stochastic search
170-171, 177-179 Electrostatic interactions, strategies, 292, 296-300
energy components for inter- 171-172,285 Energy of association, 177
molecular noncovalent in- charge parameterization, English yew, paclitaxel from,
teractions, 171-174 101-102 861-862
example drug-receptor inter- and docking scoring, 308 Enkephalins, 634,850-851
actions, 181-183 enzyme inhibitors, 721, 723 conformationally restricted
free energy calculation, long range, 177 peptidomimetics, 129, 637,
180-181 molecular modeling, 81-85, 639
molecular mechanics force 108-110,140 model receptor site, 149
fields, 174-177 and molecular property visual- Ensemble, 94
Drug targets ization, 137 Enthalpy of association
and bioinformatics, 351-352 and QSAR, 6-7,52 drug-receptor complexes,
estimated number of, 50 Electrotopological indices, 4 170-171
X-ray crystallography of pub- Elimination algorithms, 207 Entoviruses
lished structures, 482-493 EMBL Nucleotide Sequence Da- target of structure-based drug
DYLOMMS, 107 tabase, 335 design, 454-456
Embryo tail defects, 40 Entrainment, 802
E. coli EMD 122946, 676 Entropy, 94
mutagenicity prediction, 829 Empirical scoring, 264, 307, Entropy of association
Eadie-Hofstee plot, 727, 729, 731 308-310 drug-receptor complexes,
ECEPP force field, 118 Enalapril, 650,747,881 170-171
Echinocandins, 877-878 asymmetric synthesis, 807, Enumerated structure, 368
Ecteinascidia turbinata, 868 809 Enumeration, 401,403
Ecteinascidin-743, 848, 867-868 conformationally restricted Enzyme-induced inactivators,
ECTL (Extracting, Cleaning, peptidomimetics, 640-641 756
Transforming, and Loading) Enalaprilat, 650, 747 Enzyme-inhibitor complexes,
data, 377-379,403 conformationally restricted 721-722
Edman sequencing, 518 peptidomimetics, 640-641 Enzyme inhibitors, 715-720. See
Edrophonium, 58 Enantiomeric excess, 784 also specific Enzymes
Efaproxiral (RSR-13), 422 enrichment by crystallization, affinity labels, 756-759,
Eflornithine, 768, 769 800-802 760-764
Eigenvector following method, Enantiomers, 365, 366. See also design of covalently binding,
292,301 Chirality 720, 754-756
Einstein-Sutherland equation, 24 with agonist-antagonist prop- design of noncovalently bind-
Elan, 387 erties at same receptor, 705 ing, 720-754
Electron cryomicroscopy, chromatographic separations, examples used in disease
611-628 787-793 treatment, 717
image processing and 3D re- defined, 783-785 ground-state analogs, 720,
construction, 624-628 Enantioselective metabolism, 740-741
image selection and prepro- 786-787 inactivation of covalently
cessing, 623-624 Enantioselectivity, 784 binding, 756-760
three-dimensional, 615-616 Encoding mechanism-based, 759-760,
Electron-donating substituents, and genetic algorithm, 88 764-771
12-15 natural products with mass multisubstrate analogs, 720,
growth inhibition by, 41 spectrometry, 596-597 741-748
Enzyme inhibitors (Continued) Etodolac FeatureTrees, 316,321

pseudoirreversible, 771-774 classical resolution, 796-797 Fibonacci search method, 11
rapid, reversible, 720, Etoposide, 717,867 Fibrinogen
728-734 Etorphine, 851 virtual screening studies,
slow-, tight-, and slow-tight- Euclidean distance, 68,202 212-213
binding, 720,734-740,749 EUDOC Field-based descriptors, 201
transition-state analogs, 720, assessment, 303 Field effects, 140
748-754 ligand handling, 293 Field mapping, 107
Enzyme-mediated asymmetric European Bioinformatics Insti- Fields, 404
synthesis, 804-807 tute (EBI),335 Filtercascade, 267
Enzymes sequence databases, 387 Filters, for searching, 315-316,
as drug targets, 5 Everolimus, 849 376,380,392,404
kinetics, 725-728 Evolutionary algorithms, 299 Finasteride, 717,768-770
pathways and inhibitor de- with QSAR, 53-54,61 Fingerprint Generation Pack,
sign, 495 Exact match search, 378, 388
and structural genomics, 352 379-381,403 Fingerprints, 376,378,399,404
Ephedra, 885 Exchange repulsion energy, molecular similarity methods,
Ephedrine, 884-886 172-173 188
(-)-Epibatidine, 819-820, 821 Exemestane, 770, 771 FIRM, 67
Epothilone A,864 Exhaustive mapping, 398 Fitness functions, 87-88
toxicological profile prediction, Expert Protein Analysis System, FK506
838-839 335 binding to FKBP, 552-555
Epothilone B, 864-865 Expressed sequence tags, 338 NMR spectroscopic binding
Epothilone D, 865 expression level significance, studies, 539
Epoxides 342-344 FK506 binding protein inhibi-
filtering from virtual screens, profiling, 341-342 tors, 552-555
246 Expression analysis/profiling, de novo design, 113
Equivalence class, 403 334 flexible docking studies, 265
Erythro-9-(2-hydroxy-3-nonyl) genome-wide, 344-345 hydrogen bonding in, 288
(EHNA) for target discovery, 340-345 target of NMR screening stud-
high-affinity adenosine deami- Extended stereochemistry,365, ies, 565-566,571
nase ligand, 604 404 weak inhibitor screening, 319
Erythromycin, 870, 871 External registry number, 379, X-ray crystallographic studies,
Erythromycin macrolides, 404 487
874-876 Extrathermodynamic relation- Flat database storage, 362-363,
Erythro- prefix, 784 ship, 26 404
Erythrose E,Z system, 365,399 Flat file storage, 360-362,404
enantiomers, 784 Flecainide
Es constant (Taft), 23-24 Factorial designs, 65-66 enantiomers, 786
E-selectin Factor Xa inhibitors, 103,738 FlexE, 301
NMR screening binding stud- 3D pharmacophores, 199 FlexibaseFLOG, 263
ies, 572 non-peptide peptidomimetics, Flexible ligands
E-State index, 26, 54 662, 665 322
Esters site-based pharmacophores, 322
pharmacophore points, 249 235-236 and geometric/combinatorial
Estradiol, 706,771 target of structure-based drug search, 295
Estrogen receptor l a design, 442 in virtual screening, 263-264
X-ray crystallographic studies, Fact tables, 390,401,404 Flexmatch index, 376
487 Failed Reactions database, 385 Flexmatch search, 404
Estrogen receptors Families, 93 FlexS, 316
mass-spectrometric binding Family competition evolutionary novel lead identification, 320,
assay screening, 604 algorithm, 299 321
Ethacrynic acid FASTA, 347 PlexX
antisickling agent, 421 Fast ion bombardment, 586,587 assessment, 303,304
Ethidium bromide Fastsearch index, 376-377, 399, assessment, 303, 304
thermodynamics of binding to 404 empirical scoring, 264,310
DNA, 183 FBSS, 202 explicit water molecules, 302
hydrogen bonding, 319 4-Point pharmacophores, 408 GABA aminotransferase

incremental construction, molecular similarity methods, (GABA-T)inhibitors, 718,
295-296 189,196-198,205 766-767
molecular modeling, 115 privileged, 231 X-ray crystallographic studies,
novel lead identification, 320 virtual screening, 210 488
performance in structure pre- FPL-67047 β-D-Galactoside
diction, 314 structure-based design, 453 saturation transfer difference
protein and receptor model- Fractional factorial designs, 66 in binding to agglutin I, 569
ing, 267 Fragment analogs, 707-710 Galantamine (galanthamine),
receptor representation in, Fragment-based ligand docking, 848,849,892-893
291 294 nonclassical resolution, 802,
sampling/scoring methods FRED 803
used, 261 ligand handling, 293 GALOPED, 218
seeding experiments, 319 sampling/scoring methods Gambler
FlexXc extension, 318 used, 261 consensus scoring, 266
Flickering cluster model, of hy- Free energy of association, 286 flexible ligands, 263
drophobic interactions, 15 calculating, 180-181 seeding experiments, 319
Flo, 256 drug-receptor complexes, 5, Gas chromatography, 592
Flobufen, 41-42,42 170-171 for enantiomer separation,
FLOG enzyme-inhibitor complexes, 787
explicit water molecules, 303 722 Gas chromatography-mass spec-
ligand handling, 293 Free energy perturbation, 307, trometry (GC-MS), 585-586
and molecular size, 312-313 308 GASP (Genetic Algorithm Simi-
seeding experiments, 318 Free-Wilson approach, in QSAR, larity Program), 256
Flunet 4,29-30 in molecular modeling, 147
structure-based design, 451 Frontal affinity chromatogra- Gas phase association, 177, 178
Fluorescence spectroscopy, 592 phy-mass spectrometry, 601 G-CSF 3
4-Fluorobenzenesulfonamide 5-FSA X-ray crystallographic studies,
binding to carbonic anhy- structure-based design, 424 488
drase, 538 FTDOCK, 115 Gelatinase
5'-p-Fluorosulfonylbenzoyl Ftrees-FS algorithm, 221 NMR binding studies, 555
adenosine (5'-FSBA), 763- Gel permeation chromatogra-
Fujita-Ban equation, 4
764 phy-mass spectrometry, 599
Fujita-Nishioka analysis, 13
5-Fluorouracil, 717, 718 GenBank, 335
Flurbiprofen, 763 Functional genomics, 338-340
growth of, 339
Fluvastatin, 744,879-880 Functional group filters
X-ray crystallography applica-
FOCUS-2D method, 68-69 in druglikeness screening, tion, 481
Fold compatibility methods, 353 246-247 GeneChips, 344
Fold patterns Functional mimetics (peptidomi- Gene expression, 351. See also
limited number of, 353 metics), 636 Expressed sequence tags;
Follicle stimulating hormone Fungal natural products, 848, Expression analysis/profil-
X-ray crystallographic studies, 893 ing
488 Fungal squalene epoxidase in- Gene family approaches, 188,
Force fields hibitors, 717 244
drug-target binding forces, Fungal sterol 14α-demethylase subset selection, 190-191
170-183 inhibitors, 717 Gene family databases, 347-349
molecular modeling, 79-81 Fuzzy bipolar pharmacophore Gene nomenclature, 337
parameter derivation, 102-103 autocorrelograms, 197 Gene Ontology project, 337
Force-field scoring, 264, Fuzzy clustering technique Generic structures, 367, 368,
306-308 with molecular similarity/di- Geneseq database, 346
Formestane, 770, 771 versity methods, 205 Genetic algorithms
Formula table, 376 Fuzzy distance, 57-58 Genetic algorithms
FOUNDATION, 112-113 Fuzzy searches, 376 and combinatorial library de-
Fourier transform ion cyclotron sign, 217,218
resonance (FTICR) mass G-4120,663 with docking methods, 292,
spectrometry, 585, 601-603 GABA, See γ-Aminobutyric acid 298-299
Genetic algorithms (Continued) Glutathione peroxidase empirical scoring, 309

with FOCUS-2D method, X-ray crystallographic studies, explicit water molecules, 303
68- 69 488 hydrogen bonding, 107
inverse folding and threading, Glycinamide ribonucleotide and molecular modeling, 138
124-125 formyltransferase inhibi- Gridding and Partitioning (Gap)
Lamarckian, 299 tors, 742-743 approach, 199, 200
in molecular modeling, 87-89, target of structure-based drug GRID/GOLPE analysis, 304-305
117 design, 429-432 Grid tyranny, 91, 144
with QSAR, 53,61 Glycophorin A GRIND, 60
in virtual screening, 263 potential smoothing study of Groove-binding ligands, 5
GeneTox, 829 TM helix dimer, 86 Ground-state analog enzyme
Genie Control Language, 378 Glycoprotein IIb/IIIa (GpIIb/ inhibitors, 720, 740-741
Genitoxants, 840 IIIa) inhibitors Growth hormone receptor
Genome annotation, 481,494 non-peptide peptidomimetics, X-ray crystallographic studies,
Genome-wide expression analy- IIIa) inhibitors Growth hormone receptor
sis, 344-345 non-peptide peptidomimetics, X-ray crystallographic studies,
Genomics, See Functional 645 671-672,675
genomics; Phylogenomics; Glycopyrrolate GS 4071
Structural genomics stereoisomers, 784-785 structure-based design, 452
GEOCORE, 124 Gmelin database, 386 Guanidine
Geometric atom-pair descrip- GOLD pharmacophore points, 249
tors, 210 assessment, 303, 304 Gusperimus, 849
Geometric/combinatorial search empirical scoring, 309
strategies, 292,295 flexible ligands, 263 Hall databases, 387
Geometric hashing, 262 genetic algorithm, 299 Haloperidol
Geometric isomer analogs, protein flexibility, 300 HIV protease inhibitor, 111,
704-707 sampling/scoring methods 112
Ghost membranes used, 261 Halopyrimidines
EPR signal changes by ROH, GOLEM, 67 filtering from virtual screens,
27 GOLPE, 54,60 246
Ghrelin, 671, 674 GPCR libraries Hammett constants, 11,50
GHRP-6,671,674 3D pharmacophore finger- Hammett equation, 12, 13, 26,
Gibbs-Helmholtz equation, 286 prints for, 205 50
Gigabyte, 405 GPCR-likeness, 251,252 Hamming distance, 202
GLIDE GPCRs (G-protein-coupled re- Hammond postulate, 748
sampling/scoring methods ceptors), 668 Hanes-Woolf plot, 727, 729, 731
used, 261 focused screening libraries Hansch approach, to QSAR,
Global stereochemistry, 398 targeting, 209,250 26-27,30
Glucocorticoid receptor homology modeling, 123, 150 Hansch-Fujita-Ban analysis, 31
X-ray crystallographic studies, molecular modeling, 122 Hansch parabolic equation, 3
488 peptidomimetics, 644, 677 Hansch-type parameters, 54
Glucose, 784 7-transmembrane, 229-234 HARPick program, 218,221
Glutamate dehydrogenase, 764 GRAB-peptidomimetic (Group- Hash code, 376,380,405
Glutamate NMDA agonists, 150 Replacement Assisted Bind- and combinatorial library de-
Glutamate NMDA antagonists, ing), 636, 658-659, 677 sign, 223
150 GRAMM (Global Range Molecu- molecular fragment based, 54
Glutamate receptor 1 lar Matching), 115 Hemicholinium
X-ray crystallographic studies, Granulocyte-macrophage CSF interatomic distance analogs,
488 X-ray crystallographic studies, 710-711
Glutamic acid 488 Hemoglobin
chemical modification re- Graphical representation, 371 molecular modeling of crystal
agents, 755 Graph isomorphism problem, structure, 105, 107
nonclassical bioisostere ana- 380,405 target of structure-based drug
log, 694 GREEN design, 419-425
rigid analogs, 699 force-field scoring, 264 Hepatocyte growth factor activa-
Glutamine-PRPP amidotrans- GRID, 58,315 tor inhibitor 1
ferase inhibitors, 717 3D pharmacophores, 198 matriptase inhibitor, 269, 271
Heroin, 849-850 HIV protease inhibitors, 717 Hydrofinasteride, 768, 770

HE-State index, 26 binding-site molecular models, Hydrogen bonds, 285-286, 365
Heterochiral molecules, 782 130 acidic protons and π-systems,
Heteronuclear multiple bond comparative molecular field 313
correlation spectroscopy analysis, 153 and empirical scoring,
natural products, 518 consensus scoring study, 266 309-310
Heteronuclear single quantum 3D CoMFA, 59 enzyme inhibitors, 722,724
correlation spectroscopy, de novo design, 113 hydrophobic interactions con-
512 force field-based scoring study, trasted, 319
Hexestrol, 707 307 molecular modeling, 81,
Hierarchical clustering, 220, homology modeling, 123 107-108
401,405 knowledge-based scoring and QSAR, 6
with molecular similarity/di- study, 311 and structure-based design,
versity methods, 205 molecular modeling, 103-104, 409
105, 108, 109, 111, 117, 120, Hydrolases
High density oligonucleotide ar-
122 target of structure-based drug
rays, 344 NMR binding studies, 533, design, 449-454
High performance liquid chro- 559-562 Hydrolysis
matography (HPLC), non-peptide peptidomimetics, enzyme-mediated asymmetric,
586-589 659- 660 805-806
for combinatorial library novel lead identification by Hydrophobic bond, 15
screening, 592-596,598, virtual screening, 320 Hydrophobic effect, 178,182
599,607 seeding experiments, 318-319 Hydrophobic interactions, 50,
fast, 596 target of structure-based drug 286,287-288
for hydrophobicity determina- design, 433-442 discovery of importance of, 3
tion, 16-17,23 transition state analogs, and empirical scoring, 310
for separation of chiral mole- 647-649 enzyme inhibitors, 724
cules, 783,788-792 and water, 302 hydrogen bonding contrasted,
High-throughput chemistry, HIV reverse transcriptase inhib- 319
358,405 itors, 717 molecular modeling, 85,
chemical libraries for, 367 X-ray crystallographic studies, 108-110
and natural product screen- 488-489 and QSAR, 6, 7, 15-19, 23, 52
ing, 848 H+,K+-ATPase inhibitors, 718 Hydrophobicity, 16-17
High Throughput Crystallogra- HKL suite, 478 determination by chromatog-
Hoechst 33258 raphy, 17-18,23
phy Consortium, 418
binding to DNA, 544,547-552 (S)-3-Hydroxy-y-butyrolactone,
High-throughput screening, 283
Homochiral molecules, 782 808,810
mass spectrometry applica-
HOMO (Highest Occupied Mo- Hydroxychloroquine, 891
tions, 591, 592-596 lecular Orbital) energy, 11, 6R-Hydroxy-1,6-dihydropurine
and molecular modeling, 155 14,54 riboside, 752
molecular similarity/diversity Homology, 348 Hydroxyethylurea, 153
methods, 191 and X-ray crystallography, 494 R-(-)-11-Hydroxy-10-methyla-
raw data points obtained by Homology modeling, 261-262 porphine, 705
companies, 50 and docking methods, 305-306 Hydroxymethylglutaryl-CoA
with virtual screening, 316 molecular modeling, 123 (HMG-CoA) reductase in-
X-ray crystallography applica- Homo sapiens hibitors, 718, 719, 744-746
tion, 472 genome sequencing, 344 (±)-3-(3-Hydroxyphenyl)-N-n-
HIN file format, 369 HQSAR, 4 propylpiperidine (3-PPP),
HINT descriptors, 56 HTML (HyperText Markup 704-705
Hint!-LogP, 389 Language),371,405 D,L-3,5-Hydroxyvalerate, 745
HipHop, 60,256 HUGO Gene Nomenclature HYPER, 727
Histamine antagonists Committee, 337 HypoGen, 256
molecular modeling, 143 Human Genome Database Hypothetical descriptor pharma-
Histidine protein classes, 262 cophore, 63
chemical modification re- Human genome sequencing, 344
agents, 755 Human serum albumin, See Se- Iceberg model, of hydrophobic
Hit list, 380,405,411 rum albumin interactions, 15
ICM targets: i.e., Dihydrofolate Interleukin 5

affinity grids, 293 reductase inhibitors X-ray crystallographic studies,
homology modeling, 305 finding weak by virtual 491
ligand handling, 294 screening, 319 Interleukin 6
Monte Carlo minimization, not all are drugs,408 X-ray crystallographic studies,
298 structure of free, and struc- 491
novel lead identification, 320 ture-based design, 409 Interleukin 8
IC50 values, 731-732 In-house databases, 387-388 X-ray crystallographic studies,
QSAR, 8 Inosine monophosphate dehy- 491
ID3,67 drogenase Interleukin 10
iDEA, 390 consensus scoring study, 266 X-ray crystallographic studies,
IDEALIZE, 255 seeding experiments, 318-319 490
target of structure-based drug Interleukin 12
Identification. See also Lead
design, 446-447
identification X-ray crystallographic studies,
Inosine monophosphate dehy-
combinatorial library com- 490
drogenase 2
pounds, mass spectrometry X-ray crystallographic studies, Interleukin 13
application, 596-597 489 X-ray crystallographic studies,
mass spectrometry applica- In silico screening, 191,244. See 490
tion, 594-596 also Virtual screening Interleukin-1β-converting en-
Idoxuridine, 717 Insulin-like growth factor 1 zyme (ICE)
Ifosfamide, 783 X-ray crystallographic studies, transition state analog inhibi-
Imines 489 tors, 655
filtering from virtual screens, Insulin-like growth factor 2 Interleukin 1 receptor
246 X-ray crystallographic studies, X-ray crystallographic studies,
Iminobiotin 489 490
binding to avidin, 181, 182 Insulin-like growth factor 1 re- Intermolecular forces, 6
Imipenem, 872-873,874 ceptor Interpro, 349
Imipramine X-ray crystallographic studies, InterProScan, 349
analogs, 692-693 489 Intracellular adhesion molecule
Immobilized enzyme inhibitors, Integrin alphaM 1 (ICAM-1)
720 X-ray crystallographic studies, target of NMR screening stud-
Immunophilins 489 ies using SAR-by-NMR, "
chemical-shift mapping of Interatomic distance variant an- 566-567
binding to, 545 alogs, 710-712 Inventory data, 405
FK506 binding to FKBP, Intercellular adhesion molecule Inverse folding, 123-125
552-555 1 Inverse QSAR, 4
Importance sampling, 98 X-ray crystallographic studies, Inverted keys, 405
Incremental construction 489 Ionic bonds, 6, 170,365
in docking, 292, 295-296 Interferon α 1 X-ray crystallographic studies,
in virtual screening, 262, X-ray crystallographic studies, 173
317-318 490 Ipconazole, 41
Indexes, 376-377, 405 Interferon γ Irinotecan, 849, 861
Indinavir, 648,659 X-ray crystallographic studies, Irreversible inhibitors, 755
structure-based design, 490 ISISBase, 377,387
438-439,440,441 Interleukin 1 ISIS databases, 373,376-377,
Indomethacin, 453 X-ray crystallographic studies, 387
Inductor variables, 25-26 490 descriptors, 192
Influenza virus neuraminidase Interleukin 2 exact match searching, 380
inhibitors, 717 X-ray crystallographic studies, similarity searching, 382-383
InfoChem ChemReact/Chem- 490 substructure searching, 382
Synth database, 386 Interleukin 3 ISIS/Direct, 387
InfoChem SpresiReact database, X-ray crystallographic studies, ISIS/Draw, 387
386 490 Isomer search, 388,405-406
Infrared spectroscopy, 592 Interleukin 4 Isoprenaline, 885
Inhibitors. See also Enzyme in- X-ray crystallographic studies, Isotope editing and filtering, in
hibitors; specific inhibition 490-491 NMR, 545,546
Iterative cyclic approaches L-365,260,856 LH-RH antagonist, 634

and combinatorial library de- L-370,518, 660 LH-RH peptidomimetic analog,
sign, 217 L-685,434 640
Ivermectin, 849,891,892 structure-based design, 439 LibEngine, 221
L-732,747 LiBrain, 220
Japanese Patent and Trademark structure-based design, 440, Libraries, 367, 400. See also
Documents database, 386 441 Combinatorial libraries
Jarvis-Patrick algorithm, 205,401 L-735,525, 797 focused screening libraries for
and combinatorial library de- L-746,072,211 lead identification, 250-252
sign, 220, 222 L1210 for NMR screening, 574-576
Java, 396,406,407 growth inhibition, 37,40-41 QSAR for rational design of,
JG-365,121,122 inhibition of DHFR, 32,34 68-69
Joins, 390,406 L. major DHFR, QSAR inhibi- Lidorestat
Journal content searching, 383 tion studies, 33
structure-based design,
Laboratory information manage-
448-449
Kaempferol, 865 ment systems, 377
Kanamycin, 870, 871 β-Lactamase inhibitors, 718, Ligand-based design
Karplus relationship, 525 868-874 NMR screening for, 564-577
Kelatorphan, 650,651 X-ray crystallographic studies, NMR spectroscopy for, 510,
Kennard-Stone method, 65,66 483 517-532
Ketoconazole, 717 β-Lactams, 868-874 Ligand-based virtual screening,
8-Ketodeoxycoformycin, 751 Lamarckian genetic algorithm, 188
Keto-enol tautomerization 299 Ligand design, 110-118
NMR spectroscopy, 527-528 Lamivudine, 812-813,816 LigandFit
Ketolides, 876 LASSOO algorithm, 217 sampling/scoring methods
Ketones Latent inactivators, 756 used, 261
pharmacophore points, 249 Latent semantic structure in- Ligand flexibility, See Flexible
Key-based similarity searching, dexing, 255 ligands
382-383 Laudexium, 857,858 Ligands
Key field, 406 Lead generation, 426 macromolecule-ligand interac-
Keys, 363 Lead identification, 244 tions, NMR spectroscopy,
encryption, 403 focused screening libraries for, 510,517,535-562
molecular similarity methods, 250-252 non-peptidic ligands for pep-
188 virtual screening for novel,
tide receptors, 667-674
Khellin, 883, 884 320-321
visually assisted design, 110
Kinases Lead molecule fragment analogs,
focused screening libraries 707-710 Ligand strain energy, 308
targeting, 250 Leaf nodes, 377 LIGSITE, 291
King's Clover, drugs derived Leave-one-out cross validation, Linear free energy relationship,
from, 882 57,64 12,14
Kitz-Wilson plots, 757 Leave-some-out cross validation, Linear interaction energy
K-means clustering, 401,406 64 method, 120
K-medoids clustering, 406 Legion, 387 Linear notation, 368-369,406
K-Nearest Neighbors, 53,62-63 Lennard-Jones potential, 285 Linear QSAR models, 26-28
KNI-272, 562 Lentinan, 849 descriptor pharmacophores,
Knowledge-based scoring, Leucine aminopeptidase inhibi- 61-62
264-265,307,310-312 tors, 737-738 Linear regression analysis
Knowledge-bases, 352,379 Leukocyte function-associated in QSAR, 8-11,50,53,67
Knowledge Discovery in Data- antigen 1 (LFA-1) Line-shape, in NMR, 512
bases (KDD),394,406 target of NMR screening stud- and ligand dynamics, 528-531
Kohonen's Self-organizing Map ies using SAR-by-NMR, Lineweaver-Burk plot, 727, 729,
method, 65,66 566-567 731
KOWWIN, 389 Leukotriene A4 hydrolase Link nodes, 381,397
Kubinyi bilinear model, 3,31 X-ray crystallographic studies, LINUS (Local Independent Nu-
491 cleating Units of Structure),
Leveling effect, 722 124
Levorphanol, 708 Linux, 396,406,411
Lipinski's "rule of 5" chromatographic determina- MACROMODEL, 94

and combinatorial library de- tion, 17-18,23 Macromolecular structure deter-
sign, 214-215,216 estimation systems, 388,389 mination, 334
in druglikeness screening, 245 for molecular similarity/diver- NMR spectroscopy applica-
for molecular similarity/diver- sity methods, 193,208 tions, 533-535
sity methods, 193,208 and polarity index, 26 Macromolecule-ligand interac-
and NMR screening, 575 Log Perm, 25 tions, See Protein-ligand
Lipocortin I Log TA98,25-26 interactions
X-ray crystallographic studies, Lomerizine, 41,42 Macrophage CSF 1
491 Lometrexol X-ray crystallographic studies,
structure-based design, 491
Lipophilic interactions, 286
429-430 MACROSEARCH, 94
Liquid chromatography
London forces, See van der Magnetization transfer NMR,
for enantiomer separation, Waals forces
787 568-570
Lopinavir, 648,659 Ma huang,884-885
Liquid chromatography-mass asymmetric synthesis, 807,
spectrometry (LC-MS), Mandelate racemase inhibitors,
809
586-591 762,763
Lorentz-Lorenz equation, 24
affinity screening, 598-599 Lovastatin, 878-879 Manhattan distance, 68
fast, 596 Low Mode Search, 292,301 Marcaine
future developments, 607-608 LUDI, 259,295-296,315 classical resolution, 795
gel permeation chromatogra- combinatorial docking, 318 Marijuana, 853
phy screening, 599 empirical scoring, 310 Marine source drugs
pulsed ultrafiltration screen- molecular modeling, 112,
ing, 603-606 113 anticancer, 867-868
for purification of combinato- for novel lead identification, Markup languages, 371-372,405
rial libraries, 592-594 321 Markush feature, 381
structure/purity confirmation LUMO (Lowest Unoccupied Mo- Markush structures, 367,368,
of combinatorial libraries, lecular Orbital) energy, 11, 373,406
594-596 14,26, 54 MARPAT, 385
Liquid chromatography-NMR- Luteinizing hormone β Masoprocal, 849
MS, 608 X-ray crystallographic studies, Mass spectrometry, 583-592
Liquids 491 affinity capillary electrophore-
force field models for simple, LuxS sis-mass spectrometry,
176 X-ray crystallographic struc- 599-600
Liquid secondary ion mass spec- ture elucidation, 494-495 affinity chromatography-mass
trometry (LSIMS),586,587 LW-50020,849
spectrometry, 598-599
Lisinopril, 650, 881 LY-303366,877
bioaffinity screening using
asymmetric synthesis, 807, LY-315920
electrospray FTICR MS,
809 structure-based design, 454
Lycopene 601-603
Literature content searching,
positive ion APCI mass spec- encoding and identification of
383
LitLink, 387 trum, 588 combinatorial compounds
Local stereochemistry, 398 tandem mass spectrum, 591 and natural product ex-
Lock-and-key hypothesis, 251, Lymphomas, 718 tracts, 596-597
252 Lysine frontal affinity chromatogra-
deformable models, 5 chemical modification re- phy-mass spectrometry, 601
Locus maps, 140 agents, 755 future directions, 607-608
Log 1/C, 25,27-29 gel permeation chromatogra-
Log CR, 25 MACCS-3D, 259,363 phy-mass spectrometry, 599
Logic, in query features, 406 in molecular modeling, 111 LC-MS purification of combi-
Logic and Heuristics Applied to MACCS (Molecular Access Sys- natorial libraries, 592-594
Synthetic Analysis tem), 254,361-362 MS-based screening, 597-598
(LHASA), 379 Machine learning techniques pulsed ultrafiltration-mass
Log MW,24 in molecular modeling, 151 spectrometry, 603-606
Log P in QSAR, 62 solid phase mass spectromet-
chloroform-octanol, 17 Macrocyclic mimetics, 635-636 ric screening, 606-607
structure/purity confirmation Melittin 2-Methyl-1,4-benzenediol

of combinatorial com- molecular modeling, 124 allergenicity prediction, 833
pounds, 594-596 Members, of Rgroups, 368,406 α-Methyldopa, 785
types of mass spectrometers, Membrane-bound drug targets, 5,10-Methylene-tetrahydrofo-
585 351 late, 426, 427
Material Safety Data Sheets da- Membrane-bound proteins Methyl group roulette, 700
tabase, 386 molecular modeling, 154 Methylphenidate (Ritalin)
Material Safety Data Sheet NMR structural determina- classical resolution, 793-794
searching, 384 tion, 535 nonclassical resolution, 801
Matriptase Membrane-bound receptors, 5 Metocurine, 856,857
virtual screening of inhibitors, Mepartrican, 849 Metoprolol
269-271,272 Meperidine, 708,851 renal clearance, 38
Matrix-assisted laser desorption rigid analog, 696 Metropolis algorithm, 94,98
ionization (MALDI), 586, 6-Mercaptopurine, 717 D,L-Mevalonate, 745
596,606-607 Mercury search program, 387 Mevastatin, 744-745
Matrix metalloprotease inhibi- Merged Markush Service, 386 MHC I receptor
tors, 227,555-557 homology modeling, 123
chemical-shift mapping of structure-based design, 447 molecular modeling, 117
binding to, 545 MERLIN, 39,386 Michaelis-Menten constants,
target of NMR screening stud- Messenger RNA, See mRNA 725-728
ies using SAR-by-NMR, 566 Metabolism. See also Absorp- use in QSAR, 7, 8
target of structure-based drug tion, distribution, metabo- Michaelis-Menten kinetics,
design, 443-445 lism, and excretion (ADME) 725-728
transition state analog inhibi- enantiomers, 786-787 Microarray chips, 334,344-345
tors, 651-652 Metabolism databases, 385,386 Microbial secondary metabolites,
virtual screening, 315 Metabolism screening, 591 848
Maximum Auto-Cross Correla- pulsed ultrafiltration applica- Micropatent, 386
tion (MACC),202 tion, 605 Microsoft Access, 373
Maxmin approach, 208 Metabolite database, 386 Middle tier, 392, 406-407
May apple, drugs derived from, Metadata, 375,376,406 Miglitol, 849
865 Meta-layer searching, 395 Milbemycins, 891,892
Maybridge catalog, 385 Metallopeptidase inhibitors L-Mimosine
MCDOCK transition state analogs, analogs, 690
Monte Carlo simulated an- 649-652 MIMUMBA, 255
nealing, 297 Metamitron, 42 Mini-fingerprints, 255
MD Docking (MDD) algorithm, Metazocine, 708 Minimum topological difference
298 Metconazole, 41,42 (MTD) method, 4,147
MDL Information Systems, Inc. Methadone, 708 Mining minima algorithm, 292,
databases, 386-387 Methamphetamine 299-300
Mechanism-based enzyme inhib- ring substitution analogs, 704 Mitogen-activated protein ki-
itors, 759-760,764-771 R-Methanandamide, 852,853 nase
Mefloquine, 889-890 Methanol target of structure-based drug
artemisinin potentiates, force field models for, 176 design, 456-459
887-888 Methicillin, 869, 870, 871 Mivacurium, 857,859
Meglumine, 796-797 Methionine:adenosyl trans- Mixtures, 367-368
Melagatran ferase, 148 Mizoribine, 849
structure-based design, 442, Methionine hydrochloride MK-329,855
444 nonclassical resolution, 803 MK-383,213
α-Melanotropin Methods in Organic Synthesis MK-499,814-815,818
conformationally restricted database, 385 MK-0677,671,674
peptidomimetics, 637 Methotrexate, 717, 718, 749 MK-678,657
Melatonin interaction with dihydrofolate ML-236B, 879
analogs, 693 reductase, 120 MLPHARE, 478
antagonists, 211-212 structure-based design, 425 MM-25
Melilotus officinalis (ribbed me- N-Methyl-acetamide structure-based design, 423,
lilot), 882 force field models for, 176 424
common patterns, 142-150 Molecular structure descriptors

structure-based design, 423, conformational analysis, 87, in QSAR, 26
424 93-94 Molecular targets, See Drug tar-
MM2 force field, 80, 307 and electrostatic interactions, gets
MM3 force field, 80, 118 81-85 Molecular weight
MM-PBSA method, 315 and force fields, 79-81 for molecular similarity/diver-
Modeling, See Molecular model- known receptors, 103-127 sity methods, 193,208
ing ligand design, 110-118 and QSAR, 24-25
Model mining, 3 molecular comparisons, MOLGEO, 255
Model receptor sites, 149-150 138-142 Molinspiration, 390
Molar refraction, 24, 54 and molecular mechanics, MOLPAT, 110-111
MOLCONN-Z, 55,192,389 79-100 Monascus ruber, 879
Molecular Biology Database Col- pharmacophore versus binding Monoamine oxidase inhibitors,
lection, 345
site models, 127-135 718
Molecular comparisons, 138-142
potential surfaces, 85-89 Monobactams, 873
Molecular connectivity, 192,407
estimation systems, 388 protein structure prediction, Monacolin K, 879
in QSAR, 26,55,56,61 122-127 Monomer Toolkit, 377-378
Molecular docking methods, See and QSAR, 5 Monte Carlo simulated anneal-
Docking methods and quantum mechanics, ing
Molecular dynamic simulations. 100-103 and combinatorial library de-
See also Monte Carlo simu- similarity searching, 135-138 sign, 218
lations site characterization, 105-110 with docking methods, 292,
barrier crossing, 98 and statistical mechanics, 297
with docking methods, 292, 94-95 with virtual screening, 263
298 in structure-based design, 419, Monte Carlo simulations. See
and force field-based scoring, 420 also Molecular dynamic sim-
308 systematic search, 89-94, 116 ulations
hydrogen bonds, 107 unknown receptors, 127-153 barrier crossing, 98
in molecular modeling, 85, 93, and virtual screening, 244 and combinatorial library de-
95-100,116-117,142 Molecular multiple moments, 54 sign, 217
and non-Boltzmann sampling, Molecular property visualiza- de novo design, 113
100 tion, 137-138 with docking methods, 292,
protein flexibility, 301-302 Molecular recognition, 283 297-298
statistical mechanical, 94,95 and hydrophobic interactions, in molecular modeling, 85, 86,
of temperature, pressure, and 15 93,96-99,116-117,142
volume, 96 physical basis of, 284-289 and non-Boltzmann sampling,
thermodynamic cycle integra- Molecular replacement, 477 100
tion, 99 Molecular sequence alignment, statistical mechanical, 94,95
in virtual screening, 263 353 thermodynamic cycle integra-
water's role in docking, Molecular sequence analysis tion, 99
302-303 bioinformatics for, 335-336 in virtual screening, 263
Molecular eigenvalues, 54 Molecular shape analysis, 53 Moore's Law, 393
Molecular electrostatic potential, Molecular shape descriptors, 54 Morgan algorithm, 378,407
102 Molecular similarityldiversity Morphiceptin, 144,145
Molecular extensions, 130-131 methods, 54, 188-190 Morphinans, 850
Molecular field descriptors, 54, analysis and selection meth- Morphine, 634
55-57 ods, 203-209 ecological function, 848
Molecular Graphics and Model- combinatorial library design, fragment analogs, 707-708
ing Society, 360 190,214-228 Morphine alkaloids, 849-851
Molecular holograms, 54 descriptors for, 191-203 Mosflm/CCP4,478
Molecular mechanics, 79-100 example applications, 228-237 Most descriptive compound
force fields, 174-177 future directions, 237 (MDC) method, 207-208
Molecular modeling, 77-79, and molecular modeling, mRNA
153-154,358 135-138 and expression profiling,
affinity calculation, 118-122 virtual screening by, 188,190, 340-341
and bioinformatics, 351 209-214 MSDRL/CSIS, 361
MS-MS, See Tandem mass spec- enzyme-mediated asymmetric X-ray crystallographic studies
trometry (MS-MS) synthesis, 805 [int B virus], 491
Mulliken population analysis, Narwedine, 802,803 Neuroleptics
101-102 National Cancer Institute data- molecular modeling, 150
MULTICASE SAR method base, 222,254,385-386,387 Neuromuscular drugs
toxicity prediction application, National Center for Biotechnol- natural products as leads,
828-843 ogy Information (NCBI), 856-858
Multidimensional databases, 335 Neuropeptide Y
390,407 sequence databases, 387 X-ray crystallographic studies,
Multidimensional NMR spec- National Toxicology Program, 492
troscopy, 512-514 246,829 Neuropeptide Y inhibitors, 671,
Multidimensional scaling, 201 Natural product mimetics, 636 673,674
Neutral endopeptidase (NEP),
Multidimensional scoring, 291 Natural products
650-651
Multilevel chemical compatibil- antiasthma drug leads,
Nitric oxide synthase, 736
ity, 249 883-886 Nitric oxide synthase inhibitor,
Multiple-copy simultaneous antibiotics drug leads, 738-739
search methods (MCSS), 868-878 Nivalin, 892
298 anticancer drug leads, NK receptor antagonists,
Multiple isomorphous replace- 858-868 NK receptor antagonists,
ment (MIR) phasing, 477 antiparasitic drug leads, NMR, See Nuclear Magnetic
Multiple regression analysis 886-891 Resonance (NMR) spectros-
in QSAR, 8-11,50,52,53 cardiovascular drug leads, copy
Multisubstrate analog enzyme 878-883 NMR timescale, 537
inhibitors, 720, 741-748 CNS drug leads, 849-856 NN-703,671,675
Multi-tier architecture, 392, 407 drugs derived from, NOE, See Nuclear Overhauser
Multiwavelength anomalous dif- 1990-2000,849 effects (NOE),in NMR
fraction (MAD) phasing, extract encoding and identifi- Nolatrexed, 428
474,477-478 cation, 596-597 Non-Boltzmann sampling, 100
Munich Information Center for leads for new drugs, 847-894 Nonclassical bioisosteres,
Protein Sequences (MIPS), neuromuscular blocking drug 690-694
335 leads, 856-858 Nonclassical resolution, of chiral
Muscarinic receptors NMR structure elucidation, molecules, 799-804
distance range matrices, 136 517-518 Noncompetitive inhibitors,
stereoisomer analogs, 705-706 Natural products databases, 387, 730-731
Mutation 597 Noncovalent bonds, 6,170
in genetic algorithms, 87,88 Nearest neighbors methods, 53, energy components for inter-
MVIIA (Ziconotide) 62-63,67 molecular drug-target bind-
NMR spectroscopy, 518-523, Neighborhood behavior, 211 ing, 171-174
526,534 Nelfinavir, 648 Noncovalently binding enzyme
MVT-101,103-104,105,117 asymmetric synthesis, inhibitors, 720-754
Mycophenolate mofetil, 849 817-818 Nonisosteric bioanalogs,
Mycophenolic acid structure-based design, 440, 689-694
structure-based design, 442 Nonlinear QSAR models, 28-29
446-447 Neomycin, 870,871 descriptor pharmacophores,
Myoglobin, 419 Netropsin 62-63
binding perturbations, 544 Nonlinear regression, 67
Nabilone, 853 Neu5Ac2en Non-overlapping mapping, 398
Nadolol structure-based design, 451 Non-peptide peptidomimetics,
renal clearance, 38 Neural networks, See Artificial 636,657-674
Naftifine, 717 neural networks Nonpolar interactions, See van
Na+,K+-ATPase inhibitors, 718 flexible docking studies, 265 der Waals forces
Nalorphine, 850 flexible docking studies, 265 Nonstructural chemical data,
Naloxone, 850 PMF function application, 314 373
NAPRALERT, 597 Screenscore application, 319 Norapomorphine
Naproxen target of structure-based drug alkyl chain homologation ana-
classical resolution, 794-795 design, 450-452 logs, 701
Norfloxacin, 41,42 for macromolecular structure OSPPREYS (Oriented-substitu-

Norstatine, 652 determination, 533 ent Pharmacophore PRop-
Norvir and NMR screening, 571-573 ErtY Space), 199,224
structure-based design, 438, NOE docking, 545-546 Overlapping mapping, 398
440 transferred NOE technique, OWFEG (one window free en-
Nostructure, 410 532 ergy grid) method, 308,315
NOT logical operator, 406 Nucleic acid receptors, 5 Oxidation
NPS 1407,812,815 Nucleic acids. See also DNA, enzyme-mediated asymmetric,
Nuclear hormone receptors RNA 806
focused screening libraries biochemical force fields, Oxidoreductases
targeting, 250 175-176 target of structure-based drug
NMR structural determina- design, 445-449
Nuclear Magnetic Resonance
tion, 535 Oxprenolol
(NMR) imaging, 510
Nucleotide intercalation, 183 renal clearance, 38
Nuclear Magnetic Resonance Oxytetracyclin,870
(NMR) screening methods, O (graphics program), 478
510,562-577 Object-oriented language, 407 P. carinii DHFR, QSAR inhibi-
capacity issues, 190 Object relational database, 407 tion studies, 32-33
Nuclear Magnetic Resonance Octreotide, 657 Pacific yew, paclitaxel from,
(NMR) spectroscopy, 351, Octanol/water partitioning sys- 861-862
507-514,592. See also SAR- tem, 16-17 Paclitaxel, 843,848,861-863
by-NMR approach OLAP (OnLine Analytical Pro- Pairwise interactions, 79-80
applications, 516-517 cessing), 390,408 PALLAS System, 389
chemical shift mapping, Oleandomycin, 870,871 Paluther, 887
543-545 OLTP (OnLine Transaction Pro- Pamaquine, 888-889
instrumentation, 514-516 cessing), 390,408 Pancreatic polypeptide
with LC-MS, 608 Omapatrilat, 651 molecular modeling of avian,
ligand-based design, 510, OMEGA, 255 124
517-532 Ondansetron Papain
macromolecule-ligand interac- nonclassical resolution, 802 QSAR studies, 5
tions, 510, 517, 535-562 OpenBabel, 372 transition state analog inhibi-
metabolic, 510 Open Molecule Foundation, 360 tors, 654
and molecular modeling, 78 Open reading frames Papaver somniferum (opium
multidimensional, 512-514 housing in DNA databases, poppy), 848,849
for pharmacophore modeling, 338 Parallel chemistry, 283
531-532 Opium poppy, 848,849 Parallel library, 214
receptor-based design, 510, Optimization approaches Parallel processing, 408
532-562 for combinatorial library de- Parathion, 774
in structure-based drug de- sign, 217-220 Parathyroid hormone
sign, 419,516-517 OptiSim method, 207 X-ray crystallographic studies,
and structure-based library and combinatorial library de- 492
design, 225 sign, 220 Parent structure, 368,404
structure determination of Oracle, 373 Pareto optimality, 220
bioactive peptides, 517-518 Organic structure databases, Partial charge, 366,373
structure elucidation of natu- 385 Partition coefficients, 16-17, 54
ral products, 517-518 Organoarsenical agents, 717 Partition function, 94-95
and virtual screening, 244 Orientation map (OMAP), 131, Partitioning algorithms, 67
Nuclear Magnetic Resonance 144,146 PASS, 291,390
(NMR)titrations, 545 Oriented-substituent pharma- Patent Citations Index, 386
Nuclear Overhauser effect cophores, 224 Patent databases, 386
(NOE)pumping, 573 Orlistat, 848,849 Patent searching, 383-384
Nuclear Overhauser effects OR logical operator, 406 Pathways, 495-496
(NOE), in NMR, 511,512 use in molecular similarity/ X-ray crystallographic analy-
for conformational analysis, diversity methods, 194 sis, 495-496
525 Ornithine decarboxylase inhibi- Pattern recognition, 408
and distance range matrix, tors, 717, 766, 768, 769 and cluster analysis, 401
136 Oseltamivir, 452, 717 with QSAR, 53
PC cluster computing, 283-284 Pfam, 349 target of structure-based drug
PCModels, 386 Pharmacophore keys, 376, design, 453-454
PD-119229 408-409 X-ray crystallographic studies,
structure-based design, Pharmacophore mapping, 255 492
460-461 Pharmacophore point filters, Phosphonoacetate, 740
PDB file format, 369 196,249-250 N-Phosphonoacetyl-L-aspartate
PDGF beta Pharmacophores, 368 (PALA), 743-744
X-ray crystallographic studies, with BCUT descriptors, Phosphonoformate, 740
492 223-224 2-(Phosphonomethoxy)ethylgua-
Peak intensities, in NMR, 512 binding site models con- nidines
Peldesine trasted, 127-135 chain branching analogs, 702
structure-based design, 460 defined, 252-253,408 (R)-9-[2-(Phosphonome-
Pemetrexed descriptor, for QSAR, 60-63
thoxy)propyl]adenine (R-
structure-based design, 3D searching, 366-367
PMPA), 818-819
429-430 in molecular modeling, 110
for molecular similarity/diver- Phosphoryl transferases
Penicillins, 717,868-870
preventing bacterial degrada- sity methods, 194-201, target of structure-based drug
tion, 718 204-206 design, 456-461
Penicillipepsin inhibitors NMR-based modeling, Phylogenomics, 347-349
molecular modeling, 116 531-532 Physicochemical descriptors, 54
Penicillium brevicompactum, NMR spectroscopy-based mod- estimation systems, 389
879 eling, 531-532 for molecular similarity/diver-
Penicillium chrysogenum, 869 oriented-substituent, 224 sity methods, 193
Penicillium citrinium, 879 site-based, 235-237 for virtual screening, 255
Pentostatin, 717, 750-751,849 virtual screening, 252-260 Physicochemical properties, 373,
PeptiCLEC-TR, 804 PharmPrint method, 223 409
Peptide backbone mimetics, 636, Phase problem Physostigmine, 774
644-645 in X-ray crystallography, Picornaviruses
Peptide bond isosteres, 644-646 476-478 target of structure-based drug
Peptides, 634 Phencyclidine design, 454-456
biochemical force fields, rigid analogs, 696-697 Picovir
175-176 β-Phenethylamines, 697-698 structure-based design,
NMR structural determina- Phenols 455-456
tion of bioactive, 517-518 DNA synthesis inhibition by,
Pigeon liver DHFR, QSAR inhi-
non-Boltzmann sampling of 40
bition studies, 31-32
helical transitions, 100 growth inhibition by, 38,
Peptidomimetics, 128-129, 40-41 Pipecolic acid, 805
633-634 Phenylacetic acids Pirlindole
classification, 634-636 ionization of substituted, chromatographic separation,
conformationally restricted 12-14 788-789,790
peptides, 636-643 Phenylethanolamine N-methyl- Pit viper, drugs derived from,
future directions, 674-677 transferase (PNMT) inhibi- 881
molecular modeling, 154 tors, 733-734,740 Pivoting data, 409
non-peptide, 636,657-674 (R,S)-α-Phenylglycidate, 762 Plant natural products, 848,893
peptide bond isosteres, (S)-α-Phenylglycidate, 762 Plant secondary metabolites,
644-646 N'-(R-Phenyl)sulfanilamides 848
protease inhibitors, 646-655 antibacterial activity, 10 Pleconaril
speeding up research, 655-657 Phosphatidylcholine monolayers structure-based design,
template mimetics, 643-644 penetration by ROH, 27 455-456
Peramivir Phosphocholine PLOGP, 389
structure-based design, 452 docking to antibody McPC603, PLP function, 266,309
Personal chemical databases, 298 consensus scoring, 320
387-388 Phosphodiesterases hydrophobic interactions, 319
Petabyte, 408 alignment of catalytic domains performance in structure pre-
Pethidine, 851 in gene family, 349 diction, 314
PETRA, 390 Phospholipase A2
seeding experiments, 319
Petrosia contignata, 886 homology modeling, 123 PLUMS, 225
p38 MAP kinase Prinomastat target of structure-based drug
consensus scoring study, 266 structure-based design, 444, design, 432-445
seeding experiments, 318-319 446 transition state analogs,
PMF function, 265,311,312 PRINTS, 335,349 646-655
performance in structure pre- Privileged structures Protein classes, 262
diction, 314 in molecular similarity/diver- Protein Data Bank, 110,353
seeding experiments, 319 sity methods, 209 sequence database, 387
PMML (Predictive Model template mimetics, 644 X-ray crystallography applica-
Markup Language), 405 in virtual screening, 251-252 tion, 478-479
PNU-107859 PROBE, 126 Protein Database
NMR binding studies, 555 PROCHECK program, 478 and virtual screening, 261-262
PNU-140690,659,812,813 ProDock Protein families
PNU-142372 affinity grids, 293 targeting in libraries for vir-
NMR binding studies, tual screening, 251
Monte Carlo minimization,
555-556 Protein interactions, 334
POCKET, 259
298 Protein-ligand docking pro-
Podophyllin ProDom, 349 grams, 292
drugs derived from, 865-867 Proflavin Protein-ligand docking tech-
Podophyllotoxin, 849,865-866 thermodynamics of binding to niques, 262-264
Podophyllum emodi, 865 DNA, 183 Protein-ligand interactions,
Podophyllum peltatum (May ap- Progesterone receptor 284-289,322
ple), 865 antibody FAB fragment, 128 NMR spectroscopy, 510,517,
Poisson-Boltzmann equation, X-ray crystallographic studies, 535-562
83,84 492 QSAR studies, 5
Polarity index, 26 PROGOL, 67 scoring, 264-267
Polarizability, 85 Project Library, 387 scoring in virtual screening,
Polarizability index, 11 Prolactin receptor 264-266
Polarization energy, 173 X-ray crystallographic studies, Protein-protein interactions, 634
Polar surface area 492 characterizing,637
in druglikeness screening, 245 PRO-LEADS, 299 Proteins. See also Macromolecu-
Policosanol, 849 assessment, 303,304 lar structure determination
Pomona College Medchem, 385 flexible ligands, 263 binding and chirality, 786-787
Potassium channel shaker PRO-LIGAND flexibility and docking,
X-ray crystallographic studies, genetic algorithm with, 89 300-302,322
492 Pronethalol, 881 phylogenetic profiling,
Potential smoothing, 86 Propargylglycine, 719-720 347-348
Potential surfaces, 85-89 Property-based design, 234-235 Protein structures
PPAR y Propranolol, 881-882 prediction, 122-127
X-ray crystallographic studies, enantiomers, 786 in structure-based virtual
492 enzyme-mediated asymmetric screening, 261-262
Pralnacasan synthesis, 805-806 X-ray crystallographic analy-
structure-based design, 443, renal clearance, 38 sis, 496
446 N10-Propynyl-5,8-dideazafolate, Proteome, 352
Pravastatin, 879,880 426,427 Proteomics, 409
Preliminary screening, 111-112 Proresid, 866-867 Pseudoirreversible enzyme in-
Pressure PRO-SELECT hibitors, 771-774
molecular dynamic simula- combinatorial docking, 318 Pseudomonas acidophila, 873
tion, 96 PROSITE, 348-349 Pseudopeptides, 635-636
Primaquine, 889 Prostacyclin, 762-763 isosteres replacing peptide
Principal components analysis Prostaglandin synthase inhibi- backbone groups, 646
with molecular similarity/di- tors, 718,762-763,764 Pseudoracemate, 799-800,801
versity methods, 192,201 Protaxols, 863 Pseudo-receptor models, 261
in QSAR, 15 Protease inhibitors. See also PSI-BLAST, 335,347
Principal components regres- HIV protease inhibitors X-ray crystallography applica-
sion, 53 affinity labels, 762 tion, 481
Prindolol QSAR studies, 5 Pulsed ultrafiltration-mass spec-
renal clearance, 38 structural genomics, 353 trometry, 603-606
Purine biosynthesis inhibitors, descriptor pharmacophore Racemates, 782

752 concept, 60-63 types of, 799-801
Purine nucleoside phosphorylase and docking methods, Racemization, 783-784
target of structure-based drug 304-305 Radiation damage
design, 459-461 Free-Wilson approach, 4, in electron cryomicroscopy,
Purine ribonucleoside, 751-752 29-30 612-613,614-615,616
Purity verification guiding principles for safe, 66 Raffinate, 791
as bottleneck in drug discov- and library design, 68-69 Ramachandran plot, 92
ery, 592 linear models, 26-28, 51, and conformational mimicry,
LC-MS-based purification, 61-62 141
592-594 model validation, 63-66 Ramipril, 746
mass spectrometry applica- and molecular similarity/di- Ramiprilat, 746-747
Random searching
tion, 594-596 versity methods, 194
in virtual screening, 263
Pyridoxal phosphate-dependent multiple descriptors of molec-
Rapamycin, 848
enzymes ular structure, 54-58 binding to FKBP, 552,554
mechanism-based inhibitors, nonlinear models, 28-29, 51, Rapid, reversible enzyme inhibi-
765-768 62-63 tors, 720,728-734
Pyrimidine biosynthesis inhibi- parameters used, 11-26 Rapid sequence screening, 334
tors, 752 problems with Q2, 64-65 Rare gas interactions, 174
Pyrrolinones receptor theory development, Ras-farnesyltransferase inhibi-
peptide-like side chains, 635, 4-7 tors
642 standard table, 51 non-peptide peptidomimetics,
Pyruvate dehydrogenase inhibi- in structure-based design, 419 665-667,668,669
tors, 717 substituent constants for, template mimetics, 643,645
Pyruvate kinase, 764 19-23 Rats
taxonomy of approaches, ataxia induction by ROH, 29
Qinghaosu (artemisinin), 886 52-54 liver DHFR, QSAR inhibition
Q-jumping MD, 298 tools and techniques of, 7-11 studies, 34
QSAR, See Quantitative struc- training and test set selection, REACCS, 398
ture-activity relationships 65-66 reaction searching using, 383
QSAR and Modeling Society, 360 variable selection, 60-63 Reactant-biased, product-based
QSDock, 295 as virtual screening tool, (RBPB) algorithm, 215,216,
QSiAR, 53,60 66-69 219
QSPR, See Quantitative struc- Quantitative structure-property Reacting centers, 366,383, 398,
ture-property relationships relationships, 53 409
Quadratic shape descriptors, 295 and molecular similarity/di- Reaction Browser~Web,387
Quadrupole time-of-flight hybrid versity methods, 194 Reaction databases, 386
(QqTOF) mass spectrome- Quantum chemical indices, 11, Reaction field theory, 83
try, 585,607 14-15,54 Reaction indexing, 383
QUANTA, 258 Quantum mechanics, 100-103 Reaction Package, 386
Quantitative structure-activity Quercetin, 865 Reactions, See Chemical reac-
relationships, 1-4,49-52, Query features, 381 tions
358. See also Comparative logical operators, 406 Reaction scheme, 409
quantitative structure-activ- Query structures, 368 Reagent Selector, 387,391-392
ity relationships; 3D quanti- mapping, 380 RECAP (Retrosynthetic Combi-
tative structure-activity re- Quinacrine, 889,890-891 natorial Analysis Proce-
lationships Quinine, 888-891 dure), 249
applications with interactions Quinolines, 889-890 Receptor-based design
at cellular level, 37-38 Quinupristin, 876-877 NMR spectroscopy for, 510,
applications with interactions Quisqualic acid, 694 532-562
in vivo, 38-39 Qxp pharmacophore generation,
applications with isolated re- Monte Carlo minimization, 259
ceptor interactions, 30-37 298 Receptor-based 3D QSAR, 304
2D, 52,53 Receptor-ligand complexes, 78
data mining, 66-67 Rabbits Receptor-ligand mimetics, 636
defined, 409 narcosis induction by ROH, 27 Receptor mapping, 148-149
Receptor-relevant subspace, 204, RGD peptide sequence mimics, ROSDAL notation, 368,410
222 129,643,645,662-665 Rosuvastatin, 848,880-881
Receptor theory, 4-7 Rgroups, 368,373,397,405, Rosy periwinkle, vinca alkaloids
Reciprocal nearest neighbor, 220 409-410 from, 858
Recursive partitioning, 247-248 and combinatorial library de- Rotatable bonds
Red clover extract sign, 221 in druglikeness screening, 245
LC-MS mass spectrum, 589, Rhinoviruses in molecular modeling, 90-91
590 comparative molecular field Royal Society of Chemistry
Reduction analysis, 153 Chemical Information
enzyme-mediated asymmetric, molecular modeling of antivi- Group, 360
806 ral binding to HRV-14, 120, RPR109353,211
Refining, search queries, 409 122 R,S descriptors, for chiral mole-
target of structure-based drug
REFMAC, 478 cules, 365, 783
design, 454-456
Registration, of chemical infor- RS Discovery System, 377,385
Rhodopeptin
mation, 377-379 RSR-13,422,423
template mimetics, 644,645
Registry number, 378-379,409 Ribbed melilot, drugs derived RSR-56,422,423
Relational databases, 363, 373, from, 882 RTECS, 246
409 Rifamycin, 870,872 RUBICON, 386
Relative diversity/similarity, 209 Rigid analogs, 694-699 virtual screening application,
Relaxation parameters, in NMR, Rigid body rotations, in molecu- 254
511,512 lar modeling, 90-91 "Rule of 5," See Lipinski's "rule
changes on binding, 536-537 Rigid docking, 262-263,293 of 5"
and ligand dynamics, 528-531 Rigid geometry approximation,
and NMR screening, 571-573 in molecular modeling, 89 S-37435,675
in receptor-based design, 534 Ring-position isomer analogs, Saccharomyces cerevisiae
Relenza 699-704 genome sequencing, 344
structure-based design, 451 Rings Saccharopolyspora erythraea,
Relibase, 315 in druglikeness screening, 245 874
Reminyl, 892 molecular comparisons, 139 Salbutamol, 885,886
Renin inhibitors, 432 in molecular modeling, 91 Salmeterol, 885,886
molecular modeling, 123, 153 Ring-size change analogs, S-Salmeterol
transition state analogs, 647 699-704 enzyme-mediated reduction,
REOS filtering tool, 225 Ritalin 806,808
RESEARCH classical resolution, 793-794 Salmonella
Monte Carlo simulated an- nonclassical resolution, 801 mutagenicity prediction, 829,
nealing, 297 Ritonavir, 648,659 831-832,840,842-843
Resiniferatoxin, 854 asymmetric synthesis, Salt bridges, 285
Restrained electrostatic poten- 807-808,809 and virtual screening, 272
tial, 102 structure-based design, 438, Salts definitions, 376
Result set, 409,411 440 Salts search, 388
Retigotine, 783 Rivastigmine, 774 Sampatrilat, 651
Retinoic acid structure-based design, Saquinavir, 648,659, 717
docking and homology model- 449-450 structure-based design,
ing, 305 RNA 435-437,440
stereoisomer analogs, 707 molecular modeling, 154 SAR-by-NMR approach, 508,
X-ray crystallographic studies, NMR structural determina- 516
492-493 tion, 535 in NMR screening, 564-568,
Retinoid X receptor RNA polymerase inhibitors, 717 576
X-ray crystallographic studies, Ro-31-8959,121 Sarin, 774
493 Ro-32-7315, 652, 653 Saturated rings
Retrosynthetic analysis, 409 Ro-46-2005,673, 676 analogs based on substitution
Retrothiorphan, 650, 651 ROCS, 256,259 of aromatic for saturated
Reverse nuclear Overhauser ef- shape-based superposition, ring; or the converse,
fects pumping, 573 260 699-704
Reversible enzyme inhibitors, Roll-up, 410 Saturation diversity approach,
720 Root structure, 368,404,410 223
Index
Saturation transfer difference Serevent, 806 Simulated annealing. See also

NMR, 568-570 Serial analysis of gene expres- Monte Carlo simulated an-
SB203580 sion (SAGE), 344 nealing
structure-based design, 457 Serine and combinatorial library de-
SB209670,211,675 chemical modification re- sign, 217
SB214857,213 agents, 755 with FOCUS-2D method, 68
SB242253 Serine peptidase inhibitors hydrogen bonds, 107
structure-based design, 458 transition state analogs, in molecular similarity/diversity
Scaled particle theory, 84 652-655 methods, 205
SCH 47307,667 Serine protease inhibitors with QSAR, 53,61
SCH 57939,808,810 in virtual screening, 263
affinity labels, 762
SCH 66701,667 Simulated moving bed chroma-
common structural motifs,
Schrödinger equation, 79, 363 494 tography
Scientific and Technical Infor- for enantiomer separation,
QSAR studies, 5 787,789-793,821
mation Network, 597
Serotonin Simvastatin, 719, 744,879,880
SciFinder, 385
conformationally restricted Single nucleotide polymorphism
SCOP, 353
X-ray crystallography applica- analog, 696 (SNP) maps, 338-340
tion, 494 ring position analogs, 703-704 Single-wavelength anomalous
ScoreDock Serotransferrin p diffraction phasing (SAD),
assessment, 303 X-ray crystallographic studies, 477-478
Scoring functions, 261,264-266, 493 Sirolimus, 848,849
306-312,322 Serum albumin Site-based pharmacophores,
assessment, 312-315 binding of enantiomers, 786 235-237
basic concepts, 289-290 mass-spectrometric binding Size-exclusion chromatography,
and molecular modeling, assay screening, 604 599
115-116 target of NMR screening stud- Sizofilan, 849
overview of, 307 ies, 567-568,573 SK&F 107260, 663
penalty terms, 313 SFCHECK program, 478 SLIDE
Screening. See also Combinato- Sgroups, 373,397,405,410 anchor and grow algorithm,
rial chemistry; High- Shake and Bake, 477,478 296
throughput screening; Vir- combinatorial docking, 317
SHAPES
tual screening explicit water molecules, 302
NMR screening libraries, 575
mass spectrometry-based, geometric/combinatorial
and SAR-by-NMR,568
597-598 search, 295
SHARP, 478 ligand handling, 293
solid phase mass spectromet-
SHELX, 478 protein flexibility, 301
ric, 606-607
Sialic acid, 450-451 receptor representation in,
Screenscore, 319
Sculpt, 387 Sialidase 291
SEAL, 316,321 genetic algorithm study of SLN (Sybyl Line Notation), 369,
Search queries, 368 docking, 88-89 410
β-Secretase inhibitors Sickle-cell anemia, 419-425 Slow-binding enzyme inhibitors,
transition state analogs, 649 Side chains 720,734-740,749
Selector, 387 of known drugs, and druglike- Slow-tight-binding enzyme in-
SELECT program, 218-219,221 ness screening, 248-249 hibitors, 720, 734-740
SELECT statement, 404,406 peptide-like, 635, 642 SMART, 349
Self-organizing Map method, Signature functional group filters, 246
65,66 molecular similarity methods, SMILES notation, 254,410
Semirigid analogs, 694-699 188 and canonical renumbering,
Sequence assembly, 342 Similarity searching, 379, 378
Sequence comparison, 334 382-383, 410. See also Molecular described, 368-369,371
bioinformatics for, 346-347, similarity/diversity use with comparative QSAR,
352-353 methods 39
Sequence databases, 387 in molecular modeling, SmoG, 311
Sequences, 363-364 135-138 SN-6999, 544
Sequential docking, 317 and QSAR, 67-68 Snowdrops, drugs derived from,
Sequential simplex strategy, 11 SQL for, 395 892
SNX-111,851-852 Stereoplex, 387 nonlinear, 62

SOCRATES, 361 Stereoselective synthesis, See pharmacophore searching for
Sodium cromoglycate, 883-884 Asymmetric synthesis generating, 255,272-273
Soergel distance, 68 Steric parameters and toxicity prediction,
Solid Phase Synthesis database, in QSAR, 23-25,52 828-843
385 STERIMOL parameters, 24, 50 Structure-based drug design,
Solution molecular dynamics, Steroid 5α-reductase inhibitors, 358,417-419,467-469
528 717,768-770 antifolate targets, 425-432
Solvation effects QSAR studies, 37-38 and combinatorial chemistry,
and docking scoring functions, Steroids 227
307,308,310 affinity for binding proteins, combinatorial library design,
drug-receptor complexes, 147 225-228
177-179 biosynthesis inhibition, 770 and docking studies, 282,
molecular modeling, 83-85 STN Express, 385
321-322
SOLVE, 478 STN International, 385
Somatostatin hemoglobin, 419-425
STO-3G basis set, 175
conformationally restricted Storage, of chemical informa- hydrolases, 449-454
peptidomimetics, 129,637, tion, 373-377 iterative cycles, 282,463
638 Streptavidin NMR spectroscopy for, 419,
receptor agonists found free energies of binding, 286 516-517
through combinatorial genetic algorithm study of bi- oxidoreductases, 445-449
chemistry, 657 otin docking to, 89 phosphoryl transferases,
template mimetics, 643-644, interaction with biotin, 456-461
645 181-183 picornaviruses, 454-456
Sorangium cellulosum, epothi- Streptogramins, 876-877 proteases, 432-445
lones from, 864 Streptomyces, 876,891 and virtual screening, 244
SPC model, 175 Streptomyces cattleya, 872 Structure-based inhibitor de-
Specific structure, 368,403 Streptomyces clavuligerus, 869 sign, 418
Sphere coloring, 296 Streptomyces erythreus, 874 Structure-based virtual screen-
Sphere-exclusion, 207 Streptomyces griseus, 869 ing, 260-267
Spindle poisons, 867 Streptomyces venezuelae, 870 Structure elucidation
Spin-label NMR screening, Streptomycin, 869-870 NMR spectroscopy for,
573-574 Stromelysin 517-525
SPLICE, 89,113 flexible docking studies, 265 Structure table, 376
Spongothymidine, 867-868 NMR binding studies, Structure verification
Spongouridine, 867-868 555-557 as bottleneck in drug discov-
SPRESI, 254 target of NMR screening stud- ery, 592
SPRESI'95,385 ies using SAR-by-NMR, 566 mass spectrometry applica-
SQL (Structured Query Lan- target of structure-based drug tion, 594-596
guage), 395,410 design, 443-444 Subgraph isomorphism, 67,405,
SR-48968,670 Structural data mining, 410 410
SR-120107A,670,673 Structural frameworks of known Subreum, 849
SRS (Sequence Retrieval Sys- drugs Substance P antagonists,
tem), 335 and druglikeness screening, 669-671
Standardization 248-249 Substances, 368,410
bioinformatics, 337 Structural genomics, 283 Substituent constants, for
Star schema, 390,391,410 and bioinformatics, 352-354 QSAR, 19-23
Statins, 719,848 and X-ray crystallography, Substrate analog enzyme inhibi-
multisubstrate analogs, 481,494-496 tors, 733
744-746 Structural homology, See Ho- Substructure searching, 255,
Statistical mechanics, 94-95 mology 379,381-382,410-411
Stem cell factor Structural similarity, 255 and QSAR, 67
X-ray crystallographic studies, Structure-activity relationships. SQL for, 395
493 See also Quantitative struc- Substructure search keys, 375,
Stereoisomer analogs, 704-707 ture-activity relationships 376,378,410
Stereoisomers, 365-366, and data mining, 66-67 molecular similarity/diversity
783-785 and molecular modeling, 134 methods, 189,221
Subtilases T. gondii DHFR, QSAR inhibi- TB36

homology modeling, 123 tion studies, 33 structure-based design,
Succinate dehydrogenase, 733 Tabular storage, 369-371 424-425
Succinic semialdehyde dehydro- Tabu search TBC 3214,674,676
genase inhibitors, 718 with docking methods, 292, Team Works, 377
Succinyldicholine 299 Teicoplanin, 849
conformationally restricted in virtual screening, 263 Telithromycin, 848,876
analogs, 699 Tachykinin receptors, 669 Temperature
Sugars Tacrine, 58 molecular dynamic simula-
chirality, 784 structure-based design, 449 tion, 96
Suicide substrate MMP inhibi- Tacrolimus, 848,849 Template mimetics (peptidomi-
tors, 651-652 Tadpoles metics), 643-644
Suicide substrates, 756 narcotic action of ROH, 28-29 Tendamistat
Sulbactam, 718 NMR relaxation measure-
Tagging approaches, 596-597
Sulfonamides ments, 528-529,535
TAK-029,213
pharmacophore points, 249 Teniposide, 867
Sulfones TAK-147 Teprotide, 746,881
pharmacophore points, 249 structure-based design, 450 Terabyte, 411
Sulfonyl halides Tandem mass spectrometry Terbinafine, 717
filtering from virtual screens, (MS-MS),590-591 Testosterone, 36, 768, 771
246 of combinatorial libraries, 592 Δ9-Tetrahydrocannabinol
Sulphonamides, 717 for structure determination of (THC), 852-853
Supercritical fluid chromatography bioactive peptides, 518 Tetrahydrofolate, 425
types of mass spectrometers, Tetrahymena pyriformis
for enantiomer separation, 585 growth inhibition, 27,37-38
787 Tanimoto coefficient, 68,202, spiro-Tetraoxacycloalkanes
Supercritical fluid chromatogra- 411 ring-size analogs, 702-703
phy-mass spectrometry cluster-based methods with, Tetrazoles, 135
(SFC-MS) 206 as surrogates for cis-amide
for combinatorial library puri- and similarity searching, 382, bond, 141-142
fication, 594 410 Thalidomide, 783-784,785
Superstar, 315 for virtual screening, 210 Thebaine, 850,851
Superstructure search, 255,257, Tanimoto Dissimilarity, 220 Theilheimer/Chiras/Metalysis
411 Tanomastat database, 386
Supervised data mining, 66-67, structure-based design, Therapeutic area screening
411 444-445,446 molecular similarity/diversity
Suxamethonium, 857 TargetBASE, 348 methods, 191
Sweet clover, drugs derived Target class approach, 188, Thermodynamic cycle integra-
from, 882 228-234 tion, 99-100,120-121
Sweet wormwood, drugs derived Target discovery. See also Drug Thermolysin inhibitors
from, 886 targets genetic algorithm study of ac-
SWISS-PROT, 335,345-346 bioinformatics for, 335, tive site, 89
SYBYL, 130 338-345 molecular modeling, 117, 120,
Sybyl Programming Language, TAR RNA inhibitors, 103 121,151-153
378,410 Tautomerization novel lead identification, 321
Synercid, 848,849, 876 NMR spectroscopy, 526-528 transition-state analogs,
SYNLIB, 361 Tautomers, 366 749-750
SYSDOC Tautomer search, 388,405-406 Thick clients, 400-401,411
ligand handling, 293 Taxol, 843,848,861-863 Thienamycin, 872,874
Systematic search HMBC spectroscopy, 518 Thin clients, 363,392,401,411
and Active Analog Approach, NMR spectroscopy, 525-526, Thiobiotin
144-145 531 binding to avidin, 181, 182
and conformational analysis, Taxol side-chain, 803-804 Thioesters
89-93 Taxus baccata (English yew), filtering from virtual screens,
in docking methods, 292 861-862 246
in molecular modeling, 89-94, β-ThioGAR dideazafolate (β-TG-DDF),
116 861-862 742-743
Thiol proteases Thromboxane A2, 762-763 Toxicity screening

QSAR studies, 5 Thymidine kinase inhibitors, as bottleneck in drug discov-
Thiomuscimol, 690 717 ery, 592
4-Thioquinone fluoromethide, role of water in docking, 303 and functional group filters,
770 X-ray crystallographic studies, 246-247
Thioridazine, 805,806 493 pulsed ultrafiltration application, 605
Thiorphan, 650,651 Thymidylate synthase inhibitors, 227, 717
Thor database manager, 386 Toxicophores, 829-831
Thor system, 377 target of structure-based drug associated with allergic con-
exact match searching, design, 425,426-429 tact dermatitis, 830
380-381 Thymitaq, 428 C-Toxiferine I, 856, 857
Threading, 123-125 Thyroid hormones TPCK, 760-761,762
3D descriptors NMR spectroscopy, 529-531
Tramadol, 782
molecular similarity/diversity Thyroid receptor beta, 263
chromatographic separation,
methods, 55-58,191-201 Thyroliberin
validation, 211-213 peptidomimetics, 129 792
Three-dimensional electron Thyrotropin-releasing hormone, classical resolution, 795-796
cryomicroscopy, 615-616 637 metabolism, 786-787
3D models, 363,366-367, Thyroxine Transesterification
397-398 NMR spectroscopy, 529-531 enzyme-mediated asymmetric,
3D pharmacophores Tight-binding enzyme inhibi- 805-806
filter cascade, 267 tors, 720, 734-740, 749 Transferred NOE technique,
for molecular similarity/diversity Time-of-flight mass spectrometry, 532
methods, 194-201 585,607 and NMR screening, 572-573
for searching, 381-382 Timolol Transition-state analog enzyme
similarity searching, 189,383 renal clearance, 38 inhibitors, 720, 748-754
for virtual screening, 210, TIP3P model, 175 Transition state analog inhibi-
255-259 Tipranavir, 812, 813 tors, 646
3D quantitative structure-activ- Tirilazad mesylate, 849 peptide bond isosteres, 644
ity relationships (3D- Titrations 7-Transmembrane G-protein-
QSAR), 52,53,58-60 NMR application, 545 coupled receptors, 229-234
and molecular modeling, 115, TNF-α converting enzyme Transpeptidase inhibitors, 717
138 (TACE), 652 Transverse relaxation-optimized
3D query features, 368, 381-382, Tolamolol spectroscopy (TROSY), 515
398 renal clearance, 38 for macromolecular structure
3DSEARCH, 111,259 Tolrestat determination, 533,534
3D structure databases, 387 structure-based design, 448 Trees, 376-377, 411
3-Point pharmacophores, 376, Tomudex, 427 TrEMBL, 335,346
408 Toolkits, 386,411 Triazines
molecular similarity methods, Toothpick plant, drugs derived QSAR studies of cellular
189, 195-196, 198 from, 883 growth inhibition, 37-38
for virtual screening, 210 TOPAS, 192 QSAR studies of DHFR inhibi-
Threo- prefix, 784 TOPKAT, 246 tion, 31-33
Threose Topographical data, 411 Trimethoprim, 717, 719
enantiomers, 784 Topographical mimetics (pep- interaction with dihydrofolate
Thrombin inhibitors, 227 tidomimetics), 636 reductase, 151,183
combinatorial docking, 318 Topoisomerase II inhibitors, 717 structure-based design, 425
force field-based scoring study, Topological descriptors α,ω-bis-Trimethylammonium
307 for druglikeness screening, polymethylene compounds,
molecular modeling, 116 247-249 710
non-peptide peptidomimetics, estimation systems, 388-389 Trimetrexate
660-662,663,664 with QSAR, 54-55 interaction with dihydrofolate
seeding experiments, 319 Topotecan, 848,849,861 reductase, NMR spectros-
site-based pharmacophores, Torsional potential, 80 copy, 531,557-559
235-236 Toxicity databases, 246,386 Triple resonance spectra, 514
target of structure-based drug development, 828-829 Tripos, Inc. databases, 387
design, 442-443 Toxicity prediction, 827-843 Tripos force field, 80
t-RNA guanine transglycosylase Uncompetitive inhibitors, Virtual libraries, 237, 283,315
inhibitors 729-730 handling large, 220-221
novel lead identification, 321 Unicode, 411 and QSAR, 61
Trojan horse inactivators, 756 UNITY, 259,377 Virtual rings, 91
Trypsin inhibitors descriptors, 192,201 Virtual screening, 244-245,
molecular modeling, 120 in molecular modeling, 111 271-274,315-317,412. See
QSAR studies, 5, 25 novel lead identification, 320 also Docking methods; Scor-
site-based pharmacophores, UNITY 2D, 212 ing functions
235-236 UNITY 3D, 363,387 applications, 267-271
Trypsinogen inhibitors University of Manchester Bioin- basic concepts, 289-290
molecular modeling, 116 formatics Education and combinatorial docking,
Tryptophan Research site (UMBER), 317-318
chemical modification re- 335 consensus scoring, 265-266,
agents, 755 Unix, 396, 411 291,319-320
TSCA database, 386 Unsupervised data mining, docking as virtual screening
Tubby gene 66-67,412 tool, 266-267
X-ray crystallographic func- Urea druglikeness screening,
tion elucidation, 494 pharmacophore points, 249 245-250
Tube curare, 856 Ureido resonance, 182 filter cascade, 267
D-Tubocurarine USEPA Suite, 390 focused screening libraries for
drugs derived from, 856,857 lead identification, 250-252
fragment analogs, 708-710 VALIDATE, 116,310 hydrogen bonding and hydro-
β-Tubulin Vancomycin, 770 phobic interactions, 319
X-ray crystallographic studies, Vancomycin-peptide complex ligand-based, 188,209-214
483 binding affinity, 119 molecular similarity/diversity
Tumor necrosis factor receptor 1 van der Waals forces, 174,285 methods for, 188, 190,
X-ray crystallographic studies, and docking scoring, 308 209-214
493 enzyme inhibitors, 723-724 novel lead identification,
2D descriptors and molar refraction, 24 320-321
molecular similarity/diversity molecular modeling, 79-80, pharmacophore screening,
methods, 191-194 81, 82, 89 252-260
with QSAR, 54-55 and QSAR, 6, 7 QSAR as tool for, 66-69
validation, 211-213 van der Waals radius, 79,81, seeding experiments, 318-319
2D pharmacophore searching, 173 structure-based, 260-267
383 Vanillin weak inhibitors, 319
filter cascade, 267 antisickling agent, 419-420 Vista search program, 387
virtual screening, 255 Vanilloid receptors, 853-854 Vitamin D receptor
2D quantitative structure-activ- VanX inhibitors, 770-771, 772 X-ray crystallographic studies,
ity relationships (2D- VARCHAR data type, 412 493
QSAR), 52,53 VARCHAR2 data type, 412 VK19911
2D query features, 397 Vector maps, 140-142 structure-based design, 458
2D structures, 364-366, 397 Verapamil Voglibose, 849
conversion of names to, 373 classical resolution, 798 VolSurf program, 202
2-Point pharmacophores, 376 Verapamilic acid, 798 Volume
Tyrosine Vidarabine, 717 molecular dynamic simula-
chemical modification re- Vigabatrin, 718,766, 767,782 tion, 96
agents, 755 Vinblastine, 858-859,860 Volume mapping, 139-140
Tyrosine kinase inhibitors Vinca alkaloids, 858-860 Voronoi QSAR technique, 53
molecular modeling, 130 Vincristine, 858-859,860 VRML (Virtual Reality Markup
Vindesine, 859-860 Language), 405
U-85548 Vinorelbine, 849 VX-497
structure-based design, Viracept, 659 structure-based design, 447
436-437,438 structure-based design, 440, VX-745
Ugi reaction, 229,231, 232, 236 442 structure-based design, 458
UK QSAR and Cheminformatics Viral DNA polymerase inhibi-
Group, 360 tors, 717 Warfarin, 882-883
Ukrain, 849 Virtual chemistry space, 67 enantiomers, 786
Warfarin (Continued) WIN-63843 and molecular modeling, 78
HIV protease inhibitor, 659, structure-based design, phase problem, 476-478
661 455-456 and QSAR, 5
nonclassical resolution, 801 Wiswesser line notation, and structural genomics, 481,
WARP, 478 368-369 494-496
Water WIZARD, 255,260 in structure-based drug de-
gas phase association thermo- World Drug Index (WDI),379, sign, 418,419,420
dynamic functions, 178 386,387 and structure-based library
importance of bound in struc- World Patents Index (WPI), 386 design, 225
ture-based design, 409 and virtual screening, 244
molecular modeling, 85 Xanthine-guanine phosphoribo-
X-ray diffraction, 472-473, 614
syltransferase
octanol/water partitioning system, 16-17 X-ray lenses, 612
X-ray crystallographic studies,
493
and protein-ligand interac- Xanthine oxidase inhibitors, 718 Yellow sweet clover, drugs de-
tions, 288 XFIT graphics program, 478 rived from, 882
role in docking, 302-303, Ximelagatran Yew tree, paclitaxel from,
313-314 structure-based design, 442 861-862
solvating effect in enzyme in- XML (extensible Markup Lan- YM-022,856
hibitors, 722-723 guage), 371,405,412 Yukawa-Tsuno equation, 14
Wellcome Registry, 222 XMLQuery, 412
White-Bovill force field, 80 X-ray crystallography, 351, Z-100,849
WIN-35065-2 471-473,612 Zanamivir, 717
dopamine transporter inhibi- applications, 479-481 structure-based design, 451
tor, 268 crystallization for, 473-474, Ziconotide
WIN-51711 480-481 NMR spectroscopy, 518-523,
structure-based design, databases for, 478-479 526,534
454-455 data collection, 474-476 Zidovudine, 717
WIN-54954 drug targets with published Zingerone
structure-based design, 455 structures, 482-493 allergenicity prediction, 835
"An essential addit to the libraries o
... an outstanding work ... highly ...
information in drug studies and research."
--Journal of Medicinal Chemistry
This new edition of Dr. Alfred Burger's internationally celebrated

classic helps researchers acquaint themselves with both traditional
and state-of-the-art principles and practices governing new
medicinal drug research and development. Completely updated
and revised to reflect the many monumental changes that
have occurred in the field, this latest edition brings together
contributions by experts in a wide range of related fields to explore
recent advances in the understanding of the structural biology
of drug action, as well as cutting-edge technologies for drug
discovery now in use around the world.
This Sixth Edition of Burger's Medicinal Chemistry and Drug
Discovery has been expanded to six volumes:
Volume 1: Drug Discovery
Volume 2: Drug Discovery and Drug Development
Volume 3: Cardiovascular Agents and Endocrines
Volume 4: Autocoids, Diagnostics, and Drugs from New Biology
Volume 5: Chemotherapeutic Agents
Volume 6: Nervous System Agents
DONALD J. ABRAHAM, PHD, is Professor and Chairman
of the Department of Medicinal Chemistry at the Virginia
Commonwealth University School of Pharmacy. A world-renowned
leader in medicinal chemistry and biotechnology, he is the author
of more than 140 journal citations and twenty-five patents. He
was selected by the AACP Board of Directors as the recipient of the
2002 AACP Paul R. Dawson Biotechnology Award.
