pubs.acs.org/molecularpharmaceutics
■ INTRODUCTION
Understanding the pharmacokinetic characteristics of drug candidates is crucial both in the early drug discovery and subsequent development processes. The metabolism and elimination of drugs gives a major contribution to their kinetic profile and is heavily influenced by interactions with the cytochrome P450 (CYP) enzyme family. This family of ubiquitous enzymes is the major determinant of phase I metabolism. The nine most prevalent isoforms in human are 1A2, 2A6, 2B6, 2C8, 2C9, 2C19, 2D6, 2E1, and 3A4, among which 1A2, 2C9, 2C19, 2D6, and 3A4 are considered to be the most important for drug metabolism.1
The prediction of the site of metabolism for CYP mediated drug metabolism has received a lot of interest in recent years, and models have been based on many different techniques.2 While attempts to construct models that include the protein structure in the modeling have been made,3−5 they so far seem to offer very little (if any) improvement in prediction accuracy compared to the best ligand-based models.6−8 This is probably because the CYP enzymes are very flexible,9 making the sampling of the full conformational space too computationally expensive.
We have developed the 2D structure based SMARTCyp methodology,10−12 which in contrast to other models is completely independent of experimental data. To achieve this, the reactivity toward oxidation by the heme group of the CYPs of all atoms is assigned using fragment matching toward a fragment library for which the reactivities have been precomputed using density functional theory (DFT) transition state calculations.
These reactivities in combination with a simple atom accessibility descriptor (the relative span) lead to a model that is quite accurate for most CYP isoforms.8 This is because reactivity is the most important factor in determining the site of metabolism, and accessibility is less important for most isoforms. This is also why it works less well when the binding orientation of a molecule within the active site is a more important determinant than reactivity. This is the case for the CYP isoforms 2C9 and 2D6. To take this into account, we therefore recently constructed simple pharmacophore corrections that simply take the distance between each atom and the pharmacophoric element into account (carboxylic acid and its bioisosteres in CYP 2C9 and protonated amines in CYP 2D6).6,13
In principle there are only two contributions that a ligand-based model for CYP-mediated site of metabolism should include: the reactivity and descriptors that mimic the binding mode to the active site. Descriptors that mimic the binding mode can be separated into two classes. First are those that describe overall accessibility of an atom, which is determined by the orientation of the molecule inside the active site. Examples of such descriptors are distances to pharmacophoric elements, bond counts to the center of the molecule, etc. Second are the descriptors that describe the local accessibility of an atom. Such descriptors estimate how likely an atom is to be accessible for a reaction to take place, if the binding mode suggests that this atom will be close to the heme group.
In SMARTCyp, all descriptors that mimic the binding mode belong to the first class. In the standard model the relative span is used to describe the orientation. The relative span is defined
Special Issue: Predictive DMPK: In Silico ADME Predictions in Drug Discovery
Received: September 12, 2012
Revised: December 13, 2012
Accepted: January 7, 2013
Published: January 22, 2013
© 2013 American Chemical Society 1216 dx.doi.org/10.1021/mp3005116 | Mol. Pharmaceutics 2013, 10, 1216−1223
Molecular Pharmaceutics Article
sizes 3−8 or larger for each level (a level describes atoms located a specific number of bonds from the atom of interest), resulting in 33 variables at each level. Since we are counting atom type, rings, and total atom count, each atom always contributes to at least two variables at each level (atom type and atom count), and can potentially contribute to three different variables (if it is part of a ring). Rings were defined using the minimum cycle bases algorithm41 as implemented in the CDK.42,43 For each atom we generated circular fingerprints in six levels, where level 0 is the atom of interest and levels 1−5 encompass the atoms separated from this atom by 1−5 bonds, respectively, as shown in Figure 4 for atom type counts.
Construction of Partial Least-Squares (PLS) Models. SIMCA-P (version 11)44 was used to create the PLS models45
■ RESULTS AND DISCUSSION
Introduction of the Solvent Accessible Surface Area for Site of Metabolism Prediction. We wanted to investigate if adding the atomic solvent accessible surface area (SASA) to the SMARTCyp score would improve the prediction accuracy. As a test case, a small subset of the CYP 3A4 data set (50 compounds) was used to determine the contribution of atomic SASA, which was then validated by the remainder of the CYP 3A4 data set (425 compounds). The atomic SASA was computed from the 3D structures of these 475 compounds. The contribution of atomic SASA to the SMARTCyp scoring function was optimized to achieve the best
top 1, top 2, and top 3 prediction accuracies for the training set. The optimal parameter for the training set was found to be 0.04, resulting in the SMARTCyp score equation shown (eq 1).
score = reactivity − 8 × relative span − 0.04 × atomic SASA (1)
The score determines the likelihood for metabolism, with a low score suggesting an atom is more likely to be metabolized than an atom with a high score. The reactivity is a measure of the transition state energy for the oxidation reaction (determined by fragment matching against precomputed transition state energies). The relative span is defined as shown in Figure 1, and the constant 8 was determined to reflect the standard deviation of the original reactivity rules in SMARTCyp, and has not been changed in this work.10
The improvement in prediction accuracy was found to be 2−4.5 percentage points for the training set and 1.2−3.0 percentage points for the test set (see Table 2), suggesting that atomic SASA can make a valuable contribution to SMARTCyp models. An example of how rankings can change when including atomic SASA is shown in Figure 5. A variable exclusion evaluation on the test set shows that the reactivity is ten times more important than the two accessibility descriptors, contributing 84% of the model (relative span and atomic SASA contribute 9% and 7%, respectively).
Hence, we have shown that the addition of atomic SASA to the SMARTCyp model adds a valuable contribution to the model (∼7% of the prediction accuracy).
How To Calculate the Atomic SASA without Using 3D Structures. Since SMARTCyp is a purely 2D-based method, we decided to investigate if it was possible to predict the atomic SASA from 2D structure-based properties, avoiding the need to generate 3D structures.
As starting point for the building of a predictive model for atomic SASA, we compiled a diverse selection of 27,838 druglike molecules from the ZINC database.29 The most likely tautomer and conformation were generated for each molecule in this subset, and their atomic SASA were computed as described above. This data set was split into training and test sets, and atoms were divided into subsets according to the number of neighboring atoms as described in Table 1.
Using circular fingerprints (described above) predictive models were built from each training set using the partial least-squares regression method (PLS).45 The final PLS models are described in Table 3. Applied to the test sets, the models have mean absolute errors (MAE) of 0.2−3.8 Å², with coefficients of determination (r²) ranging from 0.57 to 0.85. In the final model (2DSASA), in which the appropriate PLS model is applied to each atom, the MAE is 2.8 Å² and r² is 0.96. The MAEs of the four PLS models are inversely correlated to the number of neighbors and the size of the atomic SASA distribution of the data sets (see Tables 1 and 3). This is because data sets with small data ranges typically get smaller absolute errors. As can be seen in Figure 7 and Figure S2 in the Supporting Information, the atomic SASA of the test set calculated by 2DSASA more often have large positive errors than large negative ones. However, this is not reflected in the average error, which is −0.06 for the test set. The errors for specific atom types correlate with the occurrence of the atom types in the different data sets (1−4 neighbors), and hence with the PLS model applied. More details on the errors for specific atom types are available in the Supporting Information.
Figure 6 describes the variables that are included in the final PLS models. The atom type variables (T) contribute to levels 0−1 in all models, level 2 in the 1−3-neighbor models, and level 3 in the 1-neighbor model. The ring count variables (O) contribute to level 0 in all models except the 1-neighbor one (an atom with one neighbor cannot be part of a ring). It also contributes to level 1 in the 1- and 2-neighbor models, and level 2 in the 1-neighbor model. The variable for total number of atoms (#) contributes to levels 2−3 in all models, and level 4 in all models except that for 4 neighbors. It is clear that the fewer atoms that are bound to an atom, the more atoms further away

Table 2. SMARTCyp Prediction Accuracy on CYP 3A4 Data with and without 3D Atomic SASA Contribution
              training set        test set
              no SASA   SASA      no SASA   SASA
top 1% (a)    66.0      68.0      64.2      65.4
top 2% (a)    77.5      80.0      75.7      78.1
top 3% (a)    83.5      88.0      81.2      84.2
(a) Top n accuracy is the percentage of compounds for which a site of metabolism is found among the top n atoms.

Figure 5. An example of improved site of metabolism prediction when including atomic SASA. The arrow represents the site of metabolism, and the numbers represent the top two ranked atoms (black with atomic SASA and gray without).
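As a minimal sketch of how eq 1 turns three per-atom descriptors into a ranking, the Python snippet below scores a few atoms and sorts them so that the lowest score comes first. It is not the SMARTCyp implementation: the `Atom` class, the helper names, and all descriptor values are hypothetical and chosen only to show how the SASA term can swap two atoms of similar reactivity.

```python
from dataclasses import dataclass

SPAN_WEIGHT = 8.0   # constant multiplying the relative span in eq 1
SASA_WEIGHT = 0.04  # SASA weight reported as optimal for the training set


@dataclass
class Atom:
    label: str            # hypothetical atom identifier
    reactivity: float     # energy assigned by fragment matching (made-up values)
    relative_span: float  # accessibility descriptor (made-up values)
    sasa: float           # atomic SASA in Å² (made-up values)


def smartcyp_score(atom: Atom, use_sasa: bool = True) -> float:
    """Evaluate eq 1; a lower score means metabolism is more likely."""
    score = atom.reactivity - SPAN_WEIGHT * atom.relative_span
    if use_sasa:
        score -= SASA_WEIGHT * atom.sasa
    return score


def rank_atoms(atoms, use_sasa: bool = True):
    """Return atoms ordered from most to least likely site of metabolism."""
    return sorted(atoms, key=lambda a: smartcyp_score(a, use_sasa))


# Hypothetical molecule: C2 is slightly less reactive than C1 but far more exposed.
atoms = [
    Atom("C1", reactivity=60.0, relative_span=0.9, sasa=5.0),
    Atom("C2", reactivity=61.0, relative_span=0.9, sasa=40.0),
    Atom("N3", reactivity=75.0, relative_span=0.5, sasa=10.0),
]

without_sasa = [a.label for a in rank_atoms(atoms, use_sasa=False)]
with_sasa = [a.label for a in rank_atoms(atoms, use_sasa=True)]
print(without_sasa)
print(with_sasa)
```

With these invented numbers the two top-ranked atoms trade places once the SASA term is included, which is the kind of change Figure 5 illustrates for a real compound.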
(a) Mean absolute error of the training (MAE) and test (MAEpred) sets. (b) The predicted atomic SASA are set to zero if the PLS model gives a negative value. (c) 2DSASA uses the appropriate PLS model for each atom.
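The two rules in these footnotes (pick the model that matches the atom's neighbor count, and clamp negative predictions to zero) can be sketched as follows. This assumes each fitted PLS model can be written as an intercept plus a weighted sum of fingerprint variables; the coefficients, the two-variable fingerprints, and the function names are hypothetical, since the actual PLS equations are given only in the Supporting Information.

```python
from typing import Dict, Sequence, Tuple

# One linear model per neighbor count: (intercept, coefficients).
# All numbers below are invented for illustration only.
LinearModel = Tuple[float, Sequence[float]]


def predict_sasa(models: Dict[int, LinearModel],
                 n_neighbors: int,
                 fingerprint: Sequence[float]) -> float:
    """Apply the model for this neighbor count and clamp negatives to zero."""
    intercept, coefs = models[n_neighbors]
    if len(coefs) != len(fingerprint):
        raise ValueError("fingerprint length does not match the model")
    raw = intercept + sum(c * x for c, x in zip(coefs, fingerprint))
    return max(0.0, raw)  # a surface area cannot be negative


def mean_absolute_error(predicted: Sequence[float],
                        reference: Sequence[float]) -> float:
    """MAE between predicted and reference atomic SASA values."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical models for atoms with 1 and 4 neighbors.
models = {
    1: (30.0, [-2.0, -1.0]),
    4: (5.0, [-3.0, -0.5]),
}

exposed = predict_sasa(models, 1, [3.0, 4.0])  # 30 - 6 - 4
buried = predict_sasa(models, 4, [2.0, 1.0])   # 5 - 6 - 0.5, clamped to zero
mae = mean_absolute_error([exposed, buried], [18.0, 1.0])
print(exposed, buried, mae)
```

Keeping one model per neighbor count in a dictionary mirrors the way the data were split into subsets before fitting; a real implementation would need one entry for each of the four neighbor counts.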
Table 5. Prediction Accuracy for the Standard SMARTCyp Model with and without 2D Atomic SASA Contribution on Nine CYP Isoforms and the Improvements with Atomic SASA
               1A2   2A6   2B6   2C8   2C9   2C19  2D6   2E1   3A4
Standard (a)
  top 1%       64.3  71.3  66.0  61.9  55.7  59.6  48.9  64.1  64.4
  top 2%       78.5  84.1  74.6  73.8  69.6  74.0  59.8  81.0  75.9
  top 3%       85.4  88.7  85.0  80.2  78.8  80.3  68.6  87.1  81.4
+SASA (b)
  top 1%       64.9  72.4  66.2  65.5  58.8  60.1  49.3  64.1  65.4
  top 2%       80.0  85.7  76.8  77.5  71.7  76.1  61.1  82.1  78.1
  top 3%       88.4  90.5  86.8  84.5  81.4  83.0  70.7  89.0  85.0
(a) Equation 1 without SASA. (b) Equation 1.
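The top n percentages reported in the tables follow the definition used throughout this work: a compound counts as a hit when at least one experimentally known site of metabolism is among its n best-ranked atoms. A minimal sketch of that metric is given below; the function name, the atom labels, and the four example compounds are hypothetical, and the handling of tied or symmetry-equivalent atoms is deliberately ignored.

```python
from typing import Iterable, Sequence, Set


def top_n_accuracy(rankings: Sequence[Sequence[str]],
                   known_sites: Sequence[Set[str]],
                   n: int) -> float:
    """Percentage of compounds with a known site among the top n ranked atoms."""
    if len(rankings) != len(known_sites):
        raise ValueError("one ranking and one site set are needed per compound")
    hits = 0
    for ranked_atoms, sites in zip(rankings, known_sites):
        if set(ranked_atoms[:n]) & sites:
            hits += 1
    return 100.0 * hits / len(rankings)


# Four hypothetical compounds: atoms listed from lowest to highest score.
rankings = [
    ["C1", "C2", "N3"],
    ["C5", "C4", "O1"],
    ["N1", "C2", "C3"],
    ["C7", "C8", "C9"],
]
known_sites = [{"C1"}, {"C4"}, {"C3"}, {"N2"}]

for n in (1, 2, 3):
    print(n, top_n_accuracy(rankings, known_sites, n))
```

Because a hit in the top 1 is also a hit in the top 2 and top 3, the metric can only stay level or increase with n, which matches the pattern in every column of the tables.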
by Reynald et al.,51 but it might as well be a result of the data sets used in the current work.
The same evaluation performed for the CYP 2D6 model on the CYP 2D6 data set shows that reactivity is even less important than for the 2C family (55%), whereas the pharmacophore is much more important (35%). This reflects the substrate preference of CYP 2D6, which has a high preference for protonated amines, and binds these specifically with the amine group far away from the heme. The N-dealkylation of protonated amines, which is initiated by a hydrogen abstraction from the α-carbon atom, is one of the CYP-mediated reactions with lowest energy barrier. Thus, to make a reactivity-based model such as SMARTCyp predict CYP 2D6 mediated metabolism correctly, a large contribution from the pharmacophore descriptor is required.
Comparison of the Final Models to More Complex Models. To compare our methods to others, we show the prediction accuracies in relation to the recent work on the RS-predictor methodology by Zaretzki et al.,8 in which models were built using the MIRank algorithm52 on SMARTCyp reactivities together with either 148 topological descriptors, or the same descriptors plus 392 quantum chemical descriptors computed with AM1. They also compared the results to two methods implemented in StarDrop and Schrödinger. A comparison of the top 2 prediction accuracies is shown in Figure 9. It shows that our simple 3−4 parameter methods compare well to the much more complex RS-predictor based models, as well as the StarDrop and Schrödinger software packages.
Figure 9. Comparison of top 2 prediction accuracies for SMARTCyp with 2DSASA included to other prediction methods.
■ CONCLUSIONS
In this work we created 2DSASA, a model for prediction of atomic SASA from purely 2D structure information. 2DSASA enables methods built from 2D structure data, e.g., toxicity alerts and site of metabolism predictions, to take atomic accessibility into account. We showed that, for a test set consisting of 20,847 compounds, 2DSASA predicts the atomic SASA with an average absolute error of only 2.8 Å².
We integrated 2DSASA in SMARTCyp and showed that for a set of 425 CYP3A4 substrates it gave a model as accurate as one built from atomic SASA computed from 3D structures. It was also applied to data sets for eight other CYP isoforms (1A2, 2A6, 2B6, 2C8, 2C9, 2C19, 2D6, and 2E1) and was shown to consistently improve the predictions. The final models were compared to much more complex methods and showed better or similar prediction accuracies for all isoforms.
■ ASSOCIATED CONTENT
Supporting Information
ZINC IDs of all compounds in the data sets, PLS equations for the 2DSASA models, SYBYL atom type descriptions of the atom types included in the 2DSASA models, atom type distributions in the 2DSASA data sets, and errors by atom type in 2DSASA. This material is available free of charge via the Internet at http://pubs.acs.org.
■ AUTHOR INFORMATION
Corresponding Author
*Phone: +45 35 33 66 50. Fax: +45 35 33 60 41. E-mail: pry@sund.ku.dk.
Notes
The authors declare no competing financial interest.
■ ACKNOWLEDGMENTS
The work was supported by grants from the Alfred Benzon Foundation, the Danish Council for Independent Research (Medical Sciences), and Lhasa Limited. The authors wish to thank Nina Jeliazkova for constructing the initial version of the circular fingerprint java code.
■ ABBREVIATIONS USED
SASA, solvent accessible surface area; CDK, chemistry development kit; CYP, cytochrome P450
■ REFERENCES
(1) Guengerich, F. P. Cytochrome P450s and other enzymes in drug metabolism and toxicity. AAPS J. 2006, 8, E101−E111.
(2) Kirchmair, J.; Williamson, M. J.; Tyzack, J. D.; Tan, L.; Bond, P. J.; Bender, A.; Glen, R. C. Computational Prediction of Metabolism: Sites, Products, SAR, P450 Enzyme Dynamics, and Mechanisms. J. Chem. Inf. Model. 2012, 52, 617−648.
(3) Danielson, M. L.; Desai, P. V; Mohutsky, M. A.; Wrighton, S. A.; Lill, M. A. Potentially increasing the metabolic stability of drug candidates via computational site of metabolism prediction by CYP2C9: The utility of incorporating protein flexibility via an ensemble of structures. Eur. J. Med. Chem. 2011, 46, 3953−3963.
(4) Moors, S. L. C.; Vos, A. M.; Cummings, M. D.; Van Vlijmen, H.; Ceulemans, A. Structure-Based Site of Metabolism Prediction for Cytochrome P450 2D6. J. Med. Chem. 2011, 54, 6098−6105.
(5) Rydberg, P.; Vasanthanathan, P.; Oostenbrink, C.; Olsen, L. Fast Prediction of Cytochrome P450 Mediated Drug Metabolism. ChemMedChem 2009, 4, 2070−2079.
(6) Rydberg, P.; Olsen, L. Ligand-Based Site of Metabolism Prediction for Cytochrome P450 2D6. ACS Med. Chem. Lett. 2012, 3, 69−73.
(7) Zaretzki, J.; Bergeron, C.; Rydberg, P.; Huang, T.-w.; Bennett, K. P.; Breneman, C. M. RS-Predictor: A New Tool for Predicting Sites of Cytochrome P450-Mediated Metabolism Applied to CYP 3A4. J. Chem. Inf. Model. 2011, 51, 1667−1689.
(8) Zaretzki, J.; Rydberg, P.; Bergeron, C.; Bennett, K. P.; Olsen, L.; Breneman, C. M. RS-Predictor Models Augmented with SMARTCyp Reactivities: Robust Metabolic Regioselectivity Predictions for Nine CYP Isozymes. J. Chem. Inf. Model. 2012, 52, 1637−1659.
(9) Pochapsky, T. C.; Kazanis, S.; Dang, M. Conformational Plasticity and Structure/Function Relationships in Cytochromes P450. Antioxid. Redox Signaling 2010, 13, 1273−1296.
(10) Rydberg, P.; Gloriam, D. E.; Zaretzki, J.; Breneman, C.; Olsen, L. SMARTCyp: A 2D Method for Prediction of Cytochrome P450-Mediated Drug Metabolism. ACS Med. Chem. Lett. 2010, 1, 96−100.
(11) Rydberg, P.; Gloriam, D. E.; Olsen, L. The SMARTCyp cytochrome P450 metabolism prediction server. Bioinformatics 2010, 26, 2988−2989.
(12) Rydberg, P.; Jørgensen, M. S.; Jacobsen, T. A.; Jacobsen, A.-M.; Madsen, K. G.; Olsen, L. Nitrogen Inversion Barriers Affect the N-Oxidation of Tertiary Alkylamines by Cytochromes P450. Angew. Chem., Int. Ed. 2013, 52 (3), 993−997.
(13) Rydberg, P.; Olsen, L. Predicting Drug Metabolism by Cytochrome P450 2C9: Comparison with the 2D6 and 3A4 Isoforms. ChemMedChem 2012, 7, 1202−1209.
(14) Sheridan, R. P.; Korzekwa, K. R.; Torres, R. A.; Walker, M. J. Empirical regioselectivity models for human cytochromes p450 3A4, 2D6, and 2C9. J. Med. Chem. 2007, 50, 3173−3184.
(15) Hennemann, M.; Friedl, A.; Lobell, M.; Keldenich, J.; Hillisch, A.; Clark, T.; Göller, A. H. CypScore: Quantitative prediction of reactivity toward cytochromes P450 based on semiempirical molecular orbital theory. ChemMedChem 2009, 4, 657−669.
(16) Lee, B.; Richards, F. M. Interpretation of Protein Structures - Estimation of Static Accessibility. J. Mol. Biol. 1971, 55, 379−400.
(17) Shrake, A.; Rupley, J. A. Environment and Exposure to Solvent of Protein Atoms - Lysozyme and Insulin. J. Mol. Biol. 1973, 79, 351−371.
(18) Weiser, J.; Shenkin, P. S.; Still, W. C. Approximate atomic surfaces from linear combinations of pairwise overlaps (LCPO). J. Comput. Chem. 1999, 20, 217−230.
(19) Haberthur, U.; Caflisch, A. FACTS: Fast analytical continuum treatment of solvation. J. Comput. Chem. 2008, 29, 701−715.
(20) Lee, M. S.; Feig, M.; Salsbury, F. R.; Brooks, C. L. New analytic approximation to the standard molecular volume definition and its application to generalized born calculations. J. Comput. Chem. 2003, 24, 1348−1356.
(21) Cavallo, L.; Kleinjung, J.; Fraternali, F. POPS: a fast algorithm for solvent accessible surface areas at atomic and residue level. Nucleic Acids Res. 2003, 31, 3364−3366.
(22) Hasel, W.; Hendrickson, T. F.; Still, W. C. A rapid approximation to the solvent accessible surface areas of atoms. Tetrahedron Comput. Methodol. 1988, 1, 103−116.
(23) Jaworska, J.; Nikolova-Jeliazkova, N. How can structural similarity analysis help in category formation? SAR QSAR Environ. Res. 2007, 18, 195−207.
(24) Jeliazkova, N.; Jaworska, J.; Worth, A. Open Source Tools for Read-Across and Category Formation. In In Silico Toxicology: Principles and Applications; Cronin, M., Madden, J., Eds.; RSC Publishing: Cambridge, U.K., 2010; pp 408−445.
(25) Xing, L.; Glen, R. C. Novel methods for the prediction of logP, pK(a), and logD. J. Chem. Inf. Comput. Sci. 2002, 42, 796−805.
(26) Boyer, S.; Arnby, C. H.; Carlsson, L.; Smith, J.; Stein, V.; Glen, R. C. Reaction site mapping of xenobiotic biotransformations. J. Chem. Inf. Model. 2007, 47, 583−590.
(27) Carlsson, L.; Spjuth, O.; Adams, S.; Glen, R. C.; Boyer, S. Use of historic metabolic biotransformation data as a means of anticipating metabolic sites using MetaPrint2D and Bioclipse. BMC Bioinf. 2010, 11, 362.
(28) Molecular Operating Environment (MOE), 2011.10; Chemical Computing Group Inc., 1010 Sherbrooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7, 2011.
(29) Irwin, J. J.; Shoichet, B. K. ZINC - A free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 2005, 45, 177−182.
(30) Canvas, version 1.2; Schrödinger L.L.C.: New York, NY, 2009.
(31) Soergel, D. Mathematical analysis of documentation systems. Inf. Storage Retr. 1967, 3, 129−173.
(32) Shelley, J. C.; Cholleti, A.; Frye, L. L.; Greenwood, J. R.; Timlin, M. R.; Uchimaya, M. Epik: a software program for pK(a) prediction and protonation state generation for drug-like molecules. J. Comput.-Aided Mol. Des. 2007, 21, 681−691.
(33) Epik, version 2.0; Schrödinger L.L.C.: New York, NY, 2009.
(34) Watts, K. S.; Dalal, P.; Murphy, R. B.; Sherman, W.; Friesner, R. A.; Shelley, J. C. ConfGen: A Conformational Search Method for Efficient Generation of Bioactive Conformers. J. Chem. Inf. Model. 2010, 50, 534−546.
(35) Confgen, version 2.1; Schrödinger L.L.C.: New York, NY, 2009.
(36) Bernal, J. D.; Fowler, R. H. A Theory of Water and Ionic Solution, with Particular Reference to Hydrogen and Hydroxyl Ions. J. Chem. Phys. 1933, 1, 515.
(37) Mantina, M.; Chamberlin, A. C.; Valero, R.; Cramer, C. J.; Truhlar, D. G. Consistent van der Waals Radii for the Whole Main Group. J. Phys. Chem. A 2009, 113, 5806−5812.
(38) Bondi, A. Van der Waals Volumes + Radii. J. Phys. Chem. 1964, 68, 441−451.
(39) Bondi, A. Van der Waals Volumes and Radii of Metals in Covalent Compounds. J. Phys. Chem. 1966, 70, 3006−3007.
(40) Hu, C. Y.; Xu, L. On highly discriminating molecular topological index. J. Chem. Inf. Comput. Sci. 1996, 36, 82−90.
(41) Berger, F.; Gritzmann, P.; De Vries, S. Minimum cycle bases for network graphs. Algorithmica 2004, 40, 51−62.
(42) Steinbeck, C.; Han, Y. Q.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics. J. Chem. Inf. Comput. Sci. 2003, 43, 493−500.
(43) Steinbeck, C.; Hoppe, C.; Kuhn, S.; Floris, M.; Guha, R.; Willighagen, E. L. Recent developments of the Chemistry Development Kit (CDK) - An open-source Java library for chemo- and bioinformatics. Curr. Pharm. Des. 2006, 12, 2111−2120.
(44) SIMCA-P, version 11; Umetrics AB: Umeå, Sweden, 2005.
(45) Wold, H. Estimation of principal components and related models by iterative least squares. In Multivariate Analysis; Krishnaiah, P. R., Ed.; Academic Press: New York, 1966; pp 391−420.
(46) Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5−32.
(47) Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273−297.
(48) Liu, R.; Liu, J.; Tawa, G.; Wallqvist, A. 2D SMARTCyp Reactivity-Based Site of Metabolism Prediction for Major Drug-Metabolizing Cytochrome P450 Enzymes. J. Chem. Inf. Model. 2012, 52, 1698−1712.
(49) Yano, J. K.; Hsu, M.-H.; Griffin, K. J.; Stout, C. D.; Johnson, E. F. Structures of human microsomal cytochrome P450 2A6 complexed with coumarin and methoxsalen. Nat. Struct. Mol. Biol. 2005, 12, 822−823.
(50) Porubsky, P. R.; Meneely, K. M.; Scott, E. E. Structures of human cytochrome P-450 2E1. Insights into the binding of inhibitors and both small molecular weight and fatty acid substrates. J. Biol. Chem. 2008, 283, 33698−33707.
(51) Reynald, R. L.; Sansen, S.; Stout, C. D.; Johnson, E. F. Structural characterization of human cytochrome P450 2C19: active site differences between P450′s 2C8, 2C9 and 2C19. J. Biol. Chem. 2012, 287, 44581−44591.
(52) Bergeron, C.; Moore, G.; Zaretzki, J.; Breneman, C. M.; Bennett, K. P. Fast bundle algorithm for multiple-instance learning. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 1068−1079.