Virtual Screening using Datamining approach
Aim of Cheminformatics Project
Tuberculosis
Obstacles For Drug Design
Drugs Currently in Development
What is QSAR?
QSAR is a mathematical relationship between the biological activity of a molecular system and its geometric and chemical characteristics.
A general formula for a quantitative structure-activity relationship (QSAR) can be given by the following:
Molecule Properties
SPC: Structure Property Correlation
MOLECULE STRUCTURE
INTRINSIC PROPERTIES
Molar Volume
Connectivity Indices
Charge Distribution
Molecular Weight
Polar Surface Area ...
CHEMICAL PROPERTIES
pKa
Log P
Solubility
Stability
BIOLOGICAL PROPERTIES
Activity
Toxicity
Biotransformation
Pharmacokinetics
Molecule Descriptors
o Molecular descriptors are numerical values that characterize properties of molecules.
o The descriptors fall into four classes:
a) Topological
b) Geometrical
c) Electronic
d) Hybrid or 3D Descriptors
Classification of Descriptors
Topological Descriptors
Topological descriptors are derived directly from the connection table
representation of the structure which include:
a) Atom and Bond Counts
b) substructure counts
c) molecular connectivity indices (Wiener Index, Randić Index, Chi Index)
d) Kappa Indices
e) path descriptors
f) distance-sum Connectivity
g) Molecular Symmetry
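As an illustration of how a topological descriptor follows directly from the connection table, the sketch below computes the Wiener index (the sum of shortest-path bond counts over all atom pairs) for two hydrogen-suppressed carbon skeletons. The adjacency dictionaries and atom numbering are assumptions made for this example, not data from the slides.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path lengths (in bonds) over all unordered atom pairs."""
    total = 0
    for source in adjacency:
        # Breadth-first search yields shortest paths in an unweighted graph.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for neighbour in adjacency[atom]:
                if neighbour not in distance:
                    distance[neighbour] = distance[atom] + 1
                    queue.append(neighbour)
        total += sum(distance.values())
    return total // 2  # each pair was counted once from either end


# Hypothetical connection tables (hydrogens omitted).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

print(wiener_index(n_butane), wiener_index(isobutane))
```

The branched isomer gets the smaller value, which is why such indices are used to encode molecular branching.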
Geometrical Descriptors
Geometrical descriptors are derived from the three-dimensional representations and include:
a) principal moments of inertia,
b) molecular volume,
c) solvent-accessible surface area,
d) charged partial surface area,
e) molecular surface area
Electronic Descriptors
Electronic descriptors characterize the molecular structures with quantities such as:
a) dipole moment,
b) quadrupole moment,
c) polarizability,
d) HOMO and LUMO energies,
e) dielectric energy,
f) molar refractivity
Limit Of Descriptors
The
The
in
tool
http://rguha.net/code/java/cdkdesc.html
POWER MV
http://nisla05.niss.org/PowerMV/?q=PowerMV/
MOLD2
http://www.fda.gov/ScienceResearch/BioinformaticsTools/Mold2/default.htm
PADEL Descriptor
http://www.downv.com/Windows/installPaDEL- Descriptor-10439915.htm
Bioavailability
The bioavailability of a compound is classified as:
[Diagram: Bioavailability; Absorption; Liver Metabolism; Gut-wall Metabolism; Permeability; Solubility; Transporters; Lipophilicity; Hydrogen Bonding; Molecular Size/Shape; Flexibility]
PREDICTION OF ADMET PROPERTIES
Requirements for a drug:
Absorption, Distribution, Metabolism, Excretion (Elimination), Toxicity
Poor absorption or permeation is more likely when (Lipinski's Rule of Five):
MW > 500
LogP > 5
More than 5 H-bond donors (sum of OH and NH groups)
More than 10 H-bond acceptors (sum of N and O atoms)
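The four criteria above can be checked mechanically. The following is a minimal sketch, assuming the property values have already been computed by some descriptor tool; the function name and the example values are hypothetical. Each criterion that is met counts as a violation that makes poor absorption or permeation more likely.

```python
def rule_of_five_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Return the list of criteria that flag likely poor absorption."""
    violations = []
    if mol_weight > 500:
        violations.append("MW > 500")
    if log_p > 5:
        violations.append("LogP > 5")
    if h_donors > 5:
        violations.append("more than 5 H-bond donors")
    if h_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations


# Hypothetical property values, for illustration only.
small_molecule = rule_of_five_violations(308.4, 3.2, 0, 4)
large_molecule = rule_of_five_violations(812.0, 6.1, 7, 12)

print(small_molecule)
print(large_molecule)
```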
Drugs that act on the CNS need to be able to cross the BBB in order to reach
their target, while minimal BBB penetration is required for other drugs to prevent
CNS side effects.
A common measure of BBB penetration is the ratio of drug concentrations in the brain and the blood, expressed as log(Cbrain/Cblood), or LogBB.
Van de Waterbeemd and Kansy were probably the first to correlate the PSA of a series of CNS drugs with their membrane transport. They obtained a fair correlation of brain uptake with single-conformer PSA and molecular volume descriptors.
Clark et al. derived a model from 55 compounds using TPSA and LogP:
LogBB = 0.516 - 0.115 * TPSA
n = 55, r2 = 0.686, r = 0.828, σ = 0.42
TPSA in combination with ClogP:
LogBB = 0.070 - 0.014 * TPSA + 0.169 * ClogP
n = 55, r2 = 0.787, r = 0.887, σ = 0.35
The great majority of orally administered CNS drugs have a PSA < 70 Å². For non-CNS compounds it was suggested that these have a PSA < 120 Å².
Thus, the majority of non-CNS-penetrating, orally absorbed compounds have PSA values between 70 and 120 Å².
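The two-descriptor model above is a plain linear equation, so it can be evaluated directly. The sketch below uses the coefficients exactly as quoted on the slide; the function name and the input values are hypothetical and serve only to show the arithmetic.

```python
def log_bb(tpsa, clogp):
    """LogBB from TPSA (in square angstroms) and ClogP, coefficients as quoted above."""
    return 0.070 - 0.014 * tpsa + 0.169 * clogp


# Hypothetical descriptor values for one compound.
predicted = log_bb(tpsa=40.0, clogp=2.5)

# LogBB is log10(Cbrain / Cblood), so the ratio itself is 10 ** LogBB.
brain_blood_ratio = 10 ** predicted

print(round(predicted, 4), round(brain_blood_ratio, 3))
```

In this model a larger polar surface area lowers the predicted brain uptake, while a higher ClogP raises it.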
Partition coefficients
$X_{\text{aqueous}} \rightleftharpoons X_{\text{octanol}}$
$P = \dfrac{[X]_{\text{octanol}}}{[X]_{\text{aqueous}}}$
P is a measure of the relative affinity of a molecule for the lipid and aqueous phases in
the absence of ionisation.
1-Octanol is the most frequently used lipid phase in pharmaceutical research. This
is because:
It has a polar and non polar region (like a membrane phospholipid)
Po/w is fairly easy to measure
Po/w often correlates well with many biological properties
It can be predicted fairly accurately using computational models
Calculation of logP
LogP for a molecule can be calculated from a sum of fragmental or atom-based terms plus various corrections.
$\log P = \sum \text{fragments} + \sum \text{corrections}$
Example: Phenylbutazone
[Structure diagram of phenylbutazone]
Class | Type | Contribution | Value
FRAGMENT | # 1 | 3,5-pyrazolidinedione | -3.240
ISOLATING | CARBON | 5 Aliphatic isolating carbon(s) | 0.975
ISOLATING | CARBON | 12 Aromatic isolating carbon(s) | 1.560
EXFRAGMENT | BRANCH | 1 chain and 0 cluster branch(es) | -0.130
EXFRAGMENT | HYDROG | 20 H(s) on isolating carbons | 4.540
EXFRAGMENT | BONDS | 3 chain and 2 alicyclic (net) | -0.540
RESULT | | clogP | 3.165
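The clogP result for phenylbutazone is simply the sum of the listed fragment and correction values. A short sketch of that arithmetic follows, with the six contributions copied from the listing above; the variable names are assumptions made for this example.

```python
# (description, contribution) pairs copied from the phenylbutazone listing.
contributions = [
    ("3,5-pyrazolidinedione fragment", -3.240),
    ("5 aliphatic isolating carbons", 0.975),
    ("12 aromatic isolating carbons", 1.560),
    ("1 chain branch", -0.130),
    ("20 H on isolating carbons", 4.540),
    ("3 chain and 2 alicyclic bonds (net)", -0.540),
]

clogp = sum(value for _, value in contributions)
print(f"clogP = {clogp:.3f}")
```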
logP influences:
Binding to enzyme / receptor
Aqueous solubility
Binding to P450 metabolising enzymes
Absorption through membrane
Binding to blood / tissue proteins (less drug free to act)
Binding to hERG heart ion channel (cardiotoxicity risk)
Admet Descriptors
Calculation Tools
PreADMET - http://preadmet.bmdrc.org/
Molinspiration Cheminformatics - www.molinspiration.com/services/index.
Calculation of molecular properties relevant to drug design and QSAR, including log P, polar surface area, Rule of Five parameters, and drug-likeness index
Pirika - www.pirika.com
Calculation of various types of molecular properties, including boiling point, vapor pressure, and solubility; web demo restricted to aliphatic molecules
Actelion - www.actelion.com/page/property_explorer
Calculation of molecular weight, logP, solubility, drug-score and toxicity risk.
Virtual Screening
STRUCTURE-BASED VIRTUAL
SCREENING
Protein-Ligand
Docking
Aims to predict 3D structures when a
molecule docks to a protein
Protein-Ligand Docking
Methods
Modern
Scoring
Structure-Based Virtual
Screening: Other Aspects
Computationally
hypothesis of the critical features of a ligand. Standard features include H-bond donors and acceptors, charged groups, and hydrophobic patterns. The hypothesis can be used to screen databases for compounds and to refine existing leads.
By comparing the volume of the active and the inactive compounds, a common
volume can be constructed in order to approximate the shape of the (unknown)
receptor site to further refine the pharmacophore model and to screen out
additional compounds.
Pharmacophore Modelling Workflow
[Workflow diagram: 3D compound structures; set of conformers; feature analysis; align to template; compare; pharmacophore validation; application]
b)QSAR:
Pharmacophore
The two structures above are less similar chemically (topologically) yet have the same
pharmacological activity, namely they both are Angiotensin-Converting Enzyme (ACE)
inhibitors
Molecular similarity
How to calculate it?
Quantitative assessment of similarity/dissimilarity of structures
need a numerically tractable form
molecular descriptors, fingerprints, structural keys
Sequences/vectors of bits, or numeric values that can be compared by
distance functions, similarity metrics .
E = Euclidean distance
T = Tanimoto index
$E(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
$T(x, y) = \dfrac{B(x \,\&\, y)}{B(x) + B(y) - B(x \,\&\, y)}$
where B(x) is the number of bits set in x.
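Both measures are short to implement. The sketch below assumes numeric descriptor vectors for the Euclidean distance and equal-length bit strings for the Tanimoto index; the function names and the toy inputs are illustrative only.

```python
import math


def euclidean_distance(x, y):
    """Distance between two numeric descriptor vectors of equal length."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def tanimoto_index(x, y):
    """Similarity of two bit strings: shared bits over bits set in either."""
    a, b = int(x, 2), int(y, 2)
    common = bin(a & b).count("1")  # B(x & y)
    either = bin(a).count("1") + bin(b).count("1") - common
    return common / either if either else 1.0


print(euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))
print(tanimoto_index("1101", "1001"))
```

Note the opposite orientation of the two measures: identical inputs give a distance of 0 but a Tanimoto index of 1.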
Molecular descriptors
a) chemical fingerprint
hashed binary fingerprint
o encodes topological properties of the chemical graph: connectivity,
edge label (bond type), node label (atom type)
o allows the comparison of two molecules with respect to their
chemical structure
Construction
1. find all 0, 1, ..., n step walks in the chemical graph
2. generate a bit array for each walk with a given number of bits set
3. merge the bit arrays with logical OR operation
Molecular descriptors
Example 1: chemical fingerprint
Example: CH3-CH2-OH
walks from the first carbon atom
length | walk | bit array
0 | C | 1010000000
1 | CH | 0001010000
1 | CC | 0001000100
2 | CCH | 0001000010
2 | CCO | 0100010000
3 | CCOH | 0000011000
merge bit arrays for the first carbon atom: 1111011110
This example illustrates how a 10 bits long topological chemical fingerprint is
created for a simple chain structure. In this example all walks up to 3 steps are
considered, and 2 bits are set for each pattern.
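The merge step of this example can be reproduced with a bitwise OR, and the bit-setting step can be imitated with a hash function. In the sketch below the six bit arrays are copied from the example, while `pattern_bits` is a hypothetical stand-in for the real hashing scheme: it uses MD5 only to obtain stable bit positions, so its output will differ from the slide, and two hashes may collide and set a single bit. Walk enumeration itself is skipped; the walks are listed by hand.

```python
import hashlib


def pattern_bits(pattern, n_bits=10, bits_per_pattern=2):
    """Hash a walk pattern to an integer with up to bits_per_pattern bits set."""
    bits = 0
    for k in range(bits_per_pattern):
        digest = hashlib.md5(f"{pattern}:{k}".encode()).digest()
        position = int.from_bytes(digest[:4], "big") % n_bits
        bits |= 1 << position
    return bits


def fingerprint(walks, n_bits=10):
    """Merge the bit arrays of all walks with a logical OR."""
    merged = 0
    for walk in walks:
        merged |= pattern_bits(walk, n_bits)
    return format(merged, f"0{n_bits}b")


# Walks from the first carbon atom of ethanol, as listed in the example.
walks = ["C", "CH", "CC", "CCH", "CCO", "CCOH"]
hashed_fingerprint = fingerprint(walks)

# Reproducing the merge with the bit arrays given on the slide.
slide_arrays = ["1010000000", "0001010000", "0001000100",
                "0001000010", "0100010000", "0000011000"]
merged = 0
for bits in slide_arrays:
    merged |= int(bits, 2)
slide_fingerprint = format(merged, "010b")

print(hashed_fingerprint, slide_fingerprint)
```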
Molecular Similarity
Example 1: chemical fingerprint
0100010100010100010000000001101010011010100000010100000000100000
0100010100010100010000000001101010011010100000000100000000100000
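As a worked check, the Tanimoto index of the two 64-bit fingerprints above can be computed directly. This is a minimal sketch with the bit strings copied from the example; it also lists the positions at which the two fingerprints disagree.

```python
fp_a = "0100010100010100010000000001101010011010100000010100000000100000"
fp_b = "0100010100010100010000000001101010011010100000000100000000100000"

a, b = int(fp_a, 2), int(fp_b, 2)
common = bin(a & b).count("1")  # bits set in both fingerprints
tanimoto = common / (bin(a).count("1") + bin(b).count("1") - common)

# 1-based positions, counted from the left, where the fingerprints differ.
differing = [i + 1 for i, (p, q) in enumerate(zip(fp_a, fp_b)) if p != q]

print(round(tanimoto, 3), differing)
```

A single differing bit keeps the index close to 1, which is the behaviour expected for two nearly identical structures.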
Molecular descriptors
Example 2: pharmacophore fingerprint
encodes pharmacophore properties of molecules as frequency counts
Construction
1. map pharmacophore point types to atoms
2. calculate the length of the shortest path between each pair of atoms
3. assign a histogram to every pharmacophore point pair type and count the pairs at each path length
Molecular descriptors
Example 2: pharmacophore fingerprint
Pharmacophore point type based
coloring of atoms: acceptor, donor,
hydrophobic, none.
[Histogram figure: counts of pharmacophore point pairs AA, DA, DD, HA, HD and HH at path lengths 1 to 6]
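The three construction steps for a pharmacophore fingerprint can be sketched as follows. The toy molecule, its atom numbering and its point-type assignment (A = acceptor, D = donor, H = hydrophobic, None = no type) are assumptions made for this example; a real implementation would derive them from the structure.

```python
from collections import deque
from itertools import combinations


def path_lengths(adjacency, source):
    """Shortest path length (in bonds) from source to every reachable atom."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in distance:
                distance[neighbour] = distance[atom] + 1
                queue.append(neighbour)
    return distance


def pharmacophore_fingerprint(adjacency, point_types, max_distance=6):
    """Count typed atom pairs per (pair type, path length) histogram bin."""
    pair_types = ["AA", "AD", "AH", "DD", "DH", "HH"]
    histogram = {(pair, d): 0
                 for pair in pair_types for d in range(1, max_distance + 1)}
    typed = [atom for atom in adjacency if point_types.get(atom)]
    distances = {atom: path_lengths(adjacency, atom) for atom in typed}
    for i, j in combinations(typed, 2):
        d = distances[i].get(j)
        if d is not None and d <= max_distance:
            pair = "".join(sorted((point_types[i], point_types[j])))
            histogram[(pair, d)] += 1
    return histogram


# Hypothetical four-atom chain 0-1-2-3 with assumed point types.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
types = {0: "D", 1: "H", 2: "H", 3: "A"}
fp = pharmacophore_fingerprint(chain, types)

print({key: count for key, count in fp.items() if count})
```

The resulting vector of counts has one bin per pair type and path length, matching the layout of the histogram axis in the figure.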
query fingerprint
query
proximity
targets
0000000100001101000000101010000000000110000010000100001000001000
0100010110010010010110011010011100111101000000110000000110001000
0100010100011101010000110000101000010011000010100000000100100000
0001101110011101111110100000100010000110110110000000100110100000
0100010100110100010000000010000000010010000000100100001000101000
0100011100011101000100001011101100110110010010001101001100001000
0101110100110101010111111000010000011111100010000100001000101000
0100010100111101010000100010000000010010000010100100001000101000
0001000100010100010100100000000000001010000010000100000100000000
0100010100010011000000000000000000010100000010000000000000000000
0100010100010100000000000000101000010010000000000100000000000000
0101010101111100111110100000000000011010100011100100001100101000
0100010100011000010000011000000000010001000000110000000001100000
0000000100000000010000100000000000001010100000000100000100100000
0100010100010100000000100000000000010000000000000100001000011000
0001000100001100010010100000010100101011100010000100001000101000
0100011100010100010000100001001110010010000010001100000000101000
0101010100010100010100100000000000010010000010010100100100010000
target fingerprints
hits
Hypothesis Fingerprints
Advantages
strict conditions for hits if actives are fairly similar
Disadvantages
false results with asymmetric metrics
misses common features of highly diverse sets
very sensitive to one missing feature
SUMMARY
Virtual
Increasing
Idea of Datamining
Is
Canonical learning
Problems
Supervised
Datamining Methods
Substructural
Analysis
Discriminant algorithms
The aim of discriminant analysis is to separate the molecules into their constituent classes.
The simplest case is a linear discriminant: with two activity classes and two descriptors, the aim is to find a straight line that separates the data such that the maximum number of compounds are classified correctly.
If more than two variables are used, the line becomes a hyperplane.
The idea is to express a class as a linear combination of attributes.
$X = w_0 + w_1 a_1 + w_2 a_2 + w_3 a_3 + \dots$
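One simple way to find such weights is the perceptron update rule, used here in place of a full discriminant analysis because it fits in a few lines; it is not the method named on the slide. The two-descriptor data set below is hypothetical and linearly separable, and the names `train_perceptron` and `classify` are assumptions made for this sketch.

```python
def classify(weights, attributes):
    """Class 1 if w0 + w1*a1 + w2*a2 + ... is positive, else class 0."""
    x = weights[0] + sum(w * a for w, a in zip(weights[1:], attributes))
    return 1 if x > 0 else 0


def train_perceptron(samples, labels, epochs=100, rate=0.1):
    """Adjust weights after every misclassified sample until none remain."""
    weights = [0.0] * (len(samples[0]) + 1)  # weights[0] is w0
    for _ in range(epochs):
        errors = 0
        for attributes, label in zip(samples, labels):
            predicted = classify(weights, attributes)
            if predicted != label:
                step = rate * (label - predicted)
                weights[0] += step
                for i, a in enumerate(attributes):
                    weights[i + 1] += step * a
                errors += 1
        if errors == 0:  # every sample is on the correct side of the line
            break
    return weights


# Hypothetical descriptor pairs: three actives (1) and three inactives (0).
samples = [(2.0, 1.0), (3.0, 1.5), (2.5, 0.5),
           (0.5, 2.5), (1.0, 3.0), (0.2, 2.0)]
labels = [1, 1, 1, 0, 0, 0]

weights = train_perceptron(samples, labels)
predictions = [classify(weights, s) for s in samples]
print(weights, predictions)
```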
Neural Networks(NN)
Neural Networks
Continued...
Disadvantage of Neural
Networks
Its
DECISION TREES(DT)
In a DT one starts at the root node and follows the edge corresponding to the first applicable rule. This continues until a terminal node is reached, at which point the molecule can be assigned to the active or inactive class.
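The traversal described above can be sketched with a nested dictionary as the tree. The descriptors, thresholds and class labels at the leaves are hypothetical and chosen only to show the mechanics; a real tree would be learned from training data.

```python
# Internal nodes test one descriptor against a threshold; leaves are class labels.
tree = {
    "descriptor": "logP", "threshold": 5.0,
    "low": {
        "descriptor": "TPSA", "threshold": 120.0,
        "low": "active",
        "high": "inactive",
    },
    "high": "inactive",
}


def classify(node, molecule):
    """Follow edges from the root until a terminal node (a label) is reached."""
    while isinstance(node, dict):
        value = molecule[node["descriptor"]]
        node = node["low"] if value <= node["threshold"] else node["high"]
    return node


# Hypothetical descriptor values for two molecules.
first = classify(tree, {"logP": 2.1, "TPSA": 75.0})
second = classify(tree, {"logP": 2.1, "TPSA": 140.0})
print(first, second)
```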
Support Vector
Machines(SVM)
SVM continued....
Weaknesses
Need
Measuring Classifier
Performance
N= total number of instances in the dataset
TPj= Number of True Positives for class j
FPj = Number of False positives for class j
TNj= Number of True Negatives for class j
FNj= Number of False Negatives for class j
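Accuracy = $\dfrac{TP_j + TN_j}{N}$
Sensitivity/recall = $\dfrac{TP_j}{TP_j + FN_j}$
Specificity = $\dfrac{TN_j}{TN_j + FP_j}$
Precision = $\dfrac{TP_j}{TP_j + FP_j}$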
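The counts defined above are enough to compute the usual performance measures. The sketch below uses the standard textbook definitions (accuracy, sensitivity or recall, specificity, and precision, the last two being distinct quantities) for one class j treated as the positive class; the label lists are hypothetical.

```python
def class_metrics(actual, predicted, positive):
    """Confusion counts and derived measures for one class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # also called recall
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Hypothetical true and predicted labels for ten compounds.
actual = ["active"] * 4 + ["inactive"] * 6
predicted = ["active", "active", "active", "inactive",
             "active", "inactive", "inactive", "inactive",
             "inactive", "inactive"]

metrics = class_metrics(actual, predicted, positive="active")
print(metrics)
```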
Types of Datamining learning Process in Weka
Classification learning: the learning scheme is presented with a set of classified examples from which it is expected to learn a way of classifying unseen examples.
Association learning: any association
Numeric prediction: the outcome to be predicted is not a discrete class but a numeric quantity.
Classifier Algorithms in WEKA
a) Bayes Classifier
AODE
BAYES NET
NAIVE BAYES
NAIVE BAYES MULTINOMIAL
NAIVE BAYES UPDATABLE
b) Trees
ADTREE
ID3
J48
LMT
NBTREE
RANDOM FOREST
RANDOM TREE
REP TREE
c) Functions
LINEAR REGRESSION
LOGISTIC
MULTILAYER PERCEPTRON
RBF NETWORK
SIMPLE LINEAR REGRESSION
SIMPLE LOGISTIC
SMO, SMO REG
d) Rules
CONJUNCTIVE RULE
DECISION TABLE
JRIP
M5 RULES
NNGE
ONE R
PRISM
ZERO R
Summary
Machine learning is mainly applied to ligand-based drug screening, where it is applied to the calculation of the optimal distance between the feature vectors of active and inactive compounds.
A kernel is essentially a similarity function with certain mathematical properties, and it is possible to define kernel functions over all sorts of structures, for example sets, strings, trees, and probability distributions.
Interest in neural networks appears to have
declined since the arrival of support vector
machines, perhaps because the latter generally
require fewer parameters to be tuned to achieve
the same (or greater) accuracy.
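To make the remark about kernels over strings concrete, here is a minimal sketch of a p-spectrum string kernel, which counts the substrings of length p that two strings share. This particular kernel is not named in the slides; it is chosen as a simple illustration, and the SMILES-like inputs are hypothetical.

```python
from collections import Counter


def spectrum_kernel(s, t, p=2):
    """Inner product of the length-p substring count vectors of s and t."""
    counts_s = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    counts_t = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    return sum(counts_s[sub] * counts_t[sub] for sub in counts_s)


# Hypothetical SMILES-like strings.
print(spectrum_kernel("CCO", "CCN"))
print(spectrum_kernel("CCO", "CCO"))
```

Because the value is an inner product of count vectors, the function is symmetric and can be plugged into kernel methods such as support vector machines.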
THANK YOU