Comp150 18 Cheminformatics

Data Visualization in Cheminformatics
Simon Xi
Computational Sciences CoE
Pfizer Cambridge
My Background
Professional Experience
Senior Principal Scientist, Computational Sciences CoE, Pfizer Cambridge
9 years of experience in pharmaceutical research, with a focus on developing cheminformatics and bioinformatics applications for research scientists
Education
MSc in Molecular Cell Biology, UT Dallas
MSc in Software Engineering, SMU
Finishing a Ph.D. in Bioinformatics at Boston University
What we will cover today
Introduction to drug discovery

Cheminformatics basics
Encoding of the chemical structures
Visualizing data and structures
Design and optimization of compound library
A case study
The Billion Dollar Molecules

Drug Name    2006 Worldwide Sales    Primary Use
Lipitor      $14,385M                cholesterol
Nexium       $5,182M                 heartburn
Advair       $6,129M                 asthma
Prevacid     $3,425M                 heartburn
Plavix       $6,057M                 anticoagulant
Singulair    $3,579M                 asthma
Seroquel     $3,560M                 depression
Effexor      $3,722M                 depression
Norvasc      $4,866M                 hypertension

Lipitor: $14 billion in annual sales
Industry Productivity vs. Investment

The Challenge
[Figure: total R&D investment ($ billions, axis $0 to $25), number of NMEs (axis 0 to 60), and NME/$, plotted by year from 1970 to 2000]
Source: PhRMA annual survey, 2000
Nature Reviews Drug Discovery 3, 451-456 (2004)
Attrition On The R&D Process
[Figure: attrition funnel from idea to drug. ~100 discovery approaches and millions of compounds screened lead to 1-2 products. Stage labels: Idea; Discovery; Exploratory Development (Phase I); Full Development (Phase II, Phase III); Drug. Activity labels: Preclinical Pharmacology; Preclinical Safety; Clinical Pharmacology & Safety. Timeline: 11-15 years]
Nat Rev Drug Discov. 2007 6:636-49.
What is Chemoinformatics?
The use of computer and information techniques to solve a range of problems in chemistry.
These in silico techniques are commonly used by pharmaceutical companies in the drug discovery process.
Chemistry is a visual science. Data visualization is a key component of cheminformatics.
Encoding Chemical Structures

SD format
Lipitor
Atoms
Bonds
SMILES format
CC(C)C1=C(C(=O)NC2=CC=CC=C2)C(C3=CC=CC=C3)=C(N1CCC(O)CC(O)CC(O)=O)C4=CC=C(F)C=C4
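To make the SMILES encoding concrete, the sketch below tokenizes the Lipitor string shown above and counts heavy atoms per element. It is a deliberately minimal reader, not a full SMILES parser: it assumes a bracket-free string, and the names `ATOM` and `heavy_atom_counts` are hypothetical. Ring-closure digits, branches, and bond symbols are simply skipped.

```python
import re
from collections import Counter

# Lipitor (atorvastatin) SMILES from the slide, joined onto one line.
LIPITOR = ("CC(C)C1=C(C(=O)NC2=CC=CC=C2)C("
           "C3=CC=CC=C3)=C(N1CCC(O)CC(O)C"
           "C(O)=O)C4=CC=C(F)C=C4")

# Two-letter halogens are listed first so "Cl" is not read as "C" + "l".
# Lowercase letters are aromatic atoms in SMILES.
ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")

def heavy_atom_counts(smiles):
    """Count heavy atoms per element in a bracket-free SMILES string."""
    if "[" in smiles:
        # Assumption: bracket atoms (charges, isotopes, metals) are out of scope.
        raise ValueError("bracket atoms are not handled in this sketch")
    return Counter(token.capitalize() for token in ATOM.findall(smiles))

counts = heavy_atom_counts(LIPITOR)
print(dict(counts))
```

The counts (33 C, 2 N, 5 O, 1 F) agree with the molecular formula of atorvastatin, C33H35FN2O5. Hydrogens are implicit in SMILES and are not counted here.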
Representing Structure as Fingerprints
010 0 100 0 1001 00000 1 00
Compound Similarity Search
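A similarity search over fingerprints can be sketched in a few lines. The slides do not name a similarity measure, so the Tanimoto coefficient (shared on-bits divided by total on-bits) is assumed here as a common choice. The 8-bit fingerprints and compound names are made up for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two bit strings of equal length."""
    on_a = {i for i, bit in enumerate(fp_a) if bit == "1"}
    on_b = {i for i, bit in enumerate(fp_b) if bit == "1"}
    union = on_a | on_b
    # Two all-zero fingerprints share nothing; define their similarity as 0.
    return len(on_a & on_b) / len(union) if union else 0.0

def similarity_search(query_fp, library, top_n=3):
    """Rank library compounds by similarity to the query, best first."""
    scored = [(tanimoto(query_fp, fp), name) for name, fp in library.items()]
    return sorted(scored, reverse=True)[:top_n]

# Hypothetical fingerprints; real ones typically have hundreds to thousands of bits.
library = {
    "cmpd_A": "11010010",
    "cmpd_B": "11000010",
    "cmpd_C": "00101101",
}
hits = similarity_search("11010010", library)
print(hits)
```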
Compound Properties/Descriptors
1D, 2D, 3D, and multi-dimensional properties
1D: molecular weight, clogP, number of atoms, charge, number of H-bond donors and acceptors
2D: atom pairs, substructures, functional groups
3D: shape, pharmacophores
nD: fingerprints, etc.
Chemical series: compounds sharing the same core structure
Series Classifications
Ward's Clustering
Iteratively merge pairs of nodes until all nodes are merged.
At each merging step, the two nodes whose merger gives the minimal increase in variance are chosen and merged into one new node.
Once the tree hierarchy is generated, clusters can be defined by cutting the tree at a chosen dissimilarity threshold.
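The merging procedure above can be sketched as a naive agglomerative loop. This is an illustration under stated assumptions, not a production implementation: compounds are represented as plain numeric descriptor vectors, the merge cost is the increase in within-cluster sum of squares, and stopping once the cheapest merge exceeds `threshold` stands in for cutting the finished tree at that height. The function names are hypothetical.

```python
from itertools import combinations

def merge_cost(a, b):
    """Ward criterion: increase in within-cluster sum of squares if a and b merge."""
    d2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return a["size"] * b["size"] / (a["size"] + b["size"]) * d2

def ward_clusters(points, threshold):
    """Merge the cheapest pair repeatedly; stop when the cost exceeds threshold."""
    clusters = [{"size": 1, "centroid": tuple(p), "members": [i]}
                for i, p in enumerate(points)]
    while len(clusters) > 1:
        # Pick the pair of clusters with the minimal variance increase.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        if merge_cost(a, b) > threshold:
            break  # equivalent to cutting the tree at this dissimilarity
        size = a["size"] + b["size"]
        centroid = tuple((a["size"] * x + b["size"] * y) / size
                         for x, y in zip(a["centroid"], b["centroid"]))
        merged = {"size": size, "centroid": centroid,
                  "members": a["members"] + b["members"]}
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c["members"]) for c in clusters)

# Hypothetical 2-D descriptor vectors for five compounds.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
print(ward_clusters(points, threshold=5))
```

The pair search is quadratic at every step, so this form only suits small sets. Library implementations use faster update formulas.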
What makes a drug?

Primary pharmacology
In vitro potency
Cell-based potency
Functional assays
Selectivity against other targets
Toxicity Properties
Inhibition of CYP450 isozymes
PXR transactivation
Human hepatocyte toxicity
Mutagenicity
Mitochondrial toxicity
Covalent protein binding
Inhibition of hERG
ADME/Physicochemical Properties
Solubility
Chemical stability
Hydrophobicity/hydrogen bonding potential
Intestinal mucosal cell permeation
Liver and kidney clearance
Metabolism
Transporters
Charge
Size
Protein binding
Blood-brain barrier permeation
Target cell permeation
Drug-Likeness: Rule of Five

Proposed by C. Lipinski to describe drug-like molecules.
Molecules with good oral absorption and/or distribution properties are likely to have the following characteristics:
Molecular weight ≤ 500
logP ≤ 5.0
H-bond donors ≤ 5
H-bond acceptors (number of N and O atoms) ≤ 10
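The rule translates directly into a filter. The sketch below assumes the four descriptor values have already been computed elsewhere and uses Lipinski's "no more than" cutoffs, so the comparisons are `<=`. The function name and the example values are hypothetical.

```python
def rule_of_five_violations(mol_weight, logp, h_donors, h_acceptors):
    """Return the list of Rule of Five criteria that a compound fails."""
    checks = {
        "molecular weight <= 500": mol_weight <= 500,
        "logP <= 5": logp <= 5,
        "H-bond donors <= 5": h_donors <= 5,
        "H-bond acceptors <= 10": h_acceptors <= 10,
    }
    return [name for name, passed in checks.items() if not passed]

# Hypothetical descriptor values, for illustration only.
print(rule_of_five_violations(mol_weight=350.4, logp=2.1, h_donors=2, h_acceptors=5))
print(rule_of_five_violations(mol_weight=620.0, logp=6.2, h_donors=2, h_acceptors=5))
```

Returning the failed criteria rather than a single yes/no makes the result easier to show next to a structure in a table or grid view.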
Data Visualization
Grid View
Table View
Plot View
Heatmap View
Software Relevance
Software Usability
Software
Management
Building Predictive Models using Machine Learning Techniques
Use computational models to understand Structure-Activity Relationships (SAR)
Use computational models to run virtual screens that guide compound selection for synthesis
Interpretability of Predictive Models

The good part
Can we derive this for non-linear models?
The not-so-good part
Multiple Parameter Optimization in Combinatorial Library Design
Given a 100 x 100 x 100 virtual library space (1,000,000 compounds) and a set of predictive models for various properties (e.g. potency, ADME, selectivity), select the best 300 compounds for synthesis: those with the highest probability of being potent and drug-like, while sampling the chemical space diversely.
Example: a diaminopyrimidine library
[Figure: diaminopyrimidine core with three variable positions, R1, R2, and R3]
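One simple way to frame the selection problem is to enumerate the R-group combinations, rank them by a combined predicted score, and pick greedily while limiting how often any single R-group is reused. The sketch below does this on a toy 4 x 4 x 4 space. The scoring function and the reuse cap are assumptions made for illustration: the slides do not say how the models are combined or how diversity is enforced.

```python
from collections import Counter
from itertools import product

def select_library(r1, r2, r3, score, n_select, max_per_group):
    """Greedy pick of top-scoring compounds with a cap on R-group reuse."""
    ranked = sorted(product(r1, r2, r3), key=score, reverse=True)
    used = Counter()   # how often each (position, R-group) has been picked
    picked = []
    for compound in ranked:
        keys = list(enumerate(compound))
        # Crude diversity constraint: skip compounds that overuse an R-group.
        if all(used[k] < max_per_group for k in keys):
            picked.append(compound)
            used.update(keys)
            if len(picked) == n_select:
                break
    return picked

# Toy space: R-groups are labeled 0..3, and the hypothetical score is their sum.
groups = range(4)
selection = select_library(groups, groups, groups, score=sum,
                           n_select=4, max_per_group=1)
print(selection)
```

For the 100 x 100 x 100 case the same call enumerates 1,000,000 combinations, which is still tractable to sort in memory. In practice `score` would combine the outputs of the predictive models.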
The Problem of Multiple Parameter Optimization
The chemical space is huge
Predictive models are not very predictive
There are many parameters to optimize, and they sometimes contradict each other
MPO: a case study with kinase selectivity
~200 compounds from a library were tested against 40 kinases. Can we design another 100 compounds that are highly selective?
[Figure: trifluoro-diaminopyrimidine core with two variable positions, R1 and R2]
Trifluoro-diaminopyrimidine series (~200 compounds)
Identify compounds with the desired selectivity profile in the expanded virtual chemical space
Virtual Library Profile
[Figure: workflow from the tested compounds (R1 x R2 matrix) to a predictable virtual chemical space]
Tested compounds
Building the FW model: solving R-group contributions using linear regression
Enumeration: 5-50x expansion
Predictable virtual chemical space
Only a few R-group/kinase combinations have been previously tested
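Solving R-group contributions by linear regression can be sketched as an additive model: predicted pIC50 = baseline + contribution of R1 + contribution of R2. "FW" is read here as Free-Wilson analysis, which is an assumption. The data values, R-group labels, and function names below are made up. The fit uses ordinary least squares through the normal equations, with the first R-group at each position as the reference level.

```python
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting (fails on a singular system)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_additive_model(data):
    """data: list of (r1, r2, pIC50). Returns a predict(r1, r2) function."""
    r1s = sorted({r1 for r1, _, _ in data})
    r2s = sorted({r2 for _, r2, _ in data})
    # One indicator column per R-group, except the reference level at each position.
    cols = [(0, g) for g in r1s[1:]] + [(1, g) for g in r2s[1:]]

    def row(r1, r2):
        groups = (r1, r2)
        return [1.0] + [1.0 if groups[pos] == g else 0.0 for pos, g in cols]

    X = [row(r1, r2) for r1, r2, _ in data]
    y = [value for _, _, value in data]
    n = len(cols) + 1
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)] for i in range(n)]
    b = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(n)]
    beta = solve(A, b)
    return lambda r1, r2: sum(c * w for c, w in zip(row(r1, r2), beta))

# Hypothetical measurements for 4 of the 6 possible R1 x R2 combinations.
tested = [("A", "X", 5.0), ("A", "Y", 5.7), ("B", "X", 6.0), ("C", "Y", 5.2)]
predict = fit_additive_model(tested)
print(round(predict("B", "Y"), 2), round(predict("C", "X"), 2))
```

The two untested combinations are predicted from contributions learned on the tested ones, which is how enumeration can expand the predictable space. The fit needs every R-group to appear in at least one tested compound and the tested combinations to be connected. Otherwise the system is singular.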
Predictive models - Leave-One-Out Validations
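Leave-one-out validation holds out each compound in turn, refits the model on the rest, and predicts the held-out value. The sketch below shows the loop with a one-descriptor straight-line fit as a stand-in model, which is an assumption: the slides do not give the model form used at this step. Agreement is reported as r2, taken here as the squared Pearson correlation between observed and predicted values.

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns a predict(x) function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return lambda x: mean_y + slope * (x - mean_x)

def leave_one_out(xs, ys, fit):
    """Predict each point from a model trained on all the other points."""
    predictions = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(model(xs[i]))
    return predictions

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)

# Hypothetical descriptor values and pIC50 measurements.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.1, 5.9, 7.2, 7.8, 9.1]
loo = leave_one_out(xs, ys, fit_line)
print(round(r_squared(ys, loo), 3))
```

Any model with the same fit-then-predict shape can be passed as `fit`, so the loop is independent of the model being validated.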
Experimental Validation of Predictions
~40 compounds in two series were selected for KSS testing
[Figure: scatter plots of KSS pIC50 vs. FW pIC50, with panels annotated "More promiscuous" and "More selective". Panel r2 values as extracted: 0.45, 0.59, 0.92, 0.86, 0.74, 0.83, 0.63, 0.88, 0.85, 0.81, 0.81, 0.85]
Cheminformatics Challenges for Drug Discovery
Information retrieval and knowledge management: rapidly and efficiently present all relevant data and knowledge to scientists at the right time and in the right place
Predictive models: drastically improve the accuracy and interpretability of in silico models for potency and ADME endpoints
Computer-aided design: provide easy-to-use software applications that help scientists analyze and visualize their data and make efficient use of prior knowledge during compound design
