
Prediction of acute aquatic toxicity in Tetrahymena pyriformis – A knowledge-based system approach

Dr Martin Payne
Lhasa Limited
Plan

Development of Eco-Derek: alerts and log P based rules

Eco-Derek 1.0.0 – Illustration

Results of “validation” studies of Eco-Derek performance (internal and external test sets)

Limitations and potential improvements to Eco-Derek modelling

Summary
“Eco-Derek” – Health Warning

Eco-Derek is only a “proof-of-concept” experimental piece of software with provisional alerts and rules.
It is not fully “quality assured” or tested.
It is not comparable with Derek (or Derek Nexus) with respect to the maturity (chemical scope) or quality of its knowledge base.
The released version lacks reporting or batch processing facilities.
However, it is free and will be available from the “inchemicotox” project website (July 2012).
• http://www.inchemicotox.org/software-downloads/
Method (1): Toxicity in T. pyriformis

Toxicity model: 40-hour static flow growth inhibition assay (log(1/IGC50) values) for the ciliated protozoan Tetrahymena pyriformis
• Schultz TW et al. Toxicol. Methods 7, 289-309 (1997) or
http://www.vet.utk.edu/TETRATOX/index.php

Published data on over 1200 chemicals:
• Xue et al. Chem. Res. Toxicol. 19, 1030-1039 (2006)
For alert and CLogP-SAR rule refinement:
• CADASTER challenge datasets, www.CADASTER.eu
• A few papers, e.g. Roberts et al. Chem. Res. Toxicol. 23, 228-234 (2010) (haloaliphatics)

Data storage and SAR analysis:
• Structurally searchable database using Accord for Excel 6.1 (Accelrys, Microsoft Corp.)
• Log P calculated by CLogP (BioByte)
Method (2):
Potency rules for non-polar narcosis (NPN)
Baseline toxicity (non-polar narcosis) calculated from
log (1/IGC50 NPN) = 0.78 log P - 2.01
(n = 87, r² = 0.96)
Ellison CM et al. SAR QSAR Environ. Res. 19, 751-783 (2008).
Rules:
If Log P ≤ -0.115 then NPN is very low (< -2.1*)
If -0.115 < Log P ≤ 1.68 then NPN is low (-2.1 to -0.7*)
If 1.68 < Log P ≤ 3.475 then NPN is moderate (-0.7 to 0.7)
If 3.475 < Log P ≤ 5.27 then NPN is high (0.7 to 2.1)
If Log P > 5.27 then NPN is very high (> 2.1)
*In practice, because baseline toxicity is not linear in log P, better agreement with experimental results is obtained with the definitions:
Low = -2 to 0, and Very low < -1
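The NPN equation and log P rules above can be expressed as a short function. This is a minimal Python sketch, not Eco-Derek's implementation; the names `npn_baseline` and `npn_potency_class` are hypothetical, and the sketch uses only the slide's coefficients (0.78, -2.01) and log P cut-offs (-0.115, 1.68, 3.475, 5.27).

```python
# Hypothetical sketch of the NPN baseline and log P potency rules (not Eco-Derek code).


def npn_baseline(log_p: float) -> float:
    """Baseline toxicity, log(1/IGC50 NPN) = 0.78 log P - 2.01 (Ellison et al. 2008)."""
    return 0.78 * log_p - 2.01


# Inclusive upper log P bound for each NPN potency class, from the rules above.
NPN_LOGP_UPPER_BOUNDS = [
    (-0.115, "very low"),
    (1.68, "low"),
    (3.475, "moderate"),
    (5.27, "high"),
]


def npn_potency_class(log_p: float) -> str:
    """Return the NPN potency class for a given log P."""
    for upper, label in NPN_LOGP_UPPER_BOUNDS:
        if log_p <= upper:
            return label
    return "very high"  # log P > 5.27


if __name__ == "__main__":
    for lp in (-1.0, 1.0, 3.0, 4.0, 6.0):
        print(f"log P = {lp:5.2f}  NPN = {npn_baseline(lp):6.2f}  class = {npn_potency_class(lp)}")
```

Each cut-off is the log P at which the NPN line crosses a potency boundary; for example, (2.1 + 2.01) / 0.78 ≈ 5.27.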
Method (3): SAR analysis

Excess toxicity factor Te defined as:
Te = (1/IGC50 experimental) / (1/IGC50 non-polar narcosis, calculated)

Data divided into subsets containing potential structural alerts, mostly identified by at least one member with log Te > 2.
• Development was guided by knowledge of structure-reactivity relationships and by Derek SARs for skin sensitisation and other endpoints
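Because Te is a ratio of 1/IGC50 values, log Te is simply the experimental log(1/IGC50) minus the NPN-calculated value. A minimal Python sketch, assuming log(1/IGC50) in mM units as elsewhere in this talk; the function names and the example chemical are hypothetical.

```python
# Hypothetical sketch: excess toxicity relative to the NPN baseline.


def npn_baseline(log_p: float) -> float:
    """log(1/IGC50 NPN, calculated) = 0.78 log P - 2.01."""
    return 0.78 * log_p - 2.01


def log_excess_toxicity(log_inv_igc50_expt: float, log_p: float) -> float:
    """log Te = log(1/IGC50 expt) - log(1/IGC50 NPN, calculated)."""
    return log_inv_igc50_expt - npn_baseline(log_p)


def suggests_alert(log_te: float, threshold: float = 2.0) -> bool:
    """Flag a chemical whose excess toxicity exceeds the log Te > 2 screening level."""
    return log_te > threshold


if __name__ == "__main__":
    # Hypothetical chemical: log(1/IGC50) = 1.5, log P = 1.0
    lte = log_excess_toxicity(1.5, 1.0)
    print(f"log Te = {lte:.2f}, Te = {10 ** lte:.0f}, flag = {suggests_alert(lte)}")
```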

One alert for polar narcosis, for mono-substituted anilines and phenols, with potency levels determined by:
log (1/IGC50 polar narcosis, calculated) = 0.588 log P - 0.939
(from Schultz et al. Sci. Total Environ. 109-110, 569-580 (1991)).
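Comparing the two regression lines shows how much extra toxicity the polar narcosis equation assigns at a given log P. This Python sketch uses only the two equations quoted on these slides; the function names are hypothetical, and the crossover value is plain algebra on those coefficients rather than a figure from the source.

```python
# Hypothetical sketch comparing the polar narcosis (PN) and NPN regression lines.


def npn_baseline(log_p: float) -> float:
    """log(1/IGC50 NPN) = 0.78 log P - 2.01 (Ellison et al. 2008)."""
    return 0.78 * log_p - 2.01


def polar_narcosis_baseline(log_p: float) -> float:
    """log(1/IGC50 PN) = 0.588 log P - 0.939 (Schultz et al. 1991)."""
    return 0.588 * log_p - 0.939


def crossover_log_p() -> float:
    """log P at which the lines meet: 0.588 x - 0.939 = 0.78 x - 2.01."""
    return (2.01 - 0.939) / (0.78 - 0.588)


if __name__ == "__main__":
    for lp in (0.0, 2.0, 4.0):
        gap = polar_narcosis_baseline(lp) - npn_baseline(lp)
        print(f"log P = {lp:.1f}  PN - NPN = {gap:.2f}")
    print(f"lines cross at log P = {crossover_log_p():.2f}")
```

Below log P ≈ 5.6 the polar narcosis line lies above the NPN line, and the gap (1.071 - 0.192 log P) widens as log P falls.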
Figure: Toxicity to T. pyriformis (Xue et al. dataset) vs CLogP, with Ellison et al.’s NPN regression line, log (1/IGC50 non-polar narcosis, calculated) = 0.78 log P - 2.01 (SAR QSAR Environ. Res. 19, 751-783 (2008))
Figure: Toxicity to T. pyriformis (Xue et al. dataset) vs CLogP, with Ellison et al.’s NPN line, log (1/IGC50 non-polar narcosis, calculated) = 0.78 log P - 2.01 (SAR QSAR Environ. Res. 19, 751-783 (2008)), and Schultz et al.’s polar narcosis line, log (1/IGC50 polar narcosis, calculated) = 0.588 log P - 0.939 (Sci. Total Environ. 109-110, 569-580 (1991))
Figure: Excess toxicity, Log Te = log10(Expt/NPN(calc)), vs. CLogP (data from Xue et al. 2006, -4 < CLogP < 8)

Figure: Examples of chemicals of high excess toxicity (Log Te > 2), showing values for Log Te, log(1/IGC50 (mM)) and CLogP

Figure: Original chemical classes with significant Log Te (beta-haloethers incorrect)
Method (4): SAR analysis

Where practicable, broad structural alerts were split up to give “isoreactive” groups. For example:
Alpha,beta-unsaturated carbonyl compounds were divided into:
– Alpha,beta-unsaturated esters
– Alpha,beta-unsaturated ketones
– Alpha,beta-unsaturated amides, etc.
These were further divided into those with zero, one or two substituents at the alpha- or beta-unsaturated carbons.

The log P dependence of the toxicity of subsets with individual structural alerts was examined.
• “Reasoning” rules were written expressing potency as a function of Log P
Example of Log P Dependence Rules:
alpha,beta-Unsaturated Aldehydes

High toxicity (0.7 to 2.1) for CLogP > 0.5,
Moderate toxicity (-0.7 to 0.7) for CLogP ≤ 0.5 (extrapolation)
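A rule of this kind maps an alert plus its CLogP onto a potency class. The sketch below is a hypothetical Python rendering of this single rule, not Eco-Derek's own rule syntax; `PotencyCall` and `unsaturated_aldehyde_potency` are invented names, and the `extrapolated` flag simply records the slide's “(extrapolation)” note.

```python
# Hypothetical sketch of one log P dependence rule (alpha,beta-unsaturated aldehydes).
from typing import NamedTuple, Tuple


class PotencyCall(NamedTuple):
    label: str
    log_range: Tuple[float, float]  # log(1/IGC50) range for the class
    extrapolated: bool              # True where the slide marks the rule as extrapolation


def unsaturated_aldehyde_potency(clogp: float) -> PotencyCall:
    """Apply the CLogP threshold of 0.5 given for alpha,beta-unsaturated aldehydes."""
    if clogp > 0.5:
        return PotencyCall("high", (0.7, 2.1), False)
    return PotencyCall("moderate", (-0.7, 0.7), True)


if __name__ == "__main__":
    for clogp in (-0.5, 0.5, 1.2):
        print(clogp, unsaturated_aldehyde_potency(clogp))
```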
Result: “Eco-Derek 1.0.0”

45 active alerts for T. pyriformis

Each alert has a full set of comments.
Where possible, each alert has approximate rules expressing a CLogP (Kow) dependence.
Two alerts for cholinesterase inhibition in fish (OPs and carbamates) have been added.
Some alerts are based on very limited data and are “provisional” (e.g. alerts for epoxides, nitroalkanes, secondary allyl or benzyl halides and halopyrimidines).
Eco-Derek runs under Microsoft Windows XP (Service Pack 2), Windows 2000 (Service Pack 4), Windows Vista or Windows 7. Structures can be drawn (ISIS-Draw) or imported as mol files.
Eco-Derek Example (2-Bromo-4,6-dinitroaniline)
Experimental log(1/IGC50) = 1.24, cf. predicted high (0.7-2.1)
Screenshot: reasoning summary for prediction of high toxicity
Figure: Toxicity to T. pyriformis (Xue et al. dataset), Log (1/IGC50) vs. CLogP (mean toxicity is approx. 0.2 across all data)

Figure: As above, with the 337 Eco-Derek alerting substances in bold

Figure: As above, with the 684 Eco-Derek non-alerting substances in bold

Figure: Excess toxicity, log10(Expt/NPN(calc)), T. pyriformis, Log Te vs. CLogP (Xue et al. 2006, -4 < CLogP < 8, mean = 0.71)

Figure: As above, with alerting structures in bold

Figure: Excess toxicity, log10(Expt/NPN(calc)), T. pyriformis, Log Te vs. CLogP (Xue et al. 2006, -6 < CLogP < 8), non-alerting structures in bold


“Internal Validation”
Performance of Eco-Derek [cf. NPN model]

Xue et al 2006
Experimental log(1/IGC50) classified as: very high ≥ 2.1, high 0.7 to 2.1, moderate -0.7 to 0.7, low -2.1 to -0.7 and very low < -2.1:

Table: Proportions of experimental potencies correctly predicted, % Eco-Derek cf. [% NPN model] (number of chemicals); * indicates Eco-Derek alert(s) present
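A table of this kind can be tallied by binning each experimental value into a potency class and counting matches with the predicted class. A minimal Python sketch, assuming the class bounds listed above with inclusive lower limits (the slides do not state how boundary values were assigned); the function names and data are hypothetical.

```python
# Hypothetical sketch: classify log(1/IGC50) values and score class predictions.
from collections import defaultdict

# Inclusive lower bound of each class, highest first (values below -2.1 are "very low").
CLASS_LOWER_BOUNDS = [(2.1, "very high"), (0.7, "high"), (-0.7, "moderate"), (-2.1, "low")]


def potency_class(log_inv_igc50: float) -> str:
    for lower, label in CLASS_LOWER_BOUNDS:
        if log_inv_igc50 >= lower:
            return label
    return "very low"


def accuracy_by_class(experimental, predicted_classes):
    """Return {experimental class: (fraction correctly predicted, number of chemicals)}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for value, predicted in zip(experimental, predicted_classes):
        actual = potency_class(value)
        totals[actual] += 1
        if predicted == actual:
            correct[actual] += 1
    return {label: (correct[label] / totals[label], totals[label]) for label in totals}


if __name__ == "__main__":
    # Hypothetical data, not from the Xue et al. set
    expt = [2.5, 1.0, 0.0, -1.0, 1.5]
    pred = ["high", "high", "moderate", "moderate", "high"]
    print(accuracy_by_class(expt, pred))
```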
Performance of Eco-Derek - Improved definitions of “low” and “very low”

Xue et al 2006
Eco-Derek predictions defined according to log(1/IGC50(mM)): very high ≥ 2.1, high 0.7 to 2.1, moderate -0.7 to 0.7, low -2.1 to -0.7 or (#) -2 to 0.0, and very low < -2.1 or (#) < -1:

Table: Frequency of Eco-Derek potencies correctly predicted (number of chemicals); * indicates Eco-Derek alert(s) present
Results from an Eco-Derek “Internal” Validation using the Xue et al training set
Over 70% accuracy for chemicals of high, moderate or low toxicity, but only about 30% accuracy for chemicals of very high toxicity.
More under-prediction than over-prediction.
Under-prediction of toxicity attributable to:
• (a) insufficient coverage of “polar narcosis”
• (b) non-linearity of log P relationships
• (c) certain reactivity-based structural alerts being too broad in scope, with potency set at the level of the mean toxicity for the class
• (d) the need for additional alerts to cover classes and mechanisms associated with only moderate levels of toxicity and lower excess toxicity
• (e) experimental error (usually < 0.5)
• (f) alert or rule errors, log P errors and speciation errors
Over-prediction is attributable to (b), (c), (e) and (f), and to issues with applicability domains.
CADASTER Challenge Training dataset – lessons learnt:
greater discrimination of substituent effects

644 compounds: chemically similar to the Xue et al dataset, and similar Eco-Derek performance was achieved.
Beta-lactones and 2-nitroanilines are potential new alerts.
Isothiocyanates: 12 in the dataset, 10 fired the alert. Diphenylisothiocyanatomethane, of very high toxicity (3.05), was predicted high, cf. 2,2-dimethylpropyl isothiocyanate, of moderate toxicity (0.35), which was also predicted high.
Correlations with reactivity rates have been found (Schultz et al 2005).
Very high toxicity was poorly predicted, e.g. tetrachlorophenols and pentafluoronitrobenzene were very high but predicted high.
Compounds such as dichloro-2-nitroaniline (1.66) or even 3,5-dichlorophenol (1.56) are of moderately high toxicity but were predicted only moderate; substituent effects and polar narcosis models require improvement.

Figure: structures of diphenylisothiocyanatomethane (Tox = 3.05) and 2,2-dimethylpropyl isothiocyanate (Tox = 0.35)


Known CADASTER Test set – example of lessons learnt:
alert modification and new alerts

Under-prediction for several compounds due to the absence of a very high toxicity prediction for several alerts
alpha-Chlorocinnamaldehyde: Tox = 1.73, cf. predicted 0.0, due to alert 910 not firing (chlorine enhances toxicity, whereas methyl reduces it)
3,5-Dinitroaniline (CLogP = 1.12) fires alert 918 for dinitrobenzenes and alert 914 for 4-nitro-, 4-cyano- or trichloro-anilines. Expt tox is 0.94, cf. -0.7 to 0.7 (moderate) predicted. Potential underestimation of the additional toxicity from uncoupling of oxidative phosphorylation by the “acidic” aniline? Alert 914 may require division according to substituent types.

New alerts for other reactive groups, e.g. thiocyanates?
Benzyl thiocyanate: tox = 1.5, cf. moderate (0.0) predicted
Figure: CADASTER Blind Test Set (all), Log (1/IGC50, mM) vs CLogP. Data from the Environmental Toxicity Prediction Challenge, http://www.cadaster.eu/node/65

Figure: Toxicity to T. pyriformis, CADASTER Blind Test Set, Log (1/IGC50) vs. CLogP, alerting structures in bold

“External” Validation (CADASTER blind test set)

Significantly poorer performance, particularly for chemicals of very high toxicity. About 60% correctly predicted, 40% underestimated.
Eco-Derek predictions defined according to log(1/IGC50(mM)): very high ≥ 2.1, high 0.7 to 2.1, moderate -0.7 to 0.7, low -2.1 to -0.7 or (#) -2 to 0.0, and very low < -2.1 or (#) < -1:

Table: Proportions of experimental potencies correctly predicted (by Eco-Derek categories) (CADASTER blind test set)
Compounds with large errors for Eco-Derek (ED) (and the
machine learning approaches?)
Quantifying Prediction Errors in Log(1/IGC50) from Eco-Derek Potency Classes

Approximate substitution of Eco-Derek predicted potency ranges for log(1/IGC50) by single values:

Very high (> 2.1) = 2.8
High (0.7 to 2.1) = 1.4
Moderate (-0.7 to 0.7) = 0.0
Low (-2.1 to -0.7) = -1.4
Very low (< -2.1) = -2.8

Values for low and very low were later replaced by the mean experimental values, -0.75 and -1.56 respectively (narcosis mechanisms dominant).

Reminder: for the NPN model, the CLogP ranges are very high: > 5.27, high: 3.475-5.27, moderate: 1.68-3.475, low: -0.115-1.68 and very low: < -0.115.
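The error statistics on the following slides can be reproduced in outline by replacing each predicted class with a single value and comparing it with experiment. A Python sketch under stated assumptions: the point values are those listed above, the error is taken as log10(Expt/Predicted), i.e. experimental minus predicted log(1/IGC50), and the function name and data are hypothetical.

```python
# Hypothetical sketch: convert predicted classes to point values and compute error statistics.
import math

POINT_VALUES = {"very high": 2.8, "high": 1.4, "moderate": 0.0, "low": -1.4, "very low": -2.8}
# Later substitution: mean experimental values for the low and very low classes.
ADJUSTED_POINT_VALUES = {**POINT_VALUES, "low": -0.75, "very low": -1.56}


def error_stats(experimental, predicted_classes, values=POINT_VALUES):
    """Return (RMS error, mean absolute error, mean error) of log10(Expt/Predicted).

    A positive mean error means toxicity is under-predicted on average.
    """
    errors = [e - values[c] for e, c in zip(experimental, predicted_classes)]
    n = len(errors)
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    mae = sum(abs(d) for d in errors) / n
    mean_error = sum(errors) / n
    return rmse, mae, mean_error


if __name__ == "__main__":
    expt = [1.0, -1.0]      # hypothetical experimental log(1/IGC50)
    pred = ["high", "low"]  # hypothetical Eco-Derek classes
    print(error_stats(expt, pred))
    print(error_stats(expt, pred, ADJUSTED_POINT_VALUES))
```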
Figure: NPN Model Error: excess toxicity (Log Te = log10(Expt/NPN(calc.))) vs. CLogP. Xue et al 2006, all 1121 chemicals, RMS error = 1.04, mean error = +0.71


Figure: Eco-Derek Error: log10(Expt/Eco-Derek Predicted*) vs. CLogP. Xue et al 2006, r.m.s. error = 0.71, mean abs error = 0.56, mean error = +0.28
* Based on Eco-Derek predicted log(1/IGC50): very high = 2.8, high = 1.4, moderate = 0.0, low = -1.4 and very low = -2.8. For NPN, CLogP: > 5.27, 3.475-5.27, 1.68-3.475, -0.115-1.68 and < -0.115 respectively.
Figure: Adjusted Eco-Derek Error: log10(Expt/Eco-Derek Predicted*) vs. CLogP. Xue et al 2006, r.m.s. error = 0.56, mean abs error = 0.46, mean error = 0.06
* Based on Eco-Derek predicted log(1/IGC50): very high = 2.8, high = 1.4, moderate = 0.0, low = -0.75 and very low = -1.56.
Figure: Eco-Derek Error: log10(Expt/Eco-Derek Predicted*) vs. CLogP. Xue et al 2006, 337 alerting chemicals, RMSE = 0.59, ME = -0.05
* Based on Eco-Derek predicted log(1/IGC50): very high = 2.8, high = 1.4, moderate = 0.0, low = -1.4 and very low = -2.8.
Figure: Adjusted Eco-Derek Error: log10(Expt/Eco-Derek Predicted*) vs. CLogP. Xue et al 2006, 337 alerting chemicals, RMSE = 0.59, ME = -0.06
* Based on Eco-Derek predicted log(1/IGC50): very high = 2.8, high = 1.4, moderate = 0.0, low = -0.75 and very low = -1.56.
Figure: Eco-Derek Error: log10(Expt/Eco-Derek Predicted*) vs. CLogP. Xue et al 2006, non-alerting chemicals, RMSE = 0.76, ME = +0.43
* Based on Eco-Derek predicted log(1/IGC50): very high = 2.8, high = 1.4, moderate = 0.0, low = -1.4 and very low = -2.8. For NPN, CLogP: > 5.27, 3.475-5.27, 1.68-3.475, -0.115-1.68 and < -0.115 respectively.
Figure: Adjusted Eco-Derek Error: log10(Expt/Eco-Derek Predicted*) vs. CLogP. Xue et al 2006, non-alerting chemicals, RMSE = 0.56, ME = +0.11
* Based on Eco-Derek predicted log(1/IGC50): very high = 2.8, high = 1.4, moderate = 0.0, low = -0.75 and very low = -1.56. For NPN, CLogP: > 5.27, 3.475-5.27, 1.68-3.475, -0.115-1.68 and < -0.115 respectively.
Figure: Xue et al dataset, Eco-Derek* predicted class (very high, high, moderate, low, very low) vs. actual toxicity, log(1/IGC50(mM)), with 1:1 line. RMS error = 0.56, mean error = 0.06; Log(1/IGC50) = 0.923 Eco-Derek* + 0.071 (r² = 0.92)
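The regression line in this caption relates experimental log(1/IGC50) to the Eco-Derek point value. The slide does not say how it was fitted; the sketch below assumes ordinary least squares, written in plain Python with hypothetical names and data.

```python
# Hypothetical sketch: ordinary least-squares fit of experimental toxicity (y)
# on Eco-Derek point predictions (x), with the coefficient of determination r².


def linear_fit(x, y):
    """Return (slope, intercept, r_squared) for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    predicted = [2.8, 1.4, 1.4, 0.0, -0.75]    # hypothetical point values
    experimental = [2.6, 1.7, 1.1, 0.2, -0.9]  # hypothetical measurements
    print(linear_fit(predicted, experimental))
```

Because the predictions take at most five distinct values, the fitted slope and r² mainly reflect how well the potency classes separate the experimental data.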
Figure: CADASTER Challenge, T. pyriformis CADASTER Blind Test Set, Eco-Derek predicted class (very high, high, moderate, low, very low) vs. actual toxicity, log(1/IGC50(mM)). RMS error = 0.965, cf. 0.56 for Known Test Set

Figure: CADASTER Challenge 2009 joint winner (Gavin Cawley, UEA, Norwich, UK), predicted vs. experimental toxicity to T. pyriformis, log(1/IGC50(mM)). RMS error = 0.741, cf. 0.353 for Known Test Set

Knowledge Based versus Machine Learning Methods

KB systems explain the basis of their predictions (transparency), indicate uncertainties, and give additional useful experimental data and information on mechanism. Machine learning results are less easily interpretable.
KB systems present a basis for deciding whether to accept a prediction, find another model, or test!

KB systems can also express confidence in predictions and potentially distinguish where knowledge or understanding is well established, sparse or absent.
For example:
• In Derek: probable, plausible or equivocal toxicity
• In Meteor: metabolites are probable, plausible, equivocal, doubted or improbable
• Predictive space and model reliability domain assessments are used in a recent Derek Nexus application.
Eco-Derek: Scientific Limitations

Only a moderate number of simple, broad alerts (some closely related), which limits the structural applicability domain.

Specific biochemical, receptor, enzyme or ion-channel mediated mechanisms of toxicity receive little attention.

The log P - toxicity relationships, particularly for low log P values, are approximate and require further validation.

Polar narcosis contribution included only for mono-substituted anilines or phenols.
• Could be extended in scope
Potential improvements to Eco-Derek

More alerts!
E.g. for other reactive or pro-reactive entities (including 1,3-dihydroxy/amino benzenes and α,β-acetylenic alcohols)
Splitting of broad alerts, such as 906 (quinones, quinone-imines and precursors) and 902 (SNAr-activated arylating benzenes), by substituent type and position into smaller classes of similar reactivity, metabolism and chemistry, to distinguish more accurately between very high, high and moderate toxicity.
Revised narcosis models and new definitions of low and very low potency in Eco-Derek documentation
Potential improvements to Eco-Derek

System improvements (unlikely without further funding):
• Quantitative reporting of the baseline toxicity (NPN) value (like the skin permeability coefficient in Derek)
• Batch processing
• Electronic reports
Overall Summary

Structure-activity relationships for 40-hour toxicity in T. pyriformis, including a log P dependence, have been implemented in an “Eco-Derek” knowledge base, with toxicity expressed semi-quantitatively in potency classes.

“Eco-Derek” is easy to use and is available free from http://www.inchemicotox.org/ or from me.

Eco-Derek 1.0.0 has limited coverage of alerts and toxicological mechanisms but can provide useful information and understanding.
• It will make the most accurate predictions for industrial chemicals, rather than for the more complex pharmaceuticals or agrochemicals.
Acknowledgements

Defra, UK (The early development of Eco-Derek was performed as part of a project sponsored by Defra through the Sustainable Arable LINK Programme)

Prof. Mark Cronin, Liverpool John Moores University, UK

My colleagues at Lhasa Limited, notably:


• Rob Toy and Bill Button, Software Development
• Dr Philip Judson (Judson Consulting Service)
• Dr Chris Barber, Director of Science

Dr Mark Hewitt, Liverpool John Moores University, UK


WCA Environment, Faringdon, UK (for website)
http://www.inchemicotox.org/
Questions

Email: martin.payne@lhasalimited.org

Eco-Derek downloads from
http://www.inchemicotox.org/software-downloads/
