Prediction of acute aquatic toxicity in Tetrahymena pyriformis – A knowledge based system approach
Dr Martin Payne
Lhasa Limited
Plan
Development of Eco-Derek: alerts and log P based rules
Eco-Derek 1.0.0 – Illustration
Results of “validation” studies of Eco-Derek performance (internal and external test sets)
Limitations and potential improvements to Eco-Derek modelling
Summary
Plan
Development of Eco-Derek: alerts and log P based rules
Results of “validation” studies of Eco-Derek performance (internal and external test sets)
Limitations and potential improvements to Eco-Derek modelling
Summary
“Eco-Derek” – Health Warning
Eco-Derek is only a “proof-of-concept” experimental piece of software with provisional alerts and rules.
It is not fully “quality assured” or tested.
It is not comparable with Derek (or Derek Nexus) with respect to the maturity (chemical scope) or quality of its knowledge base.
The released version lacks reporting or batch processing facilities.
However, it is free and will be available from the “inchemicotox” project website (July 2012):
• http://www.inchemicotox.org/software-downloads/
Method (1): Toxicity in T. pyriformis
Toxicity model: 40 hour static flow growth inhibition assay (log(1/IGC50) values) for the ciliated protozoan Tetrahymena pyriformis
• Schultz TW et al. Toxicol. Methods 7: 289-309 (1997) or http://www.vet.utk.edu/TETRATOX/index.php
Published data on over 1200 chemicals:
• Xue et al. Chem. Res. Toxicol. 19, 1030-1039 (2006), and, for alert and CLogP-SAR rule refinement:
• CADASTER challenge datasets www.CADASTER.eu
• A few papers, e.g. Roberts et al. Chem. Res. Toxicol. 23, 228-234 (2010) (haloaliphatics)
Data storage and SAR analysis:
• Structurally searchable database using Accord for Excel 6.1 (Accelrys; Microsoft)
• Log P calculated by CLogP (Biobyte)
Method (2): Potency rules for non-polar narcosis (NPN)
Baseline toxicity (non-polar narcosis) calculated from
log (1/IGC50 NPN) = 0.78 log P - 2.01
(n = 87, r² = 0.96)
Ellison CM et al. SAR QSAR Environ. Res. 19, 751-783 (2008).
Rules:
If log P ≤ -0.115 then NPN is very low (< -2.1*)
If -0.115 < log P ≤ 1.68 then NPN is low (-2.1 to -0.7*)
If 1.68 < log P ≤ 3.475 then NPN is moderate (-0.7 to 0.7)
If 3.475 < log P ≤ 5.27 then NPN is high (0.7 to 2.1)
If log P > 5.27 then NPN is very high (> 2.1)
*In practice, because baseline toxicity is not linear in log P, better agreement with experimental results is obtained with the definitions: low = -2 to 0, and very low < -1
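The rules above are a lookup on log P: each CLogP cut-off is the log P at which 0.78 log P - 2.01 reaches -2.1, -0.7, 0.7 or 2.1. A minimal Python sketch, assuming CLogP has already been calculated; the function and constant names are hypothetical and are not part of Eco-Derek, and the improved “low”/“very low” definitions in the footnote are not encoded.

```python
def npn_log_inv_igc50(log_p: float) -> float:
    """Baseline (non-polar narcosis) toxicity, log(1/IGC50 (mM)), Ellison et al. (2008)."""
    return 0.78 * log_p - 2.01


# CLogP upper bound for each NPN class, copied from the rules above
# (upper bounds are inclusive, matching the "<=" in each rule).
NPN_CLASS_BOUNDS = [
    (-0.115, "very low"),
    (1.68, "low"),
    (3.475, "moderate"),
    (5.27, "high"),
]


def npn_class(log_p: float) -> str:
    """Return the NPN potency class for a given log P."""
    for upper, label in NPN_CLASS_BOUNDS:
        if log_p <= upper:
            return label
    return "very high"
```

For example, `npn_class(2.4)` returns "moderate", with a baseline log(1/IGC50) of about -0.14.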
Method (3): SAR analysis
Excess toxicity factor Te defined as:
Te = (1/IGC50 experimental) / (1/IGC50 non-polar narcosis, calculated)
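Because Te is a ratio of potencies, log Te is simply the experimental log(1/IGC50) minus the NPN baseline. A small sketch, with hypothetical function names; the example chemical (CLogP 1.0, log(1/IGC50) 1.5) is invented for illustration.

```python
def npn_log_inv_igc50(log_p: float) -> float:
    """Baseline toxicity, log(1/IGC50 (mM)) = 0.78 log P - 2.01 (Ellison et al. 2008)."""
    return 0.78 * log_p - 2.01


def log_te(log_inv_igc50_expt: float, log_p: float) -> float:
    """log Te = log10[(1/IGC50 expt) / (1/IGC50 NPN)], i.e. a difference of logs."""
    return log_inv_igc50_expt - npn_log_inv_igc50(log_p)


def te(log_inv_igc50_expt: float, log_p: float) -> float:
    """Te itself: experimental potency as a multiple of baseline potency."""
    return 10 ** log_te(log_inv_igc50_expt, log_p)


# Hypothetical chemical: NPN baseline is -1.23, so log Te = 2.73 (Te of about 540)
example = log_te(1.5, 1.0)
flagged = example > 2  # the screening criterion used to pick out candidate alerts
```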
Data divided into subsets containing potential structural alerts, mostly identified by at least one member with log Te > 2.
• Development guided by knowledge of structure-reactivity relationships and by Derek SARs for skin sensitisation and other endpoints
One alert for polar narcosis, for mono-substituted anilines and phenols, with potency levels determined by:
log (1/IGC50 polar narcosis, calculated) = 0.588 log P - 0.939
(from Schultz et al. Sci. Total Environ. 109-110, 569-580 (1991)).
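As a worked comparison of the two narcosis regressions: the polar narcosis line is shallower but has a higher intercept, so it predicts greater toxicity than the NPN baseline below the crossover log P = (-0.939 + 2.01) / (0.78 - 0.588), about 5.58. Treating both straight lines as valid over the whole log P range is an assumption made only for this illustration; the names below are hypothetical.

```python
# (slope, intercept) pairs for log(1/IGC50 (mM)) = slope * log P + intercept
NPN = (0.78, -2.01)    # non-polar narcosis, Ellison et al. (2008)
PN = (0.588, -0.939)   # polar narcosis, Schultz et al. (1991)


def predict(line: tuple[float, float], log_p: float) -> float:
    """Evaluate one regression line at a given log P."""
    slope, intercept = line
    return slope * log_p + intercept


def crossover_log_p(a: tuple[float, float], b: tuple[float, float]) -> float:
    """log P at which two straight lines predict the same toxicity."""
    return (b[1] - a[1]) / (a[0] - b[0])


x = crossover_log_p(NPN, PN)                      # about 5.58
gap_at_2 = predict(PN, 2.0) - predict(NPN, 2.0)   # about 0.69 log units
```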
Figure: Toxicity to T. pyriformis (Xue et al. dataset) vs CLogP, with Ellison et al.’s NPN regression line, log (1/IGC50 non-polar narcosis, calculated) = 0.78 log P - 2.01 (SAR QSAR Environ. Res. 19, 751-783 (2008))
Figure: Toxicity to T. pyriformis (Xue et al. dataset) vs CLogP, with Ellison et al.’s NPN and Schultz et al.’s polar narcosis regression lines, log (1/IGC50 non-polar narcosis, calculated) = 0.78 log P - 2.01 and log (1/IGC50 polar narcosis, calculated) = 0.588 log P - 0.939 (SAR QSAR Environ. Res. 19, 751-783 (2008); Sci. Total Environ. 109-110, 569-580 (1991))
Figure: Excess toxicity, log Te = log10(Expt/NPN(calc)), vs. CLogP (data from Xue et al. 2006, -4 < CLogP < 8)

Examples, log Te > 2
Examples of chemicals of high excess toxicity, showing values for log Te, log(1/IGC50 (mM)) and CLogP
Original chemical classes with significant log Te (beta-haloethers incorrect)
Method (4): SAR analysis
Where practicable, broad structural alerts were split up to give “isoreactive” groups. For example:
Alpha,beta-unsaturated carbonyl compounds divided into:
– Alpha,beta-unsaturated esters,
– Alpha,beta-unsaturated ketones,
– Alpha,beta-unsaturated amides, etc.
These were further divided into those with zero, one or two substituents at the alpha- or beta-unsaturated carbons
Log P dependence of the toxicity of subsets with individual structural alerts was examined
• “Reasoning” rules written expressing potency as a function of log P
Example of log P dependence rules: alpha,beta-unsaturated aldehydes
High toxicity (0.7 to 2.1) for CLogP > 0.5
Moderate toxicity (-0.7 to 0.7) for CLogP ≤ 0.5 (extrapolation)
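The aldehyde rule is a two-level threshold on CLogP. Eco-Derek's own rule syntax is not shown in this material, so the structure below is a hypothetical Python representation intended only to illustrate the threshold logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogPRule:
    """Hypothetical container for one alert's two-level, log P-dependent potency rule."""
    alert: str
    threshold: float   # CLogP cut-off
    above: str         # potency class when CLogP > threshold
    at_or_below: str   # potency class when CLogP <= threshold

    def predict(self, clogp: float) -> str:
        return self.above if clogp > self.threshold else self.at_or_below


# The alpha,beta-unsaturated aldehyde rule quoted above
UNSAT_ALDEHYDE = LogPRule("alpha,beta-unsaturated aldehyde", 0.5, "high", "moderate")
```

Keeping the cut-off and both outcomes as data, rather than code, makes each alert's rule easy to review and revise as more data arrive.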
Result: “Eco-Derek 1.0.0”
45 active alerts for T. pyriformis

Each alert has a full set of comments
Where possible, for each alert there are approximate
rules expressing a CLogP (Kow) dependence
Two alerts for cholinesterase inhibition in fish (OPs and
carbamates) have been added.
Some alerts are based on very limited data and are
“provisional” (e.g. alerts for epoxides, nitroalkanes,
secondary allyl or benzyl halides and halopyrimidines).
Eco-Derek runs under Microsoft Windows XP (Service
Pack 2), Windows 2000 (Service Pack 4), Windows Vista
or Windows 7. Structures can be drawn (ISIS-Draw) or
imported as mol files.
Plan
Development of Eco-Derek alerts and log P based rules
Results of “validation” studies of Eco-Derek performance (internal and external (CADASTER) test sets)
Limitations and potential improvements to Eco-Derek
Summary
Eco-Derek Example (2-Bromo-4,6-dinitroaniline)
Log(1/IGC50) = 1.24, cf. predicted high (0.7-2.1)
Eco-Derek Example (2-Bromo-4,6-dinitroaniline)
Reasoning summary for prediction of high toxicity
Plan
Results of “validation” studies of Eco-Derek performance (internal and external test sets)
Limitations and potential improvements to Eco-Derek modelling
Summary
Figure: Toxicity to T. pyriformis (Xue et al. dataset), log(1/IGC50) vs. CLogP (mean toxicity is approx. 0.2 across all data)
Figure: As above, with the 337 Eco-Derek alerting substances in bold
Figure: As above, with the 684 Eco-Derek non-alerting substances in bold
Figure: Excess toxicity, log Te = log10(Expt/NPN(calc)), vs. CLogP, T. pyriformis (Xue et al. 2006, -4 < CLogP < 8, mean = 0.71)
Figure: As above, alerting structures in bold (Xue et al. 2006, -4 < CLogP < 8)
Figure: As above, non-alerting structures in bold (Xue et al. 2006, -6 < CLogP < 8)

“Internal Validation”
Performance of Eco-Derek [cf. NPN model]
Xue et al 2006
Experimental log(1/IGC50) classified as: very high ≥ 2.1, high 0.7 to 2.1, moderate -0.7 to 0.7, low -2.1 to -0.7 and very low < -2.1:
Table: Proportions of experimental potencies correctly predicted, % Eco-Derek cf. [% NPN model] (number of chemicals); * indicates Eco-Derek alert(s) present
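The table scores predictions per experimental class. A sketch of that bookkeeping, assuming each lower class bound is inclusive (as the “≥” suggests) and using the original rather than the improved low/very low definitions; the function names and the five data points are hypothetical and are not from the Xue et al. dataset.

```python
from collections import Counter


def potency_class(log_inv_igc50: float) -> str:
    """Map log(1/IGC50 (mM)) to a potency class; each lower bound is inclusive."""
    if log_inv_igc50 >= 2.1:
        return "very high"
    if log_inv_igc50 >= 0.7:
        return "high"
    if log_inv_igc50 >= -0.7:
        return "moderate"
    if log_inv_igc50 >= -2.1:
        return "low"
    return "very low"


def proportion_correct(experimental, predicted_classes):
    """Per experimental class: (fraction predicted in the same class, number of chemicals)."""
    totals, hits = Counter(), Counter()
    for value, predicted in zip(experimental, predicted_classes):
        actual = potency_class(value)
        totals[actual] += 1
        hits[actual] += int(predicted == actual)
    return {cls: (hits[cls] / totals[cls], totals[cls]) for cls in totals}


# Hypothetical values, for illustration only
result = proportion_correct(
    [2.5, 1.0, 0.0, -1.0, 1.5],
    ["high", "high", "moderate", "moderate", "high"],
)
```

Here the single very high chemical is under-predicted as high, mirroring the pattern reported for the real data.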
Performance of Eco-Derek: improved definitions of “low” and “very low”
Xue et al 2006
Eco-Derek predictions defined according to log(1/IGC50 (mM)): very high ≥ 2.1, high 0.7 to 2.1, moderate -0.7 to 0.7, low -2.1 to -0.7 (or -2 to 0.0, % marked #) and very low < -2.1 (or < -1, % marked #):
Table: Frequency of Eco-Derek potencies correctly predicted (number of chemicals); * indicates Eco-Derek alert(s) present
Results from an Eco-Derek “internal” validation using the Xue et al. training set
Over 70% accuracy for chemicals of high, moderate or low toxicity,
but only about 30% accuracy for chemicals of very high toxicity.
More under-prediction than over-prediction
Under-prediction of toxicity attributable to:
• (a) insufficient coverage of “polar narcosis”
• (b) non-linearity of logP relationships
• (c) too broad a scope of certain reactivity-based structural alerts, set to the level of the mean toxicity for the class
• (d) the need for additional alerts to cover classes and mechanisms associated with only moderate levels of toxicity and lower excess toxicity
• (e) experimental error (usually < 0.5 log units)
• (f) alert or rule errors, log P and speciation errors
Over-prediction due to (b), (c), (e), (f) and issues with applicability domains
CADASTER Challenge training dataset – lessons learnt: greater discrimination of substituent effects
644 compounds: chemically similar to the Xue et al. dataset, and a similar Eco-Derek performance was achieved.
Beta-lactones and 2-nitroanilines are potential new alerts
Isothiocyanates: 12 in dataset, 10 fired the alert. Diphenylisothiocyanatomethane, very high tox (3.05), predicted high, cf. 2,2-dimethylpropyl isothiocyanate, moderate tox (0.35), predicted high.
Correlations with reactivity rates have been found (Schultz et al. 2005)
Very high toxicity poorly predicted, e.g. tetrachlorophenols and pentafluoronitrobenzene were very high, but predicted high
Compounds of moderately high toxicity, such as dichloro-2-nitroaniline (1.66) or even 3,5-dichlorophenol (1.56), were predicted only moderate: substituent effects and polar narcosis models require improvement.
Figure: Diphenylisothiocyanatomethane (Tox = 3.05) and 2,2-dimethylpropyl isothiocyanate (Tox = 0.35)

Known CADASTER test set – example of lessons learnt: alert modification and new alerts
Under-prediction for several chemicals, due to the absence of a very high toxicity prediction for several alerts
alpha-Chlorocinnamaldehyde: Tox = 1.73, cf. predicted 0.0, because alert 910 did not fire (chlorine enhances toxicity, whereas methyl reduces it)
3,5-Dinitroaniline (CLogP = 1.12) fires alert 918 for dinitrobenzenes and alert 914 for 4-nitro-, 4-cyano- or trichloro-anilines. Expt tox is 0.94, cf. -0.7 to 0.7 (moderate) predicted. Is the additional toxicity of the “acidic” aniline, from uncoupling of oxidative phosphorylation, underestimated? Alert 914 may require division according to substituent type.
New alerts for other reactive groups, e.g. thiocyanates?
Benzyl thiocyanate: tox = 1.5, cf. moderate (0.0) predicted
Figure: Test set: CADASTER blind test set (all), log(1/IGC50, mM) vs CLogP. Data from the Environmental Toxicity Prediction Challenge, http://www.cadaster.eu/node/65
Figure: Toxicity to T. pyriformis, CADASTER blind test set, alerting structures in bold

“External” Validation (CADASTER blind test set)
Significantly poorer performance, particularly for chemicals of very high toxicity. About 60% correctly predicted, 40% underestimated.
Eco-Derek predictions defined according to log(1/IGC50 (mM)): very high ≥ 2.1, high 0.7 to 2.1, moderate -0.7 to 0.7, low -2.1 to -0.7 (or -2 to 0.0, marked #) and very low < -2.1 (or < -1, marked #):
Table: Proportions of experimental potencies correctly predicted (by Eco-Derek categories) (CADASTER blind test set)
Compounds with large errors for Eco-Derek (ED) (and the
machine learning approaches?)
Quantifying Prediction Errors in Log(1/IGC50) from
Eco-Derek Potency Classes
Approximate substitution of Eco-Derek predicted potency ranges for log(1/IGC50) by single values:
Very high (> 2.1) = 2.8
High (0.7 to 2.1) = 1.4
Moderate (-0.7 to 0.7) = 0.0
Low (-2.1 to -0.7) = -1.4
Very low (< -2.1) = -2.8
Values for low and very low were later replaced by the mean experimental values, -0.75 and -1.56 respectively (narcosis mechanisms dominant)
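For the three bounded classes the substituted values are the class midpoints (for example, 1.4 is midway between 0.7 and 2.1). A sketch of both substitution schemes as lookup tables; the names `ORIGINAL_VALUES`, `ADJUSTED_VALUES` and `point_prediction` are hypothetical.

```python
# Single-value stand-ins for each predicted potency class, log(1/IGC50 (mM))
ORIGINAL_VALUES = {
    "very high": 2.8,
    "high": 1.4,
    "moderate": 0.0,
    "low": -1.4,
    "very low": -2.8,
}

# Later revision: low and very low replaced by mean experimental values
ADJUSTED_VALUES = {**ORIGINAL_VALUES, "low": -0.75, "very low": -1.56}


def point_prediction(potency_class: str, adjusted: bool = True) -> float:
    """Convert a predicted potency class into a single log(1/IGC50) value."""
    table = ADJUSTED_VALUES if adjusted else ORIGINAL_VALUES
    return table[potency_class]
```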
Reminder: for the NPN model, very high CLogP: > 5.27, high: 3.475-5.27, moderate: 1.68-3.475, low: -0.115-1.68 and very low: < -0.115.
Figure: NPN model error, excess toxicity (log Te = log10(Expt/NPN(calc.))) vs. CLogP. Xue et al. 2006, all 1121 chemicals, RMS error = 1.04, mean error = +0.71

Figure: Eco-Derek error, log10(Expt/Eco-Derek* predicted) vs. CLogP. Xue et al. 2006, RMS error = 0.71, mean absolute error = 0.56, mean error = +0.28
* Based on Eco-Derek predicted log(1/IGC50): very high = 2.8, high = 1.4, moderate = 0.0, low = -1.4 and very low = -2.8. For NPN, CLogP: > 5.27, 3.475-5.27, 1.68-3.475, -0.115-1.68 and < -0.115, respectively.
Figure: Adjusted Eco-Derek error, log10(Expt/Eco-Derek* predicted) vs. CLogP. Xue et al. 2006, RMS error = 0.56, mean absolute error = 0.46, mean error = 0.06
* Based on Eco-Derek predicted log(1/IGC50): very high = 2.8, high = 1.4, moderate = 0.0, low = -0.75 and very low = -1.56.
Figure: Eco-Derek error for the 337 alerting chemicals (Xue et al. 2006), RMSE = 0.59, ME = -0.05 (values: very high = 2.8, high = 1.4, moderate = 0.0, low = -1.4 and very low = -2.8)
Figure: As above, 337 alerting chemicals, RMSE = 0.59, ME = -0.06 (values: low = -0.75 and very low = -1.56)
Figure: Eco-Derek error for non-alerting chemicals (Xue et al. 2006), RMSE = 0.76, ME = +0.43 (values: very high = 2.8, high = 1.4, moderate = 0.0, low = -1.4 and very low = -2.8)
Figure: As above, non-alerting chemicals, RMSE = 0.56, ME = +0.11 (values: low = -0.75 and very low = -1.56)
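The error statistics in these captions can be computed from paired experimental and point-predicted log(1/IGC50) values. Since both are logarithms, log10(Expt/Predicted) is just their difference. A sketch under that assumption; `error_summary` is a hypothetical name and the three data points are invented.

```python
import math


def error_summary(experimental, predicted):
    """Return (RMS error, mean absolute error, mean error) of log10(Expt/Predicted).

    Both inputs are log(1/IGC50) values, so each residual is expt - pred.
    A positive mean error means toxicity is, on average, under-predicted.
    """
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [e - p for e, p in zip(experimental, predicted)]
    n = len(residuals)
    rms = math.sqrt(sum(r * r for r in residuals) / n)
    mean_abs = sum(abs(r) for r in residuals) / n
    mean = sum(residuals) / n
    return rms, mean_abs, mean


# Hypothetical data: experimental values vs. point predictions (1.4, 0.0, -0.75)
rms, mae, me = error_summary([1.5, 0.2, -0.5], [1.4, 0.0, -0.75])
```

The sign convention matches the NPN result above: a baseline model under-predicts most chemicals, giving a positive mean error.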
Figure: Xue et al. dataset, Eco-Derek* predicted potency class (very high, high, moderate, low, very low) vs. actual toxicity, log(1/IGC50 (mM)), with 1:1 line shown. RMS error = 0.56, mean error = 0.06; fitted line log(1/IGC50) = 0.923 Eco-Derek* + 0.071 (r² = 0.92)
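The fitted line in this caption has the form of an ordinary least-squares regression of experimental log(1/IGC50) on the Eco-Derek point predictions. Whether the original analysis used exactly this procedure is an assumption; a plain-Python sketch, with the hypothetical name `linear_fit`:

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus r squared."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy * sxy / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2
```

Here `x` would hold the substituted class values (for example 1.4 for “high”) and `y` the experimental log(1/IGC50) values.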
Figure: CADASTER Challenge, Eco-Derek predicted potency class (very high, high, moderate, low, very low) vs. experimental toxicity, log(1/IGC50 (mM)), T. pyriformis CADASTER blind test set. RMS error = 0.965, cf. 0.56 for the known test set

Figure: CADASTER Challenge 2009 joint winner (Gavin Cawley, UEA, Norwich, UK), predicted vs. experimental toxicity to T. pyriformis, log(1/IGC50 (mM)). RMS error = 0.741, cf. 0.353 for the known test set

Knowledge Based versus Machine Learning Methods
KB systems explain the basis of their predictions (transparency), indicate uncertainties and give additional useful experimental data and information on mechanism. Machine learning results are less easily interpretable.
KB systems present a basis for deciding whether to accept a
prediction or to find another model, or test!
KB systems can also express confidence in predictions and potentially distinguish where knowledge or understanding is well established, sparse or absent.
For example:
• In Derek: probable, plausible or equivocal toxicity
• In Meteor: metabolites are probable, plausible, equivocal, doubted or improbable
• Predictive space and model reliability domain assessments are used in a recent Derek Nexus application.
Plan
Eco-Derek – Illustrations
Results of “validation” studies of Eco-Derek performance (internal and external test sets)
Limitations and potential improvements to Eco-Derek modelling
Summary
Eco-Derek: Scientific Limitations
Only a moderate number of simple, broad alerts, some closely related, limits the structural applicability domain.
Specific biochemical, receptor, enzyme or ion-channel mediated mechanisms of toxicity receive little attention.
The log P - toxicity relationships, particularly for low log P values, are approximate and require greater validation.
Polar narcosis contribution included only for mono-substituted anilines or phenols.
• Could be extended in scope
Potential improvements to Eco-Derek
More alerts !
E.g. for other reactive or pro-reactive entities (including
1,3-dihydroxy/amino benzenes, α,β-acetylenic alcohols)
Splitting of broad alerts, such as 906 (quinones, quinone-imines and precursors) and 902 (SNAr-activated arylating benzenes), by substituent type and position into smaller classes covering similar reactivity, metabolism and chemistry, to distinguish very high, high and moderate toxicity more accurately.
Revised narcosis models and new definitions of low and
very low potency in Eco-Derek documentation
Potential improvements to Eco-Derek
System improvements (unlikely without further funding):

• Quantitative reporting of the baseline toxicity (NPN) value (like the skin permeability coefficient in Derek).
• Batch processing
• Electronic reports
Overall Summary
Structure-activity relationships for 40 hour toxicity in T. pyriformis, including a log P dependence, have been implemented in an “Eco-Derek” knowledge base, with toxicity expressed semi-quantitatively in potency classes.
“Eco-Derek” is easy to use and is available free from http://www.inchemicotox.org/ or from me.
Eco-Derek 1.0.0 has limited coverage of alerts and toxicological mechanisms but can provide useful information and understanding.
• It will make the most accurate predictions for industrial chemicals, rather than for the more complex pharmaceuticals or agrochemicals.
Acknowledgements
Defra, UK (the early development of Eco-Derek was performed as part of a project sponsored by Defra through the Sustainable Arable LINK Programme)
Prof. Mark Cronin, Liverpool John Moores University, UK
My colleagues at Lhasa Limited, notably:

• Rob Toy and Bill Button, Software Development
• Dr Philip Judson (Judson Consulting Service)
• Dr Chris Barber, Director of Science
Dr Mark Hewitt, Liverpool John Moores University, UK

WCA Environment, Faringdon, UK (for website)
http://www.inchemicotox.org/
Questions
Email: martin.payne@lhasalimited.org
Eco-Derek downloads from http://www.inchemicotox.org/software-downloads/
