<html>
<head>
</head>
<body>
<p style="margin-top: 0">
ADMET Predictor - Bacterial mutagenicity model (MUT_102+wp2)
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
The bacterial mutagenicity panel within the ADMET Predictor Toxicity Module
features a series of 10 MUT_*** models that predict Ames mutagenicity in
5 individual strains of Salmonella (and/or E. coli), with or without
metabolic activation, i.e.: MUT_97+1537; MUT_m97+1537; MUT_98; MUT_m98;
MUT_100; MUT_m100; MUT_102+wp2; MUT_m102+wp2; MUT_1535; MUT_m1535. The
ten TOX_MUT* Artificial Neural Network Ensembles (ANNE) are qualitative
models, predicting the mutagenicity of new compounds as "Positive"
(i.e., mutagenic) or "Negative". Two additional mutagenicity models are
included: i) an ADMET Risk™ rule file, called "MUT_Risk", which predicts
overall mutagenicity by counting instances of "Positive"; ii)
"MUT_NIHS", a classification model based on the proprietary Ames database
provided by the Division of Genetics and Mutagenesis, National Institute
of Health Sciences of Japan (DGM/NIHS).
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
2 February 2021
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
n/a
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
n/a
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
The model was developed in June of 2017 and was first released in ADMET
Predictor 8.5. The model is currently implemented in ADMET Predictor
10.0 (2020).
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
The model is proprietary and implemented in the commercial software
ADMET Predictor (by Simulations Plus). The training and test sets are not
publicly available.
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
No
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
Salmonella typhimurium (strain TA102) and Escherichia coli (strain WP2)
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
Mutagenicity assessment based on the bacterial reverse mutation test using
Salmonella typhimurium TA102 and repair-deficient E. coli WP2 strains,
without metabolic activation.
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
Unitless
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
Qualitative variable: the model classifies compounds as "Positive"
(mutagenic) or "Negative" (non-mutagenic).
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
The source of the dataset is Bacha et al. (2002) [4], according to which
bacterial mutagenicity data were primarily obtained from the Chemical
Carcinogenesis Research Information System (CCRIS). This toxicology data
file is maintained by the National Cancer Institute and made public
through the National Library of Medicine’s Toxicology Data Network
(TOXNET). It is a scientifically evaluated and fully referenced database
with mutagenicity results for individual bacterial indicator strains (S.
typhimurium TA97, TA1537, TA98, TA100, TA1535, and TA102 and E. coli WP2
uvrA) with and without addition of rat liver microsomal preparation to
measure metabolic activation. These data were supplemented both with
information from the Genetic Activity Profile database maintained by the
Environmental Protection Agency in association with the International
Agency for Research on Cancer and with data from a series of literature
references. Only data for S. typhimurium strain TA102 and E. coli WP2,
without metabolic activation, were considered for model development.
</p>
<p style="margin-top: 0">
Curation of chemical structures was performed automatically and/or
manually within the ADMET Modeler/Predictor, and included the following:
i) extraction of the active moiety from salts and other multicomponent
compounds; ii) standardization of substructural representations (e.g.,
nitro groups); iii) standardization of tautomers (rule-based system that
strikes a balance between consistency and accuracy; the microstate
analysis tool was used to check cases where automatic tautomer
assignments were questionable).
</p>
<p style="margin-top: 0">
Curation of mutagenicity data was performed automatically and/or
manually within ADMET Predictor, and included the following: i)
removal of duplicate entries (based on shared name or structure or based
on tautomeric equivalence), eliminating all but one example that
represents a consensus of the replicates; ii) handling of structures
with conflicting results (positive and negative) from different data
sources: such data were verified for correctness by analysing the
original data source(s) (e.g., journal articles); if the conflict could
not be resolved, the records were removed.
</p>
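<p style="margin-top: 0">
The duplicate and conflict handling described above can be sketched as
follows. This is a minimal illustration only, not the ADMET Predictor
implementation: the function name, the record layout and the use of a
pre-computed structure key (standing in for name, structure or tautomer
matching) are all assumptions made for the example.
</p>

```python
from collections import defaultdict


def curate(records):
    """Collapse duplicates and set aside conflicting structures.

    `records` is an iterable of (structure_key, outcome) pairs, where
    `structure_key` is a hypothetical canonical identifier and `outcome`
    is "Positive" or "Negative".
    """
    outcomes_by_key = defaultdict(set)
    for key, outcome in records:
        outcomes_by_key[key].add(outcome)

    curated = {}    # one consensus record per structure
    conflicts = []  # structures needing a check against the original source
    for key, outcomes in outcomes_by_key.items():
        if len(outcomes) == 1:
            curated[key] = next(iter(outcomes))
        else:
            conflicts.append(key)
    return curated, conflicts


records = [
    ("cmpd-A", "Positive"),
    ("cmpd-A", "Positive"),  # replicate, same result
    ("cmpd-B", "Negative"),
    ("cmpd-C", "Positive"),
    ("cmpd-C", "Negative"),  # conflicting sources
]
curated, conflicts = curate(records)
print(curated)
print(conflicts)
```

<p style="margin-top: 0">
In this sketch, conflicting structures are only listed; in the workflow
described above they are checked against the original sources and
removed only if the conflict cannot be resolved.
</p>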
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
Endpoint quality was dependent on the original literature. Experimental
variability was not taken directly into account; inter-laboratory
reproducibility of the assay is known historically to be about 85%.
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
QSAR
</p>
</body>
</html>
<html>
<head>
</head>
<body>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
The ADMET Modeler module (which is part of the ADMET Predictor)
automates the key steps necessary for model building, including the
reduction and selection of descriptors. In a first stage, the number of
descriptors is reduced based on their own properties and how they relate
to other descriptors ("unsupervised" process). The filtering of
molecular descriptors is performed to eliminate those that are
underrepresented, those with very small variance, and those that are
highly correlated with other descriptors. The second, "supervised" stage
takes the relationship of the descriptors to the dependent variable -
their "sensitivity" - into account in prioritizing them for
incorporation during model building. For the MUT_102+wp2 model, the
Genetic algorithm (once per row) method was selected for variable
selection. More detail on this method can be found in the ADMET
Predictor Manual [1].
</p>
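<p style="margin-top: 0">
The first, unsupervised filtering stage can be illustrated with the
sketch below. It is not the ADMET Modeler algorithm: the function names,
the three thresholds and the reading of "underrepresented" as "non-zero
for too few compounds" are assumptions chosen for the example.
</p>

```python
from math import sqrt
from statistics import pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def unsupervised_filter(columns, min_nonzero_fraction=0.1,
                        min_variance=1e-6, max_abs_corr=0.95):
    """Return the descriptor names that survive the three filters.

    `columns` maps a descriptor name to its values over all compounds.
    All thresholds are illustrative assumptions.
    """
    kept = []
    for name, values in columns.items():
        nonzero = sum(1 for v in values if v != 0) / len(values)
        if nonzero < min_nonzero_fraction:
            continue  # underrepresented descriptor
        if pvariance(values) < min_variance:
            continue  # (near-)constant descriptor
        if any(abs(pearson(values, columns[k])) > max_abs_corr for k in kept):
            continue  # redundant with a descriptor already kept
        kept.append(name)
    return kept


a = [float(i) for i in range(20)]
columns = {
    "a": a,
    "b": [2 * v + 1 for v in a],            # perfectly correlated with "a"
    "c": [3.0] * 20,                        # constant
    "d": [0.0] * 19 + [1.0],                # non-zero for one compound only
    "e": [(-1.0) ** i for i in range(20)],  # weakly correlated with "a"
}
print(unsupervised_filter(columns))
```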
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
Molecular descriptors were calculated within ADMET Predictor software.
ADMET Predictor generates 341 molecular descriptors from 2D structures,
including Textual Description descriptors and indicators (not used for
modeling), Simple Constitutional descriptors, Topological Indices,
Atom-type Electrotopological State indices, Charge-based descriptors,
Hydrogen bonding descriptors, Molecular Ionization descriptors and
Functional groups (a description of the available descriptors is
provided within the ADMET Predictor Manual [1]).
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
803 compounds / 53 descriptors
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
DESCRIPTOR DOMAIN: applicability domain defined by the descriptor space
of training set compounds (hypercubes in the model's standardized
space). Predictions computed for compounds lying outside the
applicability domain of the model should be regarded as having low
reliability.
</p>
<p style="margin-top: 0">
RESPONSE DOMAIN: positive/negative
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
The applicability domain is defined by hypercubes in the model's
standardized space: the range of training set values for each descriptor
used in the model is mapped to the interval [0,1]. A compound for which
any of those descriptors is below -0.1 or above 1.1 is flagged as "out
of scope", i.e., as lying outside the applicability domain of the
model. The prediction for such a compound may be correct, but it should
be treated as having low reliability.
</p>
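<p style="margin-top: 0">
The out-of-scope rule can be written down in a few lines. The sketch
below assumes that the per-descriptor minimum and maximum over the
training set are available; the function names and the handling of a
descriptor that is constant in the training set are assumptions, not
part of the ADMET Predictor documentation.
</p>

```python
def scale_to_training_range(value, train_min, train_max):
    """Map a raw descriptor value so that the training range becomes [0, 1]."""
    span = train_max - train_min
    if span == 0:
        # Assumption: a descriptor that is constant in the training set is
        # in range only if the new compound has exactly the same value.
        return 0.0 if value == train_min else float("inf")
    return (value - train_min) / span


def out_of_scope(values, train_mins, train_maxs, tolerance=0.1):
    """True if any scaled descriptor is below -0.1 or above 1.1."""
    for value, lo, hi in zip(values, train_mins, train_maxs):
        scaled = scale_to_training_range(value, lo, hi)
        if scaled < -tolerance or scaled > 1.0 + tolerance:
            return True
    return False


# Two hypothetical descriptors with training ranges [0, 10] and [10, 20].
train_mins = [0.0, 10.0]
train_maxs = [10.0, 20.0]
print(out_of_scope([5.0, 15.0], train_mins, train_maxs))   # inside the range
print(out_of_scope([-0.5, 15.0], train_mins, train_maxs))  # within the 10% margin
print(out_of_scope([12.0, 15.0], train_mins, train_maxs))  # scaled value 1.2
```

<p style="margin-top: 0">
A single descriptor outside the tolerated interval is enough to flag the
compound, which is what makes the domain a hypercube rather than a
distance-based region.
</p>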
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
The limits of the model's applicability domain are defined by the
DESCRIPTOR SPACE of training set compounds (see section 5.2). For new
compounds, the standardised modelling descriptors' values should fall
within the interval [0,1]; if any of those descriptors is below -0.1 or
above 1.1, the compound is flagged as "out of scope" (i.e. outside the
applicability domain of the model).
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
The training set consists of 803 compounds, including 166 positive
compounds (ca. 21%) and 637 negative compounds.
</p>
<p style="margin-top: 0">
For model development, the training pool was further randomly split
into a training set (66%) and a verification set (33%), with each
individual model in the ensemble "seeing" a different split ("fold").
The verification set is involved in building individual models, albeit
only indirectly: for early stopping and parameter setting in the ANNE
classification model. More critically, predictive performance on the
verification set is used to determine which models get included in the
final ensemble. Hence verification statistics are a better indicator of
what to expect for compounds from outside the data set than training set
statistics.
</p>
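<p style="margin-top: 0">
The idea that each ensemble member sees a different random split of the
training pool can be sketched as below. The function name, the number of
ensemble members and the choice to assign every compound not drawn for
training to the verification set are assumptions for illustration; the
actual ADMET Modeler procedure is not documented here.
</p>

```python
import random


def make_folds(n_compounds, n_models, train_fraction=0.66, seed=0):
    """Return one (training, verification) index split per ensemble member."""
    rng = random.Random(seed)
    cut = round(train_fraction * n_compounds)
    folds = []
    for _ in range(n_models):
        indices = list(range(n_compounds))
        rng.shuffle(indices)  # a fresh random order for every member
        folds.append((sorted(indices[:cut]), sorted(indices[cut:])))
    return folds


# 803 compounds in the training pool; 5 ensemble members is a made-up number.
folds = make_folds(n_compounds=803, n_models=5)
for training, verification in folds:
    print(len(training), len(verification))
```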
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
Pre-processing of mutagenicity data was performed automatically and/or
manually within ADMET Predictor, and included the following: i)
removal of duplicate entries (based on shared name or structure or based
on tautomeric equivalence), eliminating all but one example that
represents a consensus of the replicates; ii) handling of structures
with conflicting results (positive and negative) from different data
sources: such data were verified for correctness by analysing the
original data source(s) (e.g., journal articles); if the conflict could
not be resolved, the records were removed.
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
Concordance = 87.9%; Sensitivity = 81.9%; Specificity = 89.5%
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
n/a
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
n/a
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
n/a
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
n/a
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<span>The model had an acceptable uncertainty profile (Clark et al., J.
Cheminform. 2014, 6(1), 1-19).</span>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
The test set consists of 142 compounds, including 32 positive compounds
(ca. 22.5%) and 110 negative compounds.
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
The dataset was partitioned into a training pool (i.e., the subset of
compounds that are used to train the model) and a test set (i.e., a group
of compounds that are set aside before training begins and are not
involved in the training in any way). The splitting was performed by
random selection of test set chemicals from the dataset.
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
Concordance = 89.4%; Sensitivity = 81.3%; Specificity = 91.8%
</p>
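<p style="margin-top: 0">
As a consistency check, the reported statistics can be related to the
class sizes given for the test set (32 positive, 110 negative) and the
training set (166 positive, 637 negative). The confusion-matrix counts
in the sketch below are not reported in this document: they are
back-calculated from the rounded percentages and should be read as
implied values only. The function names are illustrative.
</p>

```python
def implied_count(rate_percent, class_size):
    """Whole number of compounds implied by a rounded percentage."""
    return round(rate_percent / 100 * class_size)


def summarize(n_pos, n_neg, sensitivity_pct, specificity_pct):
    """Rebuild an implied confusion matrix and recompute the statistics."""
    tp = implied_count(sensitivity_pct, n_pos)
    tn = implied_count(specificity_pct, n_neg)
    return {
        "TP": tp, "FN": n_pos - tp, "TN": tn, "FP": n_neg - tn,
        "sensitivity": 100 * tp / n_pos,
        "specificity": 100 * tn / n_neg,
        "concordance": 100 * (tp + tn) / (n_pos + n_neg),
    }


test_set = summarize(n_pos=32, n_neg=110, sensitivity_pct=81.3, specificity_pct=91.8)
training = summarize(n_pos=166, n_neg=637, sensitivity_pct=81.9, specificity_pct=89.5)

print(round(test_set["concordance"], 1))  # 89.4
print(round(training["concordance"], 1))  # 87.9
```

<p style="margin-top: 0">
Under these assumptions the recomputed concordance matches the reported
value for both sets, so the three statistics are mutually consistent
with the stated class sizes.
</p>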
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
The partitioning of the dataset was aimed at maximizing the size and
diversity of the training and test sets. The external validation set is
considered sufficiently large (it represents about 15% of the data set,
142 of 945 compounds) and representative of the applicability domain,
especially considering the response representation (training and test
sets exhibit a similar balance between positive and negative compounds).
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
n/a
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
The high number of modeling descriptors involved in ANNE models doesn't
allow an easy mechanistic interpretation of the model. However, ADMET
Predictor allows the user to perform descriptor sensitivity analysis for
this model [1].
</p>
<p style="margin-top: 0">
The DESCRIPTOR SENSITIVITY ANALYSIS (DSA) allows the user to explore the
relationship between one specific descriptor and model output in detail,
for one data record at a time. For classification models, the
sensitivities can be visualized with the Gradient Bar graph. Within this
plot, the bars show the average sensitivities of descriptors used by the
selected model for the selected molecule. The direction of the bars
shows the sign of the sensitivity while the size of the bars shows the
magnitude of the sensitivity. The longer the bar is, the greater the
impact the respective descriptor has on the prediction of the selected
molecule. By default, descriptors are automatically sorted by the
magnitude of their sensitivity.
</p>
<p style="margin-top: 0">
Calculation of sensitivity for binary classification models: first, the
program calculates the minimal change Δd, either positive or negative or
both, of a given descriptor “d” needed to flip the current prediction for
the selected molecule. The smaller the change Δd, the higher the
sensitivity of descriptor d. The sensitivity of d is therefore defined as
the reciprocal of the minimal flipping change: S = 1/(Δd+1). Thus, if
the DSA bar for a particular descriptor is positive, then increasing the
value of the descriptor may cause the prediction to flip. Conversely, if
the sensitivity bar is negative, then decreasing the value of the
descriptor may cause the prediction to flip. However, since
a descriptor can go both ways to flip a prediction, some descriptors
will have both negative and positive bars.
</p>
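<p style="margin-top: 0">
The sensitivity definition S = 1/(Δd+1) can be illustrated with a toy
example. Everything in the sketch below is hypothetical: the linear
classifier stands in for the ANNE model, the minimal flipping change is
found by a simple stepwise search, and the sign convention (positive for
an increase of the descriptor, negative for a decrease, with Δd taken as
a magnitude) is an assumption based on the description of the bars.
</p>

```python
def minimal_flip(predict, descriptors, index, direction, step=0.01, max_change=10.0):
    """Smallest change of one descriptor, in the given direction (+1 or -1),
    that flips the predicted class; None if no flip occurs within max_change."""
    baseline = predict(descriptors)
    for k in range(1, int(max_change / step) + 1):
        trial = list(descriptors)
        trial[index] += direction * k * step
        if predict(trial) != baseline:
            return k * step
    return None


def sensitivity(predict, descriptors, index, direction):
    """Signed sensitivity S = direction / (delta_d + 1); 0.0 if no flip is found."""
    delta = minimal_flip(predict, descriptors, index, direction)
    if delta is None:
        return 0.0
    return direction / (delta + 1.0)


def toy_model(d):
    """Hypothetical two-descriptor classifier used only for this example."""
    return "Positive" if 2 * d[0] - d[1] > 1 else "Negative"


molecule = [0.0, 0.0]  # predicted "Negative" by the toy model
for index in (0, 1):
    up = sensitivity(toy_model, molecule, index, +1)
    down = sensitivity(toy_model, molecule, index, -1)
    print(index, round(up, 3), round(down, 3))
```

<p style="margin-top: 0">
In this toy case the first descriptor flips the prediction after a
smaller change than the second, so it receives the larger sensitivity,
which corresponds to the longer bar in the Gradient Bar graph.
</p>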
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
A posteriori mechanistic interpretation, with the support of the
DESCRIPTOR SENSITIVITY ANALYSIS window available within ADMET Predictor
(see section 8.1).
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
n/a
</p>
</body>
</html>
<html>
<head>
</head>
<body>
<p style="margin-top: 0">
n/a
</p>
</body>
</html>
To be entered by JRC
To be entered by JRC
To be entered by JRC
To be entered by JRC