Belfield, S. J.; Cronin, M. T. D.; Enoch, S. J.; Firman, J. W. Guidance for Good Practice in the Application of Machine Learning in Development of Toxicological Quantitative Structure-Activity Relationships (QSARs). PLOS ONE, 2023, 18, e0282924.

QsarDB Repository

QDB archive DOI: 10.15152/QDB.264

QsarDB content

Property pIGC50: 40-h Tetrahymena toxicity as log(1/IGC50) [log(L/mmol)]
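The property definition above can be illustrated with a short conversion sketch. It assumes the logarithm is base 10 (the usual QSAR convention, not stated explicitly in this record) and that IGC50 is supplied in mmol/L; the function names are hypothetical and not part of the QDB archive.

```python
import math


def pigc50_from_igc50(igc50_mmol_per_l: float) -> float:
    """Return log10(1/IGC50) for an IGC50 value given in mmol/L.

    Assumption: base-10 logarithm; the record only writes "log".
    """
    if igc50_mmol_per_l <= 0:
        raise ValueError("IGC50 must be a positive concentration")
    return -math.log10(igc50_mmol_per_l)


def igc50_from_pigc50(pigc50: float) -> float:
    """Invert the transform: recover IGC50 in mmol/L from pIGC50."""
    return 10.0 ** (-pigc50)
```

Under these assumptions a more toxic compound (lower IGC50) receives a higher pIGC50, so an IGC50 below 1 mmol/L maps to a positive value and one above 1 mmol/L to a negative value.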

RF: QSAR model for Tetrahymena pyriformis growth inhibition using the RF algorithm

Random forest (regression)

Name                      Type                 n     R²     σ
Training set              training             1994  0.975  0.190
10-fold cross-validation  internal validation  1994  0.753  0.524

SVM: QSAR model for Tetrahymena pyriformis growth inhibition using the SVM algorithm

Support vector machine (regression)

Name                      Type                 n     R²     σ
Training set              training             1994  0.976  0.161
10-fold cross-validation  internal validation  1994  0.800  0.466

KNN: QSAR model for Tetrahymena pyriformis growth inhibition using the KNN algorithm

k-Nearest neighbors (regression)

Name                      Type                 n     R²     σ
Training set              training             1994  0.852  0.404
10-fold cross-validation  internal validation  1994  0.689  0.581

XGB: QSAR model for Tetrahymena pyriformis growth inhibition using the XGB algorithm

Extreme Gradient Boosting (regression)

Name                      Type                 n     R²     σ
Training set              training             1994  0.994  0.081
10-fold cross-validation  internal validation  1994  0.801  0.465

SNN: QSAR model for Tetrahymena pyriformis growth inhibition using the SNN algorithm

Neural network (regression)

Name                      Type                 n     R²     σ
Training set              training             1994  0.968  0.188
10-fold cross-validation  internal validation  1994  0.811  0.453

DNN: QSAR model for Tetrahymena pyriformis growth inhibition using the DNN algorithm

Neural network (regression)

Name                      Type                 n     R²     σ
Training set              training             1994  0.993  0.086
10-fold cross-validation  internal validation  1994  0.829  0.431
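Each model record above reports R² and σ twice, once on the training set and once under 10-fold cross-validation. The sketch below shows one way such a pair of statistics could be computed. It is not the procedure used to build the archive: it assumes σ is the root-mean-square residual (the record does not define it), uses synthetic data, and substitutes a toy 1-nearest-neighbour regressor (`fit_1nn`, a hypothetical stand-in) for the archived models.

```python
import math
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def residual_sigma(y_true, y_pred):
    """Assumption: sigma is taken as the root-mean-square residual."""
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(ss_res / len(y_true))


def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_predict(fit, xs, ys, k=10, seed=0):
    """Predict every sample from a model fitted without its own fold."""
    preds = [None] * len(xs)
    for fold in kfold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = model(xs[i])
    return preds


def fit_1nn(xs, ys):
    """Toy 1-nearest-neighbour regressor on scalar inputs (illustration only)."""
    pairs = list(zip(xs, ys))

    def predict(x):
        return min(pairs, key=lambda p: abs(p[0] - x))[1]

    return predict


# Synthetic data standing in for descriptor/pIGC50 pairs (not from the archive).
rng = random.Random(1)
xs = [rng.uniform(-2.0, 4.0) for _ in range(200)]
ys = [0.8 * x + rng.gauss(0.0, 0.5) for x in xs]

full_model = fit_1nn(xs, ys)
train_pred = [full_model(x) for x in xs]
cv_pred = cross_val_predict(fit_1nn, xs, ys, k=10)

train_stats = (r_squared(ys, train_pred), residual_sigma(ys, train_pred))
cv_stats = (r_squared(ys, cv_pred), residual_sigma(ys, cv_pred))
```

The 1-nearest-neighbour stand-in reproduces its own training data, so its training statistics are flattering by construction while the cross-validated ones are not. The same kind of gap between training and cross-validated R² appears, to varying degrees, in all six records.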

Citing

When using this QDB archive, please cite it together with the original article:

  • Chrysochoou, G.; Sild, S. Data for: Guidance for good practice in the application of machine learning in development of toxicological quantitative structure-activity relationships (QSARs). QsarDB repository, QDB.264. 2024. https://doi.org/10.15152/QDB.264

  • Belfield, S.; Cronin, M. T. D.; Enoch, S. J.; Firman, J. W. Guidance for good practice in the application of machine learning in development of toxicological quantitative structure-activity relationships (QSARs). PLoS One 2023, 18, e0282924. https://doi.org/10.1371/journal.pone.0282924

Metadata

Title: Belfield, S. J.; Cronin, M. T. D.; Enoch, S. J.; Firman, J. W. Guidance for Good Practice in the Application of Machine Learning in Development of Toxicological Quantitative Structure-Activity Relationships (QSARs). PLOS ONE, 2023, 18, e0282924.
Abstract: Recent years have seen a substantial growth in the adoption of machine learning approaches for the purposes of quantitative structure-activity relationship (QSAR) development. Such a trend has coincided with desire to see a shifting in the focus of methodology employed within chemical safety assessment: away from traditional reliance upon animal-intensive in vivo protocols, and towards increased application of in silico (or computational) predictive toxicology. With QSAR central amongst techniques applied in this area, the emergence of algorithms trained through machine learning with the objective of toxicity estimation has, quite naturally, arisen. On account of the pattern-recognition capabilities of the underlying methods, the statistical power of the ensuing models is potentially considerable, appropriate for the handling even of vast, heterogeneous datasets. However, such potency comes at a price: this manifesting as the general practical deficits observed with respect to the reproducibility, interpretability and generalisability of the resulting tools. Unsurprisingly, these elements have served to hinder broader uptake (most notably within a regulatory setting). Areas of uncertainty liable to accompany (and hence detract from applicability of) toxicological QSAR have previously been highlighted, accompanied by the forwarding of suggestions for “best practice” aimed at mitigation of their influence. However, the scope of such exercises has remained limited to “classical” QSAR: that conducted through use of linear regression and related techniques, with the adoption of comparatively few features or descriptors. Accordingly, the intention of this study has been to extend the remit of best practice guidance, so as to address concerns specific to employment of machine learning within the field. In doing so, the impact of strategies aimed at enhancing the transparency (feature importance, feature reduction), generalisability (cross-validation) and predictive power (hyperparameter optimisation) of algorithms, trained upon real toxicity data through six common learning approaches, is evaluated.
URI: http://hdl.handle.net/10967/264
DOI: http://dx.doi.org/10.15152/QDB.264
Date: 2024-10-21


Files in this item

Name                 Description   Format           Size
tetrahymena.qdb.zip  Main article  application/zip  22.08 Mb

Files associated with this item are distributed under a Creative Commons license.
