Belfield, S. J.; Cronin, M. T. D.; Enoch, S. J.; Firman, J. W. Guidance for Good Practice in the Application of Machine Learning in Development of Toxicological Quantitative Structure-Activity Relationships (QSARs). PLOS ONE, 2023, 18, e0282924.

QDB archive DOI: 10.15152/QDB.264

QsarDB content

Property pIGC50: 40-h Tetrahymena toxicity as log(1/IGC50) [log(L/mmol)]
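The property definition above can be illustrated with a minimal sketch of the transform between IGC50 and pIGC50. It assumes the logarithm is base 10 and that IGC50 is expressed in mmol/L, consistent with the stated unit log(L/mmol); the function names are hypothetical and are not part of the archive.

```python
import math


def p_igc50(igc50_mmol_per_l: float) -> float:
    """Return pIGC50 = log10(1 / IGC50), with IGC50 given in mmol/L.

    Assumption: base-10 logarithm; the page only writes "log".
    """
    if igc50_mmol_per_l <= 0:
        raise ValueError("IGC50 must be a positive concentration")
    return -math.log10(igc50_mmol_per_l)


def igc50_from_p(p_value: float) -> float:
    """Invert the transform: recover IGC50 in mmol/L from a pIGC50 value."""
    return 10.0 ** (-p_value)


if __name__ == "__main__":
    # A lower IGC50 (more toxic compound) gives a higher pIGC50.
    for conc in (10.0, 1.0, 0.01):
        print(f"IGC50 = {conc} mmol/L -> pIGC50 = {p_igc50(conc):.2f}")
```

Under this convention, larger pIGC50 values correspond to compounds that inhibit growth at lower concentrations.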

1994 Compounds

RF: QSAR model for Tetrahymena pyriformis growth inhibition using the RF algorithm

Random forest (regression)

Name	Type	n	R2	σ
Training set	training	1994	0.975	0.190
10-fold cross-validation	internal validation	1994	0.753	0.524
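As a reading aid for the statistics columns in the model tables on this page, the following sketch shows one plausible way such values could be computed. It assumes that R2 is the coefficient of determination and that σ is the root-mean-square of the residuals; the repository's exact definition of σ (for example, any degrees-of-freedom correction) is not stated here. The fold assignment is a simple unshuffled interleaving chosen for illustration and is not the authors' split.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def residual_sigma(observed, predicted):
    """Root-mean-square residual (assumed meaning of the sigma column)."""
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss_res / len(observed))


def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) pairs so each sample is held out once.

    Illustrative, unshuffled interleaved folds: sample i goes to fold i % k.
    """
    folds = [list(range(start, n, k)) for start in range(k)]
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        yield train_idx, test_idx


if __name__ == "__main__":
    # With n = 1994 and k = 10, every compound is predicted exactly once,
    # which is why the cross-validation rows also report n = 1994.
    sizes = [len(test) for _, test in k_fold_indices(1994, 10)]
    print(sizes, sum(sizes))
```

In a cross-validation run, the held-out predictions from all folds would be pooled and passed to `r_squared` and `residual_sigma` together, whereas the training row compares the fitted model's predictions against the same data it was trained on.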

SVM: QSAR model for Tetrahymena pyriformis growth inhibition using the SVM algorithm

Support vector machine (regression)

Name	Type	n	R2	σ
Training set	training	1994	0.976	0.161
10-fold cross-validation	internal validation	1994	0.800	0.466

KNN: QSAR model for Tetrahymena pyriformis growth inhibition using the KNN algorithm

k-Nearest neighbors (regression)

Name	Type	n	R2	σ
Training set	training	1994	0.852	0.404
10-fold cross-validation	internal validation	1994	0.689	0.581

XGB: QSAR model for Tetrahymena pyriformis growth inhibition using the XGB algorithm

Extreme Gradient Boosting (regression)

Name	Type	n	R2	σ
Training set	training	1994	0.994	0.081
10-fold cross-validation	internal validation	1994	0.801	0.465

SNN: QSAR model for Tetrahymena pyriformis growth inhibition using the SNN algorithm

Neural network (regression)

Name	Type	n	R2	σ
Training set	training	1994	0.968	0.188
10-fold cross-validation	internal validation	1994	0.811	0.453

DNN: QSAR model for Tetrahymena pyriformis growth inhibition using the DNN algorithm

Neural network (regression)

Name	Type	n	R2	σ
Training set	training	1994	0.993	0.086
10-fold cross-validation	internal validation	1994	0.829	0.431
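To compare the six models at a glance, the short sketch below ranks them by cross-validated R2 and computes the difference between training and cross-validation R2, which is a rough indicator of overfitting. The numbers are copied from the tables above; the dictionary and function names are hypothetical and chosen only for this illustration.

```python
# (training R2, 10-fold cross-validation R2), copied from the model tables above.
R2_BY_MODEL = {
    "RF": (0.975, 0.753),
    "SVM": (0.976, 0.800),
    "KNN": (0.852, 0.689),
    "XGB": (0.994, 0.801),
    "SNN": (0.968, 0.811),
    "DNN": (0.993, 0.829),
}


def rank_by_cv(r2_by_model):
    """Return model names ordered from highest to lowest cross-validation R2."""
    return sorted(r2_by_model, key=lambda name: r2_by_model[name][1], reverse=True)


def train_cv_gap(r2_by_model):
    """Return training R2 minus cross-validation R2 for each model."""
    return {name: round(train - cv, 3) for name, (train, cv) in r2_by_model.items()}


if __name__ == "__main__":
    gaps = train_cv_gap(R2_BY_MODEL)
    for name in rank_by_cv(R2_BY_MODEL):
        train, cv = R2_BY_MODEL[name]
        print(f"{name}: train R2={train:.3f}, CV R2={cv:.3f}, gap={gaps[name]:.3f}")
```

On these reported values, DNN has the highest cross-validated R2 and RF shows the largest gap between training and cross-validation performance.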

Citing

When using this QDB archive, please cite it together with the original article:

Chrysochoou, G.; Sild, S. Data for: Guidance for good practice in the application of machine learning in development of toxicological quantitative structure-activity relationships (QSARs). QsarDB repository, QDB.264. 2024. https://doi.org/10.15152/QDB.264
Belfield, S.; Cronin, M. T. D.; Enoch, S. J.; Firman, J. W. Guidance for good practice in the application of machine learning in development of toxicological quantitative structure-activity relationships (QSARs). PLoS One 2023, 18, e0282924. https://doi.org/10.1371/journal.pone.0282924

Metadata

dc.contributor.other	This project receives funding from the European Union’s Horizon 2020 Research and Innovation programme under Grant Agreement No. 964537 (RISK-HUNT3R), and it is part of the ASPIS cluster; the Horizon Europe Framework Programme project Partnership for the Assessment of Risks from Chemicals (PARC, grant 101057014) under Tasks 5.2 and 6.4.2; and the QUANTUM-TOX - Revolutionizing Computational Toxicology with Electronic Structure Descriptors and Artificial Intelligence (QUANTUM-TOX) HORIZON-EIC-2023-PATHFINDEROPEN-01 Project number: 101130724. UM, SS, GP acknowledge support by the Ministry of Education and Research, Republic of Estonia, through Estonian Research Council (grant number PRG1509), Ministry of Climate, Republic of Estonia (grant 4-4/22/19), Ministry of Social Affairs, Republic of Estonia (grant 3-4/1593-1).
dc.date.accessioned	2024-10-21T11:48:57Z
dc.date.available	2024-10-21T11:48:57Z
dc.date.issued	2024-10-21
dc.identifier.uri	http://hdl.handle.net/10967/264
dc.identifier.uri	http://dx.doi.org/10.15152/QDB.264
dc.description.abstract	Recent years have seen a substantial growth in the adoption of machine learning approaches for the purposes of quantitative structure-activity relationship (QSAR) development. Such a trend has coincided with a desire to see a shift in the focus of methodology employed within chemical safety assessment: away from traditional reliance upon animal-intensive in vivo protocols, and towards increased application of in silico (or computational) predictive toxicology. With QSAR central amongst techniques applied in this area, the emergence of algorithms trained through machine learning with the objective of toxicity estimation has, quite naturally, arisen. On account of the pattern-recognition capabilities of the underlying methods, the statistical power of the ensuing models is potentially considerable, appropriate for the handling even of vast, heterogeneous datasets. However, such potency comes at a price: this manifesting as the general practical deficits observed with respect to the reproducibility, interpretability and generalisability of the resulting tools. Unsurprisingly, these elements have served to hinder broader uptake (most notably within a regulatory setting). Areas of uncertainty liable to accompany (and hence detract from applicability of) toxicological QSAR have previously been highlighted, accompanied by the forwarding of suggestions for “best practice” aimed at mitigation of their influence. However, the scope of such exercises has remained limited to “classical” QSAR: that conducted through use of linear regression and related techniques, with the adoption of comparatively few features or descriptors. Accordingly, the intention of this study has been to extend the remit of best practice guidance, so as to address concerns specific to employment of machine learning within the field. In doing so, the impact of strategies aimed at enhancing the transparency (feature importance, feature reduction), generalisability (cross-validation) and predictive power (hyperparameter optimisation) of algorithms, trained upon real toxicity data through six common learning approaches, is evaluated.	en_US
dc.publisher	Georgios Chrysochoou
dc.publisher	Sulev Sild
dc.rights	Attribution-ShareAlike 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-sa/4.0/	*
dc.title	Belfield, S. J.; Cronin, M. T. D.; Enoch, S. J.; Firman, J. W. Guidance for Good Practice in the Application of Machine Learning in Development of Toxicological Quantitative Structure-Activity Relationships (QSARs). PLOS ONE, 2023, 18, e0282924.
qdb.property.endpoint	6. Other (Acute toxicity to ciliate protozoa)	en_US
qdb.property.species	Tetrahymena pyriformis	en_US
qdb.descriptor.application	PaDEL-Descriptor 2.21	en_US
qdb.prediction.application	scikit-learn 0.22.1	en_US
qdb.prediction.application	XGBoost 1.2.1	en_US
qdb.prediction.application	Keras 2.4.0	en_US
qdb.prediction.application	TensorFlow 2.3.1	en_US
bibtex.entry	article	en_US
bibtex.entry.author	Belfield, Samuel
bibtex.entry.author	Cronin, Mark T. D.
bibtex.entry.author	Enoch, Steven J.
bibtex.entry.author	Firman, James W.
bibtex.entry.doi	10.1371/journal.pone.0282924	en_US
bibtex.entry.journal	PLoS One	en_US
bibtex.entry.number	5	en_US
bibtex.entry.pages	e0282924	en_US
bibtex.entry.title	Guidance for good practice in the application of machine learning in development of toxicological quantitative structure-activity relationships (QSARs)	en_US
bibtex.entry.volume	18	en_US
bibtex.entry.year	2023
qdb.model.type	Random forest (regression)	en_US
qdb.model.type	Support vector machine (regression)	en_US
qdb.model.type	k-Nearest neighbors (regression)	en_US
qdb.model.type	Neural network (regression)	en_US
qdb.model.type	Extreme Gradient Boosting (regression)

Files in this item

Name	Description	Format	Size	View
tetrahymena.qdb.zip	Main article	application/zip	22.08Mb	View/Open

Files associated with this item are distributed under a Creative Commons license.

This item appears in the following Collection(s)

Original publications
Liverpool John Moores University (England), Chemoinformatics Research Group
