Kotli, M.; Piir, G.; Maran, U. Data-driven machine learning of chronic toxicity of imbalanced heterogeneous pesticide data to earthworms.

QsarDB Repository

QDB archive DOI: 10.15152/QDB.263

QsarDB content

Property NOEC_class: chronic toxicity class of No-Observed-Effect Concentration for earthworms

modelA: Model A

Ensemble model (classification)

Name                      Type                 n    Accuracy
Training (under sampled)  training             92   1.000
Training (all)            validation           241  0.705
Validation                external validation  61   0.656
Test                      external validation  147  0.619

modelB: Model B

Ensemble model (classification)

Name                      Type                 n    Accuracy
Training (under sampled)  training             102  1.000
Training (all)            validation           241  0.876
Validation                external validation  61   0.672
Test                      external validation  147  0.687

model_AB: model_AB

Ensemble model ensemble (classification)

Name                      Type                 n    Accuracy
Training (under sampled)  training             51   1.000
Training (all)            validation           194  0.853
Validation                external validation  14   0.750
Test                      external validation  147  0.785

Property NOEC: chronic toxicity of No-Observed-Effect Concentration for earthworms as mg/kg in soil [mg/kg]

Citing

When using this QDB archive, please cite it together with the original article:

  • Kotli, M. Data for: Data-driven machine learning of chronic toxicity of imbalanced heterogeneous pesticide data to earthworms. QsarDB repository, QDB.263. 2024. https://doi.org/10.15152/QDB.263

  • Kotli, M.; Piir, G.; Maran, U. Data-driven machine learning of chronic toxicity of imbalanced heterogeneous pesticide data to earthworms.

Metadata

Title: Kotli, M.; Piir, G.; Maran, U. Data-driven machine learning of chronic toxicity of imbalanced heterogeneous pesticide data to earthworms.
Abstract: The development of Quantitative Structure-Activity Relationship (QSAR) models is critical for predicting the chronic toxicity of chemical compounds to earthworms, a key indicator species in soil ecosystems. This study presents an advanced QSAR modelling approach that integrates gradient-boosted decision trees as classifiers with a genetic algorithm (GA) for feature selection and Bayesian optimization for hyperparameter tuning to build models with different performance characteristics on an imbalanced dataset. Descriptor contributions were analyzed with SHAP values to distinguish structural features contributing to pesticides’ toxicity and non-toxicity. The final QSAR model was constructed as a stacked ensemble of models and combines the strengths of the base models. Evaluation of the model with an external test set of 142 compounds demonstrated solid predictive capabilities with a well-defined applicability domain and a balanced accuracy of 77%. The model representation follows FAIR principles and is available on QsarDB.org.
URI: http://hdl.handle.net/10967/263
     http://dx.doi.org/10.15152/QDB.263
Date: 2024-10-09
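
The abstract reports a balanced accuracy rather than a plain accuracy, which matters on an imbalanced dataset. The sketch below is only an illustration of that metric, assuming the usual definition (the mean of per-class recall); the class labels and predictions are hypothetical and are not taken from the archive, and the function names are not part of any QsarDB tool.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical, deliberately imbalanced labels (not data from QDB.263).
y_true = ["non-toxic"] * 8 + ["toxic"] * 2
# A classifier that always predicts the majority class.
y_pred = ["non-toxic"] * 10

print(accuracy(y_true, y_pred))           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

In this toy case the plain accuracy looks acceptable while the balanced accuracy shows that the minority class is never recognised, which is why the two figures can differ noticeably for the same model.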


Files in this item

Name      Description      Format           Size
arch.zip  Data and models  application/zip  363.1Kb

Files associated with this item are distributed under Creative Commons license.
