Ensemble model (classification)
Name | Type | n | Accuracy |
---|---|---|---|
Training (under sampled) | training | 92 | 1.000 |
Training (all) | validation | 241 | 0.705 |
Validation | external validation | 61 | 0.656 |
Test | external validation | 147 | 0.619 |
Ensemble model (classification)
Name | Type | n | Accuracy |
---|---|---|---|
Training (under sampled) | training | 102 | 1.000 |
Training (all) | validation | 241 | 0.876 |
Validation | external validation | 61 | 0.672 |
Test | external validation | 147 | 0.687 |
Ensemble model ensemble (classification)
Name | Type | n | Accuracy |
---|---|---|---|
Training (under sampled) | training | 51 | 1.000 |
Training (all) | validation | 194 | 0.853 |
Validation | external validation | 14 | 0.750 |
Test | external validation | 147 | 0.785 |
When using this QDB archive, please cite it together with the original article:
Kotli, M.; Piir, G.; Maran, U. Data for: Predictive Modeling of Pesticides Reproductive Toxicity in Earthworms Using Interpretable Machine-Learning Techniques on Imbalanced Data. QsarDB repository, QDB.263. 2024. https://doi.org/10.15152/QDB.263
Kotli, M.; Piir, G.; Maran, U. Predictive Modeling of Pesticides Reproductive Toxicity in Earthworms Using Interpretable Machine-Learning Techniques on Imbalanced Data. ACS Omega 2025, https://doi.org/10.1021/acsomega.4c09719
Title: | Kotli, M.; Piir, G.; Maran, U. Predictive Modeling of Pesticides Reproductive Toxicity in Earthworms Using Interpretable Machine-Learning Techniques on Imbalanced Data. ACS Omega 2025. |
Abstract: | The earthworm is a key indicator species in soil ecosystems. This makes the reproductive toxicity of chemical compounds to earthworms a desired property of determination and makes computational models necessary for descriptive and predictive purposes. Thus, the aim was to develop an advanced Quantitative Structure–Activity Relationship modeling approach for this complex property with imbalanced data. The approach integrated gradient-boosted decision trees as classifiers with a genetic algorithm for feature selection and Bayesian optimization for hyperparameter tuning. An additional goal was to analyze and interpret, using SHAP values, the structural features encoded by the molecular descriptors that contribute to pesticide toxicity and nontoxicity, the most notable of which are solvation entropy and a number of hydrolyzable bonds. The final model was constructed as a stacked ensemble of models and combined the strengths of the individual models. Evaluation of this model with an external test set of 147 compounds demonstrated a well-defined applicability domain and sufficient predictive capabilities with a Balanced Accuracy of 77%. The model representation follows FAIR principles and is available on QsarDB.org. |
URI: | http://hdl.handle.net/10967/263; http://dx.doi.org/10.15152/QDB.263 |
Date: | 2024-10-09 |
Funding: | This work was supported by the Ministry of Education and Research, Republic of Estonia, through the Estonian Research Council (grant number PRG1509), Ministry of Climate, Republic of Estonia (grant 4-4/22/19), Ministry of Social Affairs, Republic of Estonia (grants 3-4/1593-1, 3-4/2332-1), and European Union through Horizon Europe Health Framework Program project “Partnership for the Assessment of Risks from Chemicals” (Grant ID 101057014). |
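The abstract reports Balanced Accuracy rather than plain accuracy because the data are imbalanced. As a minimal sketch of the difference (not the authors' code), the Python snippet below computes both metrics on hypothetical toy labels; the function names and the example labels are assumptions made for illustration and do not come from the archive.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall.

    For two classes this equals (sensitivity + specificity) / 2.
    """
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of all samples whose true label is this class
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical toy data (NOT from the archive): 8 nontoxic (0), 2 toxic (1).
y_true = [0] * 8 + [1] * 2
# The classifier gets every nontoxic compound right but misses one toxic one.
y_pred = [0] * 8 + [0, 1]

print(f"accuracy          = {accuracy(y_true, y_pred):.3f}")
print(f"balanced accuracy = {balanced_accuracy(y_true, y_pred):.3f}")
```

On this toy set, plain accuracy looks high because the majority class dominates, while balanced accuracy weights the minority (toxic) class equally and exposes the missed compound.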
Name | Description | Format | Size | View |
---|---|---|---|---|
arch_omega.zip | Data and models | application/zip | 363.2Kb | View/ |