Ensemble model (classification)
Name | Type | n | Accuracy |
---|---|---|---|
Training (under sampled) | training | 92 | 1.000 |
Training (all) | validation | 241 | 0.705 |
Validation | external validation | 61 | 0.656 |
Test | external validation | 147 | 0.619 |
Ensemble model (classification)
Name | Type | n | Accuracy |
---|---|---|---|
Training (under sampled) | training | 102 | 1.000 |
Training (all) | validation | 241 | 0.876 |
Validation | external validation | 61 | 0.672 |
Test | external validation | 147 | 0.687 |
Ensemble model ensemble (classification)
Name | Type | n | Accuracy |
---|---|---|---|
Training (under sampled) | training | 51 | 1.000 |
Training (all) | validation | 194 | 0.853 |
Validation | external validation | 14 | 0.750 |
Test | external validation | 147 | 0.785 |
When using this QDB archive, please cite it together with the original article:
Kotli, M. Data for: Data-driven machine learning of chronic toxicity of imbalanced heterogeneous pesticide data to earthworms. QsarDB repository, QDB.263. 2024. https://doi.org/10.15152/QDB.263
Kotli, M.; Piir, G.; Maran, U. Data-driven machine learning of chronic toxicity of imbalanced heterogeneous pesticide data to earthworms.
Title: | Kotli, M.; Piir, G.; Maran, U. Data-driven machine learning of chronic toxicity of imbalanced heterogeneous pesticide data to earthworms. |
Abstract: | The development of Quantitative Structure-Activity Relationship (QSAR) models is critical for predicting the chronic toxicity of chemical compounds to earthworms, a key indicator species in soil ecosystems. This study presents an advanced QSAR modelling approach that integrates gradient-boosted decision trees as classifiers with a genetic algorithm (GA) for feature selection and Bayesian optimization for hyperparameter tuning to build models with different performance characteristics on an imbalanced dataset. Descriptor contributions were analyzed with SHAP values to distinguish structural features contributing to pesticides’ toxicity and non-toxicity. The final QSAR model was constructed as a stacked ensemble of models and combines the strengths of the base models. Evaluation of the model with an external test set of 142 compounds demonstrated solid predictive capabilities with a well-defined applicability domain and a balanced accuracy of 77%. The model representation follows FAIR principles and is available on QsarDB.org. |
URI: | http://hdl.handle.net/10967/263; http://dx.doi.org/10.15152/QDB.263 |
Date: | 2024-10-09 |
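The abstract reports balanced accuracy rather than plain accuracy, which matters on an imbalanced dataset. The following is a minimal sketch of the difference between the two metrics, not the authors' evaluation code; the function names and the toy labels (1 = toxic, 0 = non-toxic) are illustrative assumptions and are not taken from the archive.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; for two classes this is the mean of
    sensitivity and specificity."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of all samples whose true label is this class
        idx = [i for i, t in enumerate(y_true) if t == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced example: 8 non-toxic (0) and 2 toxic (1) compounds.
y_true = [0] * 8 + [1] * 2
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 10

acc = accuracy(y_true, y_pred)           # 8 of 10 correct
bal = balanced_accuracy(y_true, y_pred)  # recall 1.0 for class 0, 0.0 for class 1

print(f"accuracy = {acc:.3f}, balanced accuracy = {bal:.3f}")
```

In this toy case the majority-class predictor scores well on plain accuracy while its balanced accuracy stays at chance level, which is why balanced accuracy is the more informative figure for imbalanced data.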
Name | Description | Format | Size |
---|---|---|---|
arch.zip | Data and models | application/zip | 363.1Kb |