Piir, G.; Sild, S.; Maran, U. Classifying bio-concentration factor with random forest algorithm, influence of the bio-accumulative vs. non-bio-accumulative compound ratio to modelling result, and applicability domain for random forest model. SAR QSAR Environ. Res. 2014, 25, 967-981.

QsarDB Repository

QDB archive DOI: 10.15152/QDB.116

QsarDB content

Property BCF_class: Experimental BCF class (nB: non-bio-accumulative; B: bio-accumulative)

Tab1.Model1: Imbalanced model towards nB-compounds

Random forest (classification)

Name              Type                  n     Accuracy
Training set      training              673   1.000
Out-of-bag set    internal validation   673   0.854
Validation set    external validation   334   0.874
Tab1.Model2: Balanced model

Random forest (classification)

Name              Type                  n     Accuracy
Training set      training              673   0.878
Out-of-bag set    internal validation   673   0.842
Validation set    external validation   334   0.844
Tab1.Model3: Imbalanced model towards B-compounds

Random forest (classification)

Name              Type                  n     Accuracy
Training set      training              673   0.767
Out-of-bag set    internal validation   673   0.761
Validation set    external validation   334   0.737
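The tables report only overall accuracy, which on an imbalanced data set can look high even when the minority class is rarely recognised. This is why the abstract also quotes sensitivity and specificity. Below is a minimal sketch of how the three measures follow from the confusion matrix. It assumes that B (bio-accumulative) is counted as the positive class, because the page does not state which class is treated as positive. The function name `classification_metrics` and the toy labels are illustrative only and are not taken from the archive.

```python
from collections import Counter


def classification_metrics(observed, predicted, positive="B"):
    """Return (accuracy, sensitivity, specificity) for two-class labels.

    Assumption: `positive` is the class counted as positive (here B,
    bio-accumulative); every other label is treated as negative (nB).
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equally long")
    counts = Counter()
    for obs, pred in zip(observed, predicted):
        obs_pos, pred_pos = obs == positive, pred == positive
        if obs_pos and pred_pos:
            counts["tp"] += 1  # B predicted as B
        elif obs_pos:
            counts["fn"] += 1  # B predicted as nB
        elif pred_pos:
            counts["fp"] += 1  # nB predicted as B
        else:
            counts["tn"] += 1  # nB predicted as nB
    tp, fn, fp, tn = counts["tp"], counts["fn"], counts["fp"], counts["tn"]
    accuracy = (tp + tn) / len(observed)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity


# Toy illustration (invented labels): a classifier that always answers
# "nB" on a set with a 9:1 nB:B ratio.
observed = ["nB"] * 9 + ["B"]
predicted = ["nB"] * 10
print(classification_metrics(observed, predicted))  # prints (0.9, 0.0, 1.0)
```

In the toy case, accuracy reaches 0.9 although no B compound is recognised. This shows why accuracy alone can be misleading when the training or test set is dominated by one class.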

Property logBCF: Experimental BCF on a logarithmic scale (logBCF)

Citing

When using this QDB archive, please cite it together with the original article:

  • Piir, G.; Sild, S.; Maran, U. Data for: Classifying bio-concentration factor with random forest algorithm, influence of the bio-accumulative vs. non-bio-accumulative compound ratio to modelling result, and applicability domain for random forest model. QsarDB repository, QDB.116. 2014. https://doi.org/10.15152/QDB.116

  • Piir, G.; Sild, S.; Maran, U. Classifying bio-concentration factor with random forest algorithm, influence of the bio-accumulative vs. non-bio-accumulative compound ratio to modelling result, and applicability domain for random forest model. SAR QSAR Environ. Res. 2014, 25, 967-981. https://doi.org/10.1080/1062936X.2014.969310

Metadata

dc.date.accessioned: 2014-07-30T06:02:22Z
dc.date.available: 2014-07-30T06:02:22Z
dc.date.issued: 2014-07-30
dc.identifier.uri: http://hdl.handle.net/10967/116
dc.identifier.uri: http://dx.doi.org/10.15152/QDB.116
dc.description.abstract: In environmental risk assessment, the bio-concentration factor (BCF) is a widely used parameter for estimating the bio-accumulation potential of chemicals. BCF data often have an uneven distribution of classes (bio-accumulative vs. non-bio-accumulative), which can severely bias classification results towards the prevailing class. The present study focuses on the influence of an uneven class distribution in the training phase of Random Forest (RF) classification models. Three different training set designs were used, and descriptors were selected for the models based on their occurrence frequency in RF trees and on the mechanistic aspects they reflect. The models were compared and their classification performance was analysed: the balanced set showed good predictive characteristics (sensitivity = 0.90 and specificity = 0.83), while the imbalanced sets also have strengths in certain application scenarios. The confidence of classifications was assessed with a new schema for the applicability domain that uses the RF proximity matrix to analyse the similarity between the predicted compound and the training set of the model. All developed models were made available in a transparent, accessible and reproducible way in the QsarDB repository (http://dx.doi.org/10.15152/QDB.116).
dc.publisher: Geven Piir
dc.publisher: Sulev Sild
dc.publisher: Uko Maran
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.title: Piir, G.; Sild, S.; Maran, U. Classifying bio-concentration factor with random forest algorithm, influence of the bio-accumulative vs. non-bio-accumulative compound ratio to modelling result, and applicability domain for random forest model. SAR QSAR Environ. Res. 2014, 25, 967-981.
qdb.property.endpoint: 2. Environmental fate parameters 2.4. Bioconcentration [en_US]
qdb.descriptor.application: XLOGP3 3.2.2 [en_US]
qdb.descriptor.application: PaDEL-Descriptor 2.18 [en_US]
bibtex.entry: article [en_US]
bibtex.entry.author: Piir, G.
bibtex.entry.author: Sild, S.
bibtex.entry.author: Maran, U.
bibtex.entry.doi: 10.1080/1062936X.2014.969310
bibtex.entry.journal: SAR QSAR Environ. Res.
bibtex.entry.number: 12
bibtex.entry.pages: 967-981
bibtex.entry.title: Classifying bio-concentration factor with random forest algorithm, influence of the bio-accumulative vs. non-bio-accumulative compound ratio to modelling result, and applicability domain for random forest model [en_US]
bibtex.entry.volume: 25
bibtex.entry.year: 2014
qdb.model.type: Random forest (classification) [en_US]
qdb.descriptor.calculation: Tab1.Model1
qdb.descriptor.calculation: Tab1.Model2
qdb.descriptor.calculation: Tab1.Model3
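The abstract describes an applicability-domain schema built on the RF proximity matrix. The page does not spell that schema out, so what follows is only a minimal sketch of the general idea and not the authors' method. The proximity between two compounds is taken, as in Breiman's random forest formulation, to be the fraction of trees in which both compounds land in the same terminal node. A hypothetical score, the mean of the k largest proximities to the training compounds, then summarises how close a query compound lies to the training set. The sketch assumes that per-tree leaf indices are already available (for example from scikit-learn's `RandomForestClassifier.apply`). The function names, `k`, and any cut-off applied to the score are illustrative assumptions.

```python
from typing import Sequence


def rf_proximity(leaves_a: Sequence[int], leaves_b: Sequence[int]) -> float:
    """Fraction of trees in which two compounds share a terminal node.

    Each sequence holds one leaf index per tree, in the same tree order.
    """
    if not leaves_a or len(leaves_a) != len(leaves_b):
        raise ValueError("leaf vectors must be non-empty and equally long")
    shared = sum(a == b for a, b in zip(leaves_a, leaves_b))
    return shared / len(leaves_a)


def domain_score(query_leaves: Sequence[int],
                 training_leaves: Sequence[Sequence[int]],
                 k: int = 3) -> float:
    """Hypothetical applicability-domain score (an assumption, not the
    published schema): mean of the k largest proximities between the
    query compound and the training compounds."""
    if not training_leaves or k < 1:
        raise ValueError("need at least one training compound and k >= 1")
    proximities = sorted(
        (rf_proximity(query_leaves, row) for row in training_leaves),
        reverse=True,
    )
    top = proximities[:k]
    return sum(top) / len(top)


# Toy forest with 4 trees and 3 training compounds (invented leaf indices).
training = [[1, 2, 3, 4], [1, 2, 5, 6], [7, 8, 9, 10]]
query = [1, 2, 3, 9]
print(domain_score(query, training, k=2))  # prints 0.625
```

A higher score means the query compound shares terminal nodes with more training compounds in more trees. How such a score is thresholded into "inside" or "outside" the domain is a separate modelling decision that this sketch leaves open.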


Files in this item

Name                  Description                                   Format            Size
2014SQER967.qdb.zip   Random Forest classification models for BCF   application/zip   1.302 MB
Files associated with this item are distributed under a Creative Commons license (Attribution 4.0 International).
