Gramatica, P.; Giani, E.; Papa, E. Statistical external validation and consensus modeling: A QSPR case study for Koc prediction. J. Mol. Graph. Model. 2007, 25, 6, 755–766.

QDB archive DOI: 10.15152/QDB.135

QsarDB content

Property logKoc: logarithm of soil sorption coefficient

Compounds: 643 | Models: 6 | Predictions: 11

Eq1: Best externally predictive model

Regression model (regression)

Name            Type                 n    R²     σ
Training set    training             93   0.799  0.554
Validation set  external validation  550  0.785  0.556

Eq2: Full model, including all data

Regression model (regression)

Name            Type                 n    R²     σ
Training set    training             643  0.789  0.551

Tab2-9: Correlation with logKow

Regression model (regression)

Name            Type                 n    R²     σ
Training set    training             93   0.774  0.587
Validation set  external validation  550  0.770  0.572

Tab2-10: Correlation with logSw

Regression model (regression)

Name            Type                 n    R²     σ
Training set    training             93   0.796  0.558
Validation set  external validation  550  0.779  0.562

Tab2-11: logKow and aromaticity

Regression model (regression)

Name            Type                 n    R²     σ
Training set    training             93   0.804  0.547
Validation set  external validation  550  0.799  0.535

Tab2-12: logSw and aromaticity

Regression model (regression)

Name            Type                 n    R²     σ
Training set    training             93   0.810  0.539
Validation set  external validation  550  0.797  0.539
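
Each model above is summarized by R² and σ for its training and validation sets. The page does not define either statistic, so the sketch below rests on assumptions: R² is taken as the coefficient of determination (1 minus the ratio of residual to total sum of squares) and σ as the root mean square of the residuals, with an optional degrees-of-freedom correction. The function names and the toy values are hypothetical and are not taken from the archive.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def residual_sigma(observed, predicted, n_params=0):
    """Root mean square of the residuals (assumed definition of sigma).

    n_params is subtracted from n in the denominator; with n_params=0
    this is the plain root-mean-square error.
    """
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss_res / (len(observed) - n_params))


# Made-up logKoc-like values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(round(r_squared(observed, predicted), 3))
print(round(residual_sigma(observed, predicted), 3))
```

Applying the same two functions to the training and the external validation predictions of a model would give one row each of a table like those above, provided the assumed definitions match the ones the repository uses.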

Citing

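When using this data, please cite the original article (Gramatica, P.; Giani, E.; Papa, E. J. Mol. Graph. Model. 2007, 25, 6, 755–766) and this QDB archive (DOI: 10.15152/QDB.135).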

Metadata

Title: Gramatica, P.; Giani, E.; Papa, E. Statistical external validation and consensus modeling: A QSPR case study for Koc prediction. J. Mol. Graph. Model. 2007, 25, 6, 755–766.
Abstract: The soil sorption partition coefficient (log Koc) of a heterogeneous set of 643 organic non-ionic compounds, with a range of more than 6 log units, is predicted by a statistically validated QSAR modeling approach. The applied multiple linear regression (ordinary least squares, OLS) is based on a variety of theoretical molecular descriptors selected by the genetic algorithms-variable subset selection (GA-VSS) procedure. The models were validated for predictivity by different internal and external validation approaches. For external validation we applied self organizing maps (SOM) to split the original data set: the best four-dimensional model, developed on a reduced training set of 93 chemicals, has a predictivity of 78% when applied on 550 validation chemicals (prediction set). The selected molecular descriptors, which could be interpreted through their mechanistic meaning, were compared with the more common physico-chemical descriptors log Kow and log Sw. The chemical applicability domain of each model was verified by the leverage approach in order to propose only reliable data. The best predicted data were obtained by consensus modeling from 10 different models in the genetic algorithm model population.
URI: http://hdl.handle.net/10967/135
http://dx.doi.org/10.15152/QDB.135
Date: 2015-01-27
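
The abstract states that the applicability domain of each model was verified by the leverage approach. A minimal sketch of that idea follows, assuming the usual hat-matrix definition h = xᵀ(XᵀX)⁻¹x with an intercept column, and a warning threshold h* = 3(p + 1)/n. The threshold is a commonly used convention and is an assumption here, since the abstract does not give the cut-off. All function names and the toy descriptor values are hypothetical.

```python
def solve(G, b):
    """Solve G z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(G)]  # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) < 1e-12:
            raise ValueError("descriptor matrix is singular")
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for j in range(c, n + 1):
                    M[r][j] -= f * M[c][j]
    return [M[i][n] / M[i][i] for i in range(n)]


def leverages(X_train, X_query):
    """Leverage of each query row relative to the training descriptors.

    A column of ones (intercept) is prepended to both matrices.
    """
    A = [[1.0] + [float(v) for v in row] for row in X_train]
    Q = [[1.0] + [float(v) for v in row] for row in X_query]
    k = len(A[0])
    # Gram matrix X^T X of the training set
    G = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    return [sum(q_i * z_i for q_i, z_i in zip(q, solve(G, q))) for q in Q]


def warning_leverage(n_train, n_descriptors):
    """Assumed cut-off h* = 3 (p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_train


# Toy single-descriptor training set; values are illustrative only.
X_train = [[0.0], [1.0], [2.0], [3.0]]
h = leverages(X_train, X_train + [[10.0]])
h_star = warning_leverage(len(X_train), 1)
flags = [h_i > h_star for h_i in h]  # True = outside the assumed domain
print(h, h_star, flags)
```

Under this assumed convention, a model with four descriptors trained on 93 compounds would have h* = 15/93 ≈ 0.161, and predictions for compounds with a larger leverage would be treated as extrapolations.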


Files in this item

Name                     Description                           Format             Size
2007JMGM755.qdb.zip      QSARs for soil sorption coefficients  application/x-zip  72.87Kb
Eq1&2_Q47-19-49-477.pdf  QMRF                                  PDF                95.93Kb
Files associated with this item are distributed under a Creative Commons license.
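
Since the QDB archive is distributed as a zip file, its contents can be inspected with the Python standard library before opening it in any QsarDB tool. The sketch below only lists the member files and their sizes. It assumes the archive was saved locally under the filename shown in the table above, and it makes no assumption about the internal layout of the archive; `list_members` is a hypothetical helper name.

```python
import os
import zipfile


def list_members(archive):
    """Return (name, uncompressed size in bytes) for each file in a zip archive.

    `archive` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return [(info.filename, info.file_size)
                for info in zf.infolist() if not info.is_dir()]


# Assumed local filename: the archive as downloaded from this item page.
ARCHIVE_PATH = "2007JMGM755.qdb.zip"

if os.path.exists(ARCHIVE_PATH):
    for name, size in list_members(ARCHIVE_PATH):
        print(f"{size:>10}  {name}")
```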
