Random forest (classification)
| Name | Type | n | Accuracy |
|---|---|---|---|
| Training set | training | 10368 | 1.000 |
| Out of bag set | internal validation | 10368 | 0.799 |
| Test set | external validation | 24023 | 0.793 |
| Validation set | external validation | 1018 | 0.753 |
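The accuracy values above are the fraction of compounds assigned to the correct class, while the sensitivity quoted in the abstract is the fraction of true inhibitors that the model recovers. The two can diverge, as the minimal sketch below suggests. The labels are invented for illustration and are not taken from this archive; the function names are hypothetical helpers, not part of any QsarDB tool.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred, positive=1):
    """Fraction of truly positive items that were predicted positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        raise ValueError("no positive items in y_true")
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Toy labels (assumption, for illustration only): 1 = inhibitor, 0 = non-inhibitor.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

acc = accuracy(y_true, y_pred)      # 5 of 8 labels match
sens = sensitivity(y_true, y_pred)  # 1 of 3 inhibitors recovered

print(f"accuracy={acc:.3f} sensitivity={sens:.3f}")
```

In this toy case most compounds are non-inhibitors, so a model can classify the majority correctly while still missing most inhibitors. For that reason it may be worth reading sensitivity alongside accuracy when comparing models on imbalanced sets.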
When using this QDB archive, please cite it together with the original article:
Piir, G. Data for: Classification of Thyroid Peroxidase (TPO) Inhibitors Using Transfer Learning with SMILES Embeddings. QsarDB repository, QDB.272. 2026. https://doi.org/10.15152/QDB.272
Piir, G.; Sild, S.; Spilioti, E.; Nikolopoulou, D.; Katsanou, E.; Langezaal, I.; Maran, U. Classification of Thyroid Peroxidase (TPO) Inhibitors Using Transfer Learning with SMILES Embeddings. Chemical Research in Toxicology 2026.
| Title: | Piir, G.; Sild, S.; Spilioti, E.; Nikolopoulou, D.; Katsanou, E.; Langezaal, I.; Maran, U. Classification of Thyroid Peroxidase (TPO) Inhibitors Using Transfer Learning with SMILES Embeddings. Chemical Research in Toxicology 2026. |
| Abstract: | Thyroid hormones (THs) regulate many processes in mammals and, therefore, affect every organ in the body. Thyroid peroxidase (TPO) is an essential enzyme for the successful biosynthesis of THs. Although TPO inhibition is a well-documented molecular initiating event (MIE) in thyroid hormone system disruption adverse outcome pathways (AOPs), experimental methods and computational models to assess TPO activity are lacking. Efficient computational new approach methodologies (NAMs) are a viable solution for identifying TPO inhibitors from a large pool of agrochemicals. The aim of this study was to investigate the suitability of SMILES embeddings generated using a specialized language model (SLM) based on a pretrained deep neural network (DNN) for applying a transfer learning approach in the development of quantitative structure−activity relationships for classifying TPO inhibitors. Traditional theoretical molecular descriptors were used for comparison. Two different molecular descriptor sets resulted in Random Forest (RF) models that performed similarly on the training and test sets, while the sensitivity for the external validation set was substantially different between the two models (0.788 vs 0.490). Comparison of the predictions with the TPO inhibition data of the chemicals assessed by EFSA and EU-NETVAL laboratories showed good agreement. At the same time, analysis of experimental data from other sources showed some conflicting estimates. This suggests that further and more precise studies are needed for some compounds. This study advances in silico methodologies by implementing transfer learning for QSAR modeling from text representations (e.g., SMILES) using the pretrained Bidirectional Encoder Representations from Transformers (BERT) architecture. While the traditional QSAR approach relies on molecular descriptors, this evaluation shows that model-generated SMILES embeddings can expand the applicability domain, indicating a more robust representation of structural information compared to traditional molecular descriptors. |
| URI: | http://hdl.handle.net/10967/272; http://dx.doi.org/10.15152/QDB.272 |
| Date: | 2026-05-27 |
| Name | Description | Format | Size |
|---|---|---|---|
| 2026CRT.qdb.zip | Thyroid peroxidase inhibition of agrochemicals | application/zip | 274.1 MB |
| TPO_modelling.7z | Data and script | Unknown | 191.7 MB |
