University of Tartu, Institute of Chemistry, Molecular Technology (Estonia)

MolTech, UTARTU http://hdl.handle.net/10967/1 2026-07-26T22:44:49Z

Käärik, M.; Arulepp, M.; Maran, U.; Leis, J. Machine learning-assisted QnSPR study of structurally diverse nanoporous carbon materials and their capacitive behavior in dilute ionic liquid electrolyte. J. Mater. Sci. 2026, Published. http://hdl.handle.net/10967/275 2026-07-13T08:57:15Z 2026-07-07T07:37:13Z

The growing global demand for efficient energy storage has highlighted the need to establish relationships and rules that link electric double-layer (EDL) capacitance to the structural properties of electrode materials. Among these materials, nanoporous carbon stands out for its high microporosity and precisely adjustable pore size distribution, both of which strongly influence EDL performance. At the same time, machine learning (ML) has emerged as a toolbox in materials science, enabling the prediction and optimization of various application-related properties based on experiment-derived structure and surface characteristics. In this study, ML was applied to 67 nanoporous carbon materials, carbide-derived carbons (CDCs), with different structures and textures. Their EDL capacitance was modeled under both positive and negative polarization in an electrolyte whose ions have an asymmetric, i.e., extremely non-spherical, geometry. The structural and textural properties of the materials in the dataset were described using experimental descriptors derived from nitrogen and carbon dioxide adsorption measurements. These descriptors were used as model inputs, while the target property, EDL capacitance, was measured in three-electrode cells using 1.9 M EMIm-TFSI in ACN as the electrolyte. The ML modeling results demonstrate that combining experimentally derived structural descriptors, such as specific surface area, ion size-related volume fraction of the pore size distribution, and bulk density of CDC electrodes, in one quantitative nanostructure–property relationship (QnSPR) enables accurate prediction of both volumetric cathodic capacitance (R² = 0.93) and volumetric anodic capacitance (R² = 0.94) using multiple linear regression.
The textural descriptors in the models indicate that the most effective pore size range for electrosorption is consistent with the smallest dimension of the ions: 0.4–0.5 nm for the EMIm+ cation, and below 0.4 nm for the slightly larger but highly asymmetric TFSI− anion.
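
As a rough illustration of how a multiple linear regression model and its coefficient of determination (R²) are obtained, the sketch below fits a tiny synthetic dataset with two descriptors. The data, the function names (`fit_mlr`, `predict`, `r_squared`), and the noise-free relationship are assumptions made for this example only; they are not the descriptors, data, or code of the study.

```python
def fit_mlr(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for k in range(p):
            if k != c:
                f = A[k][c]
                A[k] = [vk - f * vc for vk, vc in zip(A[k], A[c])]
    return [A[i][p] for i in range(p)]  # [intercept, b1, b2, ...]


def predict(coef, X):
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], r)) for r in X]


def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic, noise-free toy data: y = 2 + 3*x1 - 1*x2 (not from the study)
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [3, 7, 7, 11, 12]

coef = fit_mlr(X, y)
r2 = r_squared(y, predict(coef, X))
print(coef, r2)
```

Because the toy data are noise-free, the fit recovers the generating coefficients and R² is essentially 1. Real experimental data would give lower values, such as the 0.93 and 0.94 reported above.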

Piir, G.; Sild, S.; Spilioti, E.; Nikolopoulou, D.; Katsanou, E.; Langezaal, I.; Maran, U. Classification of Thyroid Peroxidase (TPO) Inhibitors Using Transfer Learning with SMILES Embeddings. Chemical Research in Toxicology 2026. http://hdl.handle.net/10967/272 2026-06-05T16:01:50Z 2026-05-27T17:25:57Z

Thyroid hormones (THs) regulate many processes in mammals and, therefore, affect every organ in the body. Thyroid peroxidase (TPO) is an essential enzyme for the successful biosynthesis of THs. Although TPO inhibition is a well-documented molecular initiating event (MIE) in thyroid hormone system disruption adverse outcome pathways (AOPs), experimental methods and computational models to assess TPO activity are lacking. Efficient computational new approach methodologies (NAMs) are a viable solution for identifying TPO inhibitors from a large pool of agrochemicals. The aim of this study was to investigate the suitability of SMILES embeddings generated using a specialized language model (SLM) based on a pretrained deep neural network (DNN) for applying a transfer learning approach in the development of quantitative structure−activity relationships for classifying TPO inhibitors. Traditional theoretical molecular descriptors were used for comparison. Two different molecular descriptor sets resulted in Random Forest (RF) models that performed similarly on the training and test sets, while the sensitivity for the external validation set was substantially different between the two models (0.788 vs 0.490). Comparison of the predictions with the TPO inhibition data of the chemicals assessed by EFSA and EU-NETVAL laboratories showed good agreement. At the same time, analysis of experimental data from other sources showed some conflicting estimates. This suggests that further and more precise studies are needed for some compounds. This study advances in silico methodologies by implementing transfer learning for QSAR modeling from text representations (e.g., SMILES) using the pretrained Bidirectional Encoder Representations from Transformers (BERT) architecture. 
While the traditional QSAR approach relies on molecular descriptors, this evaluation shows that model-generated SMILES embeddings can expand the applicability domain, indicating a more robust representation of structural information than traditional molecular descriptors.

Akinola, L. K.; Uzairu, A.; Shallangwa, G. A.; Abechi, S. E. Development of binary classification models for grouping hydroxylated polychlorinated biphenyls into active and inactive thyroid hormone receptor agonists. SAR and QSAR in Environmental Research 2023, 34, 267–284. http://hdl.handle.net/10967/269 2025-05-30T15:24:49Z 2025-05-07T13:27:28Z

Some adverse effects of hydroxylated polychlorinated biphenyls (OH-PCBs) in humans are presumed to be initiated via thyroid hormone receptor (TR) binding. Due to the trial-and-error approach adopted for OH-PCB selection in previous studies, experiments designed to test the TR binding hypothesis mostly utilized inactive OH-PCBs, leading to considerable waste of time, effort and other material resources. In this paper, linear discriminant analysis (LDA) and binary logistic regression (LR) were used to develop classification models to group OH-PCBs into active and inactive TR agonists using radial distribution function (RDF) descriptors as predictor variables. The classifications made by both LDA and LR models on the training set compounds resulted in an accuracy of 84.3%, sensitivity of 72.2% and specificity of 90.9%. The areas under the ROC curves, constructed with the training set data, were found to be 0.872 and 0.880 for LDA and LR models, respectively. External validation of the models revealed that 76.5% of the test set compounds were correctly classified by both LDA and LR models. These findings suggest that the two models reported in this paper are good and reliable for classifying OH-PCB congeners into active and inactive TR agonists.
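
The three training-set metrics quoted above follow directly from a binary confusion matrix. The sketch below shows the arithmetic. The counts (13 true positives, 5 false negatives, 30 true negatives, 3 false positives) are an assumption chosen only because they reproduce the reported percentages; the abstract does not state the actual confusion matrix.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of actives correctly classified
    specificity = tn / (tn + fp)   # fraction of inactives correctly classified
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts consistent with the reported percentages (assumption)
acc, sens, spec = classification_metrics(tp=13, fn=5, tn=30, fp=3)
print(f"accuracy={acc:.1%} sensitivity={sens:.1%} specificity={spec:.1%}")
# prints: accuracy=84.3% sensitivity=72.2% specificity=90.9%
```

Other counts with the same ratios would give the same percentages, so the example shows only how the metrics relate to each other.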

Toots, K. M.; Sild, S.; Leis, J.; Acree, W. E.; Maran, U. A multicomponent QSPR approach to describe and predict gas-ionic liquid distribution of organic solutes using machine learning. J. Mol. Liq. 2025, 436, 128184. http://hdl.handle.net/10967/266 2025-07-29T15:03:36Z 2025-03-17T11:45:47Z

Ionic liquids are known as green solvents, which makes accurate prediction of gas–ionic liquid partition coefficients (log K) important for various industrial applications. A gas–ionic liquid system is multicomponent, but it is usually modelled using the structural properties of only one component, the solute. The integration of structural descriptors of all three components (solute, cation, and anion) into a single computational model has not previously been achieved. To do this, a machine learning approach was applied to a large collected dataset of 6,531 experimental log K values, including data series for 170 solutes and 138 ionic liquids. Multiple Linear Regression (MLR) and Random Forest Regression (RF) were compared, both with stepwise forward descriptor selection. The best MLR model achieved a cross-validated coefficient of determination (R²cv) of 0.795 and an external validation coefficient of determination (R²) of 0.801, while the RF model performed substantially better, with an R²cv of 0.965 and an external validation R² of 0.957. The descriptors included in the models showed that the description and prediction of log K improve significantly when the structural properties of all three components of the system (solute, cation, and anion) are taken into account. Molecular descriptors of the different components were considerably more prevalent in the non-linear RF model than in the linear MLR model. The molecular descriptors in the models highlighted the roles of dispersion forces, dipolar interactions, and hydrogen bonding in solute–ionic liquid partitioning. The study provides thoroughly analyzed predictive models for estimating gas–ionic liquid partition coefficients and offers structure-level insights into solute–ionic liquid interactions, facilitating the rational design of ionic liquids and expanding the range of solutes for various applications.
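
One simple way to picture a multicomponent model input is to concatenate the descriptor vectors of the solute, cation, and anion into one feature row per measurement. The sketch below does only that. The component names, descriptor values, and the `build_features` helper are hypothetical placeholders; the abstract does not say how the study assembled its descriptors.

```python
# Hypothetical descriptor tables; all names and values are placeholders,
# not data from the study.
solute_desc = {"solute_A": [0.61, 0.52], "solute_B": [0.25, 0.42]}
cation_desc = {"cation_X": [1.10, 0.30]}
anion_desc = {"anion_P": [2.05, 0.15], "anion_Q": [0.80, 0.10]}


def build_features(solute, cation, anion):
    """Concatenate solute, cation, and anion descriptors into one feature row."""
    return solute_desc[solute] + cation_desc[cation] + anion_desc[anion]


# Each record is one (solute, cation, anion) combination with a measured log K
records = [("solute_A", "cation_X", "anion_P"),
           ("solute_B", "cation_X", "anion_Q")]
X = [build_features(*r) for r in records]

for record, row in zip(records, X):
    print(record, row)
```

A regression model trained on such rows can, in principle, assign weight to descriptors from any of the three components, which is the property the abstract emphasizes.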
