<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
<title>Original publications</title>
<link href="http://hdl.handle.net/10967/105" rel="alternate"/>
<subtitle>University of Tartu (Estonia), Institute of Chemistry, Molecular Technology</subtitle>
<id>http://hdl.handle.net/10967/105</id>
<updated>2026-04-15T16:53:31Z</updated>
<dc:date>2026-04-15T16:53:31Z</dc:date>
<entry>
<title>Toots, K. M.; Sild, S.; Leis, J.; Acree, W. E.; Maran, U. A multicomponent QSPR approach to describe and predict gas-ionic liquid distribution of organic solutes using machine learning. J. Mol. Liq. 2025, 436, 128184.</title>
<link href="http://hdl.handle.net/10967/266" rel="alternate"/>
<author>
<name/>
</author>
<id>http://hdl.handle.net/10967/266</id>
<updated>2025-07-29T15:03:36Z</updated>
<published>2025-03-17T11:45:47Z</published>
<summary type="text">Ionic liquids are known as green solvents, which makes accurate prediction of gas–ionic liquid partition coefficients (log K) important for various industrial applications. A gas–ionic liquid system is multicomponent, but it is usually modelled using the structural properties of only one component, the solute. The integration of structural descriptors of all three components (solute, cation, and anion) into a single computational model has not yet been achieved. To do this, a machine learning approach was applied to a large collected dataset consisting of 6,531 experimental log K values, including data series for 170 solutes and 138 ionic liquids. The Multiple Linear Regression (MLR) and Random Forest Regression (RF) approaches were compared, both of which applied stepwise forward descriptor selection. The best MLR model achieved a cross-validated coefficient of determination (R²cv) of 0.795 and an external validation coefficient of determination (R²) of 0.801, while the RF model demonstrated a significant increase in performance, with a cross-validated R²cv of 0.965 and an external validation R² of 0.957. The descriptors included in the models showed that the description and prediction of log K are significantly improved when the structural properties of all three components of the system (solute, cation, and anion) are taken into account. When comparing the linear MLR and non-linear RF models, the presence of molecular descriptors of different components was significantly increased in the latter. The molecular descriptors in the models highlighted the roles of dispersion forces, dipolar interactions, and hydrogen bonding in solute–ionic liquid partitioning. The study provides thoroughly analyzed predictive models for estimating gas–ionic liquid partition coefficients and structure-level insights into solute–ionic liquid interactions, facilitating the rational design of ionic liquids and expanding the range of solutes for various applications.
</summary>
<dc:date>2025-03-17T11:45:47Z</dc:date>
</entry>
<entry>
<title>Kotli, M.; Piir, G.; Maran, U. Predictive Modeling of Pesticides Reproductive Toxicity in Earthworms Using Interpretable Machine-Learning Techniques on Imbalanced Data. ACS Omega 2025, 10, 4732–4744.</title>
<link href="http://hdl.handle.net/10967/263" rel="alternate"/>
<author>
<name/>
</author>
<id>http://hdl.handle.net/10967/263</id>
<updated>2025-02-20T19:27:42Z</updated>
<published>2024-10-09T12:03:43Z</published>
<summary type="text">The earthworm is a key indicator species in soil ecosystems. This makes the reproductive toxicity of chemical compounds to earthworms an important property to determine and makes computational models necessary for descriptive and predictive purposes. Thus, the aim was to develop an advanced Quantitative Structure–Activity Relationship modeling approach for this complex property with imbalanced data. The approach integrated gradient-boosted decision trees as classifiers with a genetic algorithm for feature selection and Bayesian optimization for hyperparameter tuning. An additional goal was to analyze and interpret, using SHAP values, the structural features encoded by the molecular descriptors that contribute to pesticide toxicity and nontoxicity, the most notable of which are solvation entropy and the number of hydrolyzable bonds. The final model was constructed as a stacked ensemble that combined the strengths of the individual models. Evaluation of this model with an external test set of 147 compounds demonstrated a well-defined applicability domain and sufficient predictive capability, with a Balanced Accuracy of 77%. The model representation follows FAIR principles and is available on QsarDB.org.
</summary>
<dc:date>2024-10-09T12:03:43Z</dc:date>
</entry>
<entry>
<title>Toots, K. M.; Sild, S.; Leis, J.; Acree, W. E.; Maran, U. Exploring the influence of ionic liquid anion structure on gas-ionic liquid partition coefficients of organic solutes using machine learning. Langmuir 2024, 40, 23714–23728.</title>
<link href="http://hdl.handle.net/10967/262" rel="alternate"/>
<author>
<name/>
</author>
<id>http://hdl.handle.net/10967/262</id>
<updated>2025-05-30T10:53:24Z</updated>
<published>2024-09-02T14:39:47Z</published>
<summary type="text">This article presents an in-depth investigation into the influence of anionic structures of ionic liquids (ILs) on gas-ionic liquid partition coefficients (log K) of organic solutes in three ILs. The primary objective was to examine whether there is a relationship between the molecular structure of the IL anion component and log K. A further objective was to examine whether the molecular descriptors of the anion in these relationships encode possible molecular interactions involved in miscibility and partitioning in the IL. The research involves the compilation of data series of experimental log K values in which the cation component is constant. Such representative data series were obtained for three solutes (benzene, cyclohexane, and methanol) in three ILs with a uniform cationic component of methyl imidazolium. Using multiple linear regression models enhanced with machine learning techniques, the relationship between anionic structures and log K values was successfully quantified and modeled. Systematically selected molecular descriptors describing the anion structure show that, in the case of methanol, log K is strongly dependent on hydrogen bonds and Coulomb-dipolar interactions with the anion component, while in the case of benzene and cyclohexane the dispersion forces of the anion component are dominant. The outlier analysis and data interpretation highlight the need for extensive experimental data. The results confirm the initial hypothesis and provide valuable information on the role of the structure of the anionic component in determining the partitioning behavior of organic solutes. This knowledge is important for the design and optimization of ILs for specific applications, particularly as solvents in various industrial processes.
The research also provides useful information about molecular interactions taking place at the interfaces between ILs and organic additives in complex liquid media such as multicomponent electrolyte solutions, for example in energy storage applications.
</summary>
<dc:date>2024-09-02T14:39:47Z</dc:date>
</entry>
<entry>
<title>Zukić, S.; Osmanović, A.; Harej, A.; Kraljević Pavelić, S.; Špirtović-Halilović, S.; Veljović, E.; Roca, S.; Trifunović, S.; Završnik, D.; Maran, U. Data driven modelling of substituted pyrimidine and uracil-based derivatives validated with newly synthesized and antiproliferative evaluated compounds. Int. J. Mol. Sci. 2024, 25, 9390.</title>
<link href="http://hdl.handle.net/10967/261" rel="alternate"/>
<author>
<name/>
</author>
<id>http://hdl.handle.net/10967/261</id>
<updated>2025-05-30T10:52:15Z</updated>
<published>2024-08-21T11:19:26Z</published>
<summary type="text">The pyrimidine heterocycle plays an important role in anticancer research. In particular, the pyrimidine derivative families of uracil show promise as structural scaffolds relevant to cervical cancer. This group of chemicals lacks data-driven machine learning QSAR models that allow for generalization and predictive capabilities in the search for new active compounds. To address this, a dataset of pyrimidine and uracil compounds from ChEMBL has been collected and curated. A workflow was developed for data-driven machine learning QSAR using intuitive dataset design and forward selection of molecular descriptors. The model was thoroughly externally validated against available data. Blind validation was also performed by synthesis and antiproliferative evaluation of newly synthesized uracil-based and pyrimidine derivatives. The most active compound among the newly synthesized derivatives, a 2,4,5-trisubstituted pyrimidine, was predicted by the QSAR model with a difference of 0.02 from the experimentally tested activity.
</summary>
<dc:date>2024-08-21T11:19:26Z</dc:date>
</entry>
</feed>
