University of Tartu, Institute of Chemistry, Molecular Technology (Estonia)
MolTech, UTARTU
http://hdl.handle.net/10967/1
2024-03-29T11:26:56Z

Piir, G.; Sild, S.; Maran, U. Interpretable machine learning for the identification of estrogen receptor agonists, antagonists, and binders. Chemosphere 2023.
http://hdl.handle.net/10967/259
2024-02-08T13:48:08Z
2023-09-14T13:44:47Z
An abnormal hormonal activity or exposure to endocrine-disrupting chemicals (EDCs) can cause endocrine system malfunction. Among the many interactions EDCs can affect is the disruption of estrogen signalling, which can lead to adverse health effects such as cancer, osteoporosis, neurodegenerative diseases, cardiovascular disease, insulin resistance, and obesity. Knowing which chemical can act as an EDC is a significant advantage and a practical necessity. New Approach Methodologies (NAM) computational models offer a quick and cost-effective solution for preliminary hazard assessment of chemicals without animal testing. Therefore, a machine learning approach was used to investigate the relationships between estrogen receptor (ER) activity and chemical structure to identify chemicals that can interact with ER. For this purpose, the consolidated in vitro assay data from ToxCast/Tox21 projects was used for developing Random Forest classification models for ER binding, agonists, and antagonists. The overall classification prediction accuracy reaches up to 82%, depending on whether the model predicted agonists, antagonists, or compounds that bind to the active site. Given the imbalance in endocrine disruption data, the derived models are good candidates for deprioritising chemicals and reducing animal testing. The interpretation of theoretical molecular descriptors of the models was consistent with the molecular interactions known in the ligand binding pocket. The estimated class probabilities enabled the analysis of the applicability domain of the developed models and the assessment of the predictions’ reliability, followed by the guidelines for interpreting prediction results. The models are openly accessible and usable at QsarDB.org according to the FAIR (Findable, Accessible, Interoperable, Reusable) principles.
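The use of class probabilities to judge prediction reliability can be illustrated with a minimal sketch. The function name `assess_prediction`, the class labels, and the 0.7 threshold are assumptions made for illustration only; the abstract does not state the thresholds or guidelines actually used by the authors.

```python
def assess_prediction(class_probabilities, threshold=0.7):
    """Return (predicted_class, probability, is_reliable).

    class_probabilities maps class labels to estimated probabilities,
    for example the vote fractions of a Random Forest classifier.
    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    # The predicted class is the one with the highest estimated probability.
    predicted_class = max(class_probabilities, key=class_probabilities.get)
    probability = class_probabilities[predicted_class]
    # A prediction counts as reliable only if the model is confident enough.
    return predicted_class, probability, probability >= threshold


# Hypothetical probabilities for two compounds.
confident = assess_prediction({"active": 0.9, "inactive": 0.1})
borderline = assess_prediction({"active": 0.55, "inactive": 0.45})
```

In this sketch the second compound is still assigned a class, but the prediction is flagged as unreliable because its probability lies close to the decision boundary.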

Kotli, M.; Piir, G.; Maran, U. Pesticide effect on earthworm lethality via interpretable machine learning. Journal of Hazardous Materials 2023.
http://hdl.handle.net/10967/258
2024-03-07T14:47:43Z
2023-08-18T08:26:20Z
Earthworms are among the most important animals (invertebrates) for soil health. Many chemical substances released into nature for agricultural development, such as pesticides, may have unwanted effects on those organisms. It is therefore essential to first understand the extent of the impact of chemicals on soil health before making decisions for regulatory or commercial purposes. We hypothesize that there is an expressible quantitative structure-activity relationship (QSAR) between the structure of pesticide compounds and their acute toxicity to the earthworm species Eisenia fetida. The description of this relationship allows for a better assessment of the impact of chemicals on this earthworm. To describe this relationship, a dataset of chemicals was collected from open-access sources to develop a mathematical model. A novel approach, combining a genetic algorithm and Bayesian optimization, was used to select structural features for the model and to optimize model parameters. The final QSAR classification model was created with the Random Forest algorithm and exhibited a good prediction accuracy of 0.78 on the training set and 0.80 on the test set. The model representation follows FAIR principles and is available on QsarDB.org.

Oja, M.; Sild, S.; Piir, G.; Maran, U. Intrinsic aqueous solubility: mechanistically transparent data-driven modeling of drug substances. Pharmaceutics 2022, 14, 2248.
http://hdl.handle.net/10967/257
2024-02-08T09:56:06Z
2022-10-12T13:54:19Z
Intrinsic aqueous solubility is a foundational property for understanding the chemical, technological, pharmaceutical, and environmental behavior of drug substances. Despite years of solubility research, molecular structure-based prediction of the intrinsic aqueous solubility of drug substances is still under active investigation. This paper describes the authors’ systematic data-driven modeling in which two fit-for-purpose training data sets for intrinsic aqueous solubility were collected and curated, and three quantitative structure-property relationships were derived to make predictions for the most recent solubility challenge. All three models perform well individually, while being mechanistically transparent and easy to understand. Molecular descriptors involved in the models are related to the following key steps in the solubility process: dissociation of the molecule from the crystal, formation of a cavity in the solvent, and insertion of the molecule into the solvent. A consensus modeling approach with these models remarkably improved prediction capability and reduced the number of strong outliers by more than a factor of two. The performance and outliers of the second solubility challenge predictions were analyzed retrospectively. All developed models have been published in the QsarDB repository according to FAIR principles and can be used without restrictions for exploring, downloading, and predictions.
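The effect of consensus modeling on outliers can be sketched as follows. This is a minimal illustration that assumes the consensus is the arithmetic mean of the individual model predictions; the abstract does not state which consensus rule was used. All numbers are hypothetical log-solubility values, and the 1.0 log-unit outlier limit is likewise an assumption.

```python
from statistics import mean


def consensus_prediction(model_predictions):
    """Average the predictions of several models, compound by compound."""
    return [mean(values) for values in zip(*model_predictions)]


def count_outliers(predicted, observed, limit=1.0):
    """Count compounds whose absolute prediction error exceeds the limit."""
    return sum(1 for p, o in zip(predicted, observed) if abs(p - o) > limit)


# Hypothetical observed values and predictions from three models.
observed = [-2.0, -3.5, -5.0]
model_a = [-2.2, -2.0, -5.1]  # misses the second compound by 1.5
model_b = [-1.9, -4.0, -6.3]  # misses the third compound by 1.3
model_c = [-2.1, -3.9, -4.6]  # no errors above the limit

consensus = consensus_prediction([model_a, model_b, model_c])
outliers_a = count_outliers(model_a, observed)
outliers_consensus = count_outliers(consensus, observed)
```

Because the large errors of the individual models fall on different compounds, averaging pulls each of them back within the limit in this toy data set.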

Toots, K. M.; Sild, S.; Leis, J.; Acree Jr., W. E.; Maran, U. Machine learning Quantitative Structure-Property Relationships as a function of ionic liquid cations for the gas-ionic liquid partition coefficient of hydrocarbons. Int. J. Mol. Sci. 2022, 23, 7534.
http://hdl.handle.net/10967/256
2024-02-06T15:01:09Z
2022-06-22T18:10:20Z
Ionic liquids (ILs) are known for their unique characteristics as solvents and electrolytes. New ILs are therefore being developed and adapted as innovative chemical environments for different applications in which their properties need to be understood at the molecular level. Computational data-driven methods provide the means for understanding properties at the molecular level, and quantitative structure-property relationships (QSPRs) give a framework for this. This framework is commonly used to study the properties of molecules in ILs as an environment. The opposite perspective, in which the property is considered as a function of the ionic liquid, has not been explored. The aim of the present study was to supplement this perspective with new knowledge and to develop QSPRs that would allow the understanding of molecular interactions in ionic liquids based on the structure of the cationic moiety. A wide range of applications in electrochemistry, separation and extraction chemistry depend on the partitioning of solutes between the ionic liquid and the surrounding environment, which is characterized by the gas-ionic liquid partition coefficient. To model this property as a function of the structure of the cation, series of ionic liquids with a common bis-(trifluoromethylsulfonyl)-imide anion, [Tf2N]-, were selected for benzene, hexane and cyclohexane. Multiple linear regression (MLR), support vector regression (SVR), and Gaussian process regression (GPR) machine learning approaches were used to derive data-driven models, and their performance was compared. The cross-validation coefficients of determination in the range 0.71–0.93, along with other performance statistics, indicated strong accuracy of the models for all data series and machine learning methods. The analysis and interpretation of descriptors revealed that, in general, higher lipophilicity and dispersion interaction capability and lower polarity of the cations induce a higher partition coefficient for benzene, hexane, cyclohexane and hydrocarbons in general. Applicability domain analysis of the models found no highly influential outliers, and the models are applicable to a wide selection of cation families with variable size, polarity, and aliphatic or aromatic nature.
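A cross-validation coefficient of determination can be computed as sketched below. This is a minimal illustration that assumes leave-one-out cross-validation and a linear model with a single descriptor; the abstract does not specify the cross-validation scheme, and the models in the study use multiple descriptors. The variable names and data values are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def loo_q2(xs, ys):
    """Leave-one-out cross-validated coefficient of determination.

    Computed as 1 - PRESS / SS_tot, where PRESS sums the squared errors
    of predictions made for each point by a model fitted without it.
    """
    press = 0.0
    for i in range(len(xs)):
        # Refit the model on all data points except the i-th one.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    y_bar = mean(ys)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and partition coefficients (log scale).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
log_k = [1.1, 1.9, 3.2, 3.9, 5.1]
q2 = loo_q2(descriptor, log_k)
```

Unlike the ordinary coefficient of determination, this statistic is based on predictions for data points that were left out of the fit, so it penalizes models that only reproduce their own training data.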