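Clustering analysis of natural products derived from Polygonum multiflorum Thunb. based on spectral clustering algorithm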

doi:10.19803/j.1672-8629.2022.04.10

Abstract

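Abstract (Chinese version):
Objective: To perform a clustering analysis of the natural products in Polygonum multiflorum Thunb., establish a more rigorous clustering method for natural products, and provide technical guidance for subsequent compound selection and pharmacological screening.
Methods: Natural products of Polygonum multiflorum were collected and compiled from the literature. Compounds of the major categories, such as stilbenes and anthraquinones, were selected as clustering objects and converted into the simplified molecular input line entry specification (SMILES). RDKit was used to extract the extended-connectivity fingerprints (ECFP) and physicochemical properties of the compounds as features, and informative features were retained by variance filtering. The natural products were clustered with the spectral clustering algorithm, and the clustering parameters were optimized using the Calinski-Harabasz (CH) index as the evaluation metric. The compounds were then clustered with the optimal parameters, and the characteristics of each cluster were analyzed. Next, principal component analysis (PCA) was performed on the compounds of the three main categories to examine their spatial distribution. Finally, the lipid-water partition coefficient (LogP) and topological polar surface area (TPSA) of the compounds in the main categories were calculated, and the distributions of these properties were analyzed to check whether the clustering was reasonable.
Results: A total of 123 natural products from 13 categories were selected from the literature. After feature extraction and filtering, 207 features were obtained. The CH index indicated that clustering performed best with 10 clusters and γ = 0.004. PCA showed that the three main categories each formed a separate cluster in space, with no overlap. After clustering, the values of LogP and TPSA tended to be more concentrated.
Conclusion: The spectral clustering algorithm not only separates structurally dissimilar natural products of Polygonum multiflorum but also clusters complex compounds well. The clustering results are reasonably sound and offer a new approach for conventional pharmacological screening.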

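Key words (Chinese version): Polygonum multiflorum Thunb., natural products, unsupervised learning, cluster analysis, spectral clustering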
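The clustering workflow summarized above (variance-based feature filtering, spectral clustering with a kernel parameter γ, and selection of the cluster number and γ by the CH index) can be illustrated with a minimal NumPy sketch. It is not the authors' code: the article names RDKit for feature extraction but not its clustering implementation, so the normalized (Ng-Jordan-Weiss style) variant, the RBF affinity exp(-γ‖xᵢ − xⱼ‖²), the k-means++ seeding and the function names `variance_filter`, `spectral_clustering`, `calinski_harabasz` and `grid_search` are assumptions made for illustration. The input is assumed to be a compounds × features matrix (for example ECFP bits concatenated with RDKit descriptors); three synthetic blobs plus constant columns stand in for real data.

```python
import numpy as np


def variance_filter(X, threshold=0.0):
    """Keep only feature columns whose variance exceeds `threshold`."""
    keep = X.var(axis=0) > threshold
    return X[:, keep], keep


def kmeans(Y, k, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns one integer label per row."""
    rng = np.random.default_rng(seed)
    centers = [Y[rng.integers(len(Y))]]
    for _ in range(1, k):
        # Squared distance of every row to its nearest already-chosen center.
        d2 = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        p = d2 / d2.sum() if d2.sum() > 0 else None
        centers.append(Y[rng.choice(len(Y), p=p)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_clustering(X, k, gamma, seed=0):
    """Normalized spectral clustering (Ng-Jordan-Weiss style) with an RBF affinity."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-gamma * sq)                        # affinity exp(-gamma * ||xi - xj||^2)
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(M)                    # eigenvalues in ascending order
    U = vecs[:, -k:]                               # eigenvectors of the k largest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k, seed=seed)


def calinski_harabasz(X, labels):
    """CH index: (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))."""
    ks = np.unique(labels)
    n, k = len(X), len(ks)
    if k < 2 or k >= n:
        return 0.0
    mean = X.mean(axis=0)
    between = sum(np.sum(labels == j) * np.sum((X[labels == j].mean(axis=0) - mean) ** 2)
                  for j in ks)
    within = sum(np.sum((X[labels == j] - X[labels == j].mean(axis=0)) ** 2) for j in ks)
    return float((between / (k - 1)) / (within / (n - k)))


def grid_search(X, ks, gammas):
    """Return (score, k, gamma, labels) for the (k, gamma) pair with the highest CH index."""
    best = None
    for k in ks:
        for gamma in gammas:
            labels = spectral_clustering(X, k, gamma)
            score = calinski_harabasz(X, labels)
            if best is None or score > best[0]:
                best = (score, k, gamma, labels)
    return best


# Hypothetical demo data: three separated blobs plus three constant (zero-variance) columns.
rng = np.random.default_rng(42)
centers = np.array([[0, 0, 0, 0, 0], [8, 0, 0, 0, 0], [0, 8, 0, 0, 0]], dtype=float)
X = np.vstack([c + 0.5 * rng.standard_normal((20, 5)) for c in centers])
X = np.hstack([X, np.ones((60, 2)), np.zeros((60, 1))])

X_kept, mask = variance_filter(X)
score, k, gamma, labels = grid_search(X_kept, ks=[2, 3, 4, 5], gammas=[0.01, 0.1, 1.0])
print("features kept:", X_kept.shape[1], "best k:", k, "best gamma:", gamma)
```

If scikit-learn is available, `VarianceThreshold`, `SpectralClustering(n_clusters=k, affinity="rbf", gamma=γ)` and `calinski_harabasz_score` cover the same three steps; its spectral embedding differs in detail from this sketch, so labels need not match exactly. A γ such as the article's 0.004 is only meaningful relative to the scale of the feature matrix it was tuned on.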

Abstract:
Objective: To establish a sound method for clustering natural products, using the spectral clustering algorithm and compounds derived from Polygonum multiflorum Thunb.
Methods: Compounds of the major categories originating from Polygonum multiflorum Thunb., including stilbenes and anthraquinones, were collected from the literature and converted into the simplified molecular input line entry specification (SMILES). Extended-connectivity fingerprints (ECFP) and physicochemical properties were extracted with RDKit and filtered by variance before the spectral clustering algorithm was applied. The Calinski-Harabasz (CH) score was used to optimize the parameters of spectral clustering. The optimized method was then applied to the natural products, and the characteristics of each cluster were analyzed. Principal component analysis (PCA) of the three main categories was carried out to visualize their spatial distribution. Finally, the topological polar surface area (TPSA) and lipid-water partition coefficient (LogP) of the compounds in the main categories were calculated, and the distributions of these properties were analyzed.
Results: A total of 123 natural products from 13 categories were collected from the literature. After feature calculation and removal of features with near-zero variance, 207 valid features remained. Spectral clustering achieved the highest CH score with the number of clusters set to 10 and γ set to 0.004. PCA showed that the three main categories each formed a separate cluster in three-dimensional space. In addition, the distributions of TPSA and LogP tended to be more concentrated after clustering.
Conclusion: The spectral clustering algorithm not only distinguishes compounds with distinctive structures but also performs well on structurally complex compounds from Polygonum multiflorum. These results provide new ideas for the screening of natural products.

Key words: Polygonum multiflorum Thunb., natural products, unsupervised learning, clustering algorithm, spectral clustering
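The PCA check and the LogP/TPSA comparison described above can be sketched in the same way. Assumptions: the article does not say how the property distributions were compared, so `spread_ratio` (pooled within-cluster standard deviation divided by overall standard deviation) is a hypothetical summary statistic chosen for illustration; `pca` computes principal components by SVD of the centered feature matrix; the features and the single made-up property standing in for LogP are synthetic.

```python
import numpy as np


def pca(X, n_components=3):
    """Project centered rows of X onto the leading principal axes (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:n_components].T          # coordinates in PC space
    explained = s ** 2 / np.sum(s ** 2)        # fraction of total variance per component
    return scores, explained[:n_components]


def spread_ratio(values, labels):
    """Pooled within-cluster std of a property divided by its overall std.

    Always in [0, 1]; values well below 1 mean the property (e.g. LogP or TPSA)
    is concentrated inside clusters rather than spread across them.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    resid = np.concatenate([values[labels == j] - values[labels == j].mean()
                            for j in np.unique(labels)])
    return float(np.sqrt(np.mean(resid ** 2)) / values.std())


# Hypothetical demo: 3 compound classes x 20 compounds, 6 features each,
# plus one made-up property that differs by class (a stand-in for LogP).
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 20)
class_means = np.array([[0, 0, 0, 0, 0, 0], [6, 0, 0, 0, 0, 0], [0, 6, 0, 0, 0, 0]], dtype=float)
X = class_means[labels] + rng.standard_normal((60, 6))
prop = np.array([1.0, 3.0, 5.0])[labels] + 0.3 * rng.standard_normal(60)

scores, explained = pca(X, n_components=3)
centroids = np.array([scores[labels == j].mean(axis=0) for j in range(3)])
print("explained variance:", np.round(explained, 3))
print("PC-space centroids:\n", np.round(centroids, 2))
print("spread ratio (class labels):", round(spread_ratio(prop, labels), 3))
```

With real compounds, the property column could be computed with RDKit's `Descriptors.MolLogP` and `Descriptors.TPSA`, and the labels taken from the clustering step; a ratio well below 1 is one simple way to quantify "more concentrated after clustering".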

CLC number: R917

HU Xiaowen, YANG Jianbo, WEI Feng, MA Shuangcheng. Clustering analysis of natural products derived from Polygonum multiflorum Thunb. based on spectral clustering algorithm[J]. Chinese Journal of Pharmacovigilance, 2022, 19(4): 390-394.
