Chinese Journal of Pharmacovigilance ›› 2022, Vol. 19 ›› Issue (4): 390-394.
DOI: 10.19803/j.1672-8629.2022.04.10

Clustering analysis of natural products derived from Polygonum multiflorum Thunb. based on spectral clustering algorithm

HU Xiaowen, YANG Jianbo, WEI Feng, MA Shuangcheng*   

  1. National Institutes for Food and Drug Control, Beijing 102629, China
  • Received: 2021-07-28 Online: 2022-04-15 Published: 2022-04-15

Abstract: Objective To establish a suitable method for clustering natural products by applying the spectral clustering algorithm to compounds derived from Polygonum multiflorum Thunb. Methods Compounds from the major categories found in Polygonum multiflorum Thunb., including stilbenes and anthraquinones, were collected from the literature and converted into simplified molecular input line entry system (SMILES) strings. Extended-connectivity fingerprints and physicochemical properties were calculated as features and filtered by variance before spectral clustering was applied. The Calinski-Harabasz (CH) score was used to optimize the parameters of the spectral clustering algorithm. The optimal settings were then applied to the natural products, and the features of each cluster were analyzed. Principal component analysis of the three main categories was carried out to visualize their spatial distribution. Finally, the topological polar surface area (TPSA) and lipid-water partition coefficient (LogP) of the main compounds were calculated, and the distribution of these properties was analyzed. Results A total of 123 natural products in thirteen categories were collected from the literature. After feature calculation and removal of features with near-zero variance, 207 valid features were obtained. The spectral clustering algorithm achieved the highest CH score when the number of clusters was set to 10 and γ to 0.004. Principal component analysis showed that the three major classes formed separate clusters in three-dimensional space, and the distributions of TPSA and LogP tended to be concentrated. Conclusion The spectral clustering algorithm not only distinguishes compounds with unique structures but also performs well on the complex compounds in Polygonum multiflorum. These results provide new ideas for the screening of natural products.
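
The variance filtering and CH-based parameter selection described in the Methods can be illustrated with a small sketch. This is not the authors' code. The toy feature matrix, the two candidate labelings, and the helper names drop_low_variance and calinski_harabasz are all assumptions made for illustration, and the spectral clustering step itself is replaced here by hand-written cluster labels. In practice each candidate labeling would come from running the clustering with one combination of cluster number and γ, and a library such as scikit-learn would likely supply both the clustering and the score.

```python
from statistics import pvariance


def drop_low_variance(rows, threshold=0.0):
    """Keep only feature columns whose population variance exceeds threshold."""
    columns = list(zip(*rows))
    keep = [j for j, col in enumerate(columns) if pvariance(col) > threshold]
    return [[row[j] for j in keep] for row in rows], keep


def calinski_harabasz(rows, labels):
    """CH = [B / (k - 1)] / [W / (n - k)].

    B is the between-cluster and W the within-cluster sum of squared
    distances to the relevant centroid; higher values mean tighter,
    better separated clusters.
    """
    n, dim = len(rows), len(rows[0])
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("the score needs 2 <= k < n clusters")
    overall = [sum(r[d] for r in rows) / n for d in range(dim)]
    between = within = 0.0
    for members in clusters.values():
        centroid = [sum(r[d] for r in members) / len(members) for d in range(dim)]
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        within += sum((x - c) ** 2 for r in members for x, c in zip(r, centroid))
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy features: six "compounds", three features, the last constant.
features = [
    [1.0, 2.0, 0.0],
    [1.2, 1.8, 0.0],
    [0.8, 2.2, 0.0],
    [5.0, 6.0, 0.0],
    [5.2, 5.8, 0.0],
    [4.8, 6.2, 0.0],
]
filtered, kept = drop_low_variance(features)

# Stand-ins for labelings produced by different clustering parameter settings.
candidates = {
    "compact": [0, 0, 0, 1, 1, 1],
    "mixed": [0, 1, 0, 1, 0, 1],
}
scores = {name: calinski_harabasz(filtered, labels) for name, labels in candidates.items()}
best = max(scores, key=scores.get)
print(kept, best)
```

The labeling with the highest score is kept, which mirrors how the abstract reports the cluster number and γ that maximized the CH score.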

Key words: Polygonum multiflorum Thunb., natural products, unsupervised learning, clustering algorithm, spectral clustering
