Abstract
Cognitive computing involves discovering hidden rules and patterns in massive volumes of data. Density peaks clustering (DPC) is a powerful data mining tool that identifies density peaks in a decision graph and then assigns labels to points without iteration, detecting clusters of arbitrary shape simply and efficiently. However, DPC has two weaknesses. First, density measured with the ϵ-neighborhood or a Gaussian kernel reflects only the global structure of the data, so the correct density peaks may not be found and performance on manifold datasets suffers. Second, the one-step allocation strategy causes a chain reaction: once a high-density point is misallocated, a series of subsequent points is also assigned incorrectly. To address these problems, this paper uses the Jaccard coefficient to measure the similarity between points. The proposed Jaccard-based density measurement depends only on the k points most similar to a given point, so it reflects the local structure of manifold datasets and allows the density peaks to be identified accurately. To counter the chain reaction caused by DPC's assignment strategy, we develop a two-step allocation strategy based on label propagation and the proposed similarity measure. The first step assigns labels to points close to the cluster centers; these points play the role of the labeled points in the label propagation algorithm. The second step assigns each remaining point the label of its nearest labeled point. We compared the proposed algorithm with four other algorithms on synthetic and real-world datasets. The clustering results on synthetic datasets verify the effectiveness of the proposed method on manifold datasets, and three evaluation metrics on the UCI datasets and the Olivetti Faces dataset show that it outperforms the other algorithms and can reveal the patterns and associations in real-world datasets.
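The abstract does not give the formulas, so the following Python sketch is only one plausible reading of the ideas it describes, not the authors' implementation. It assumes that the Jaccard coefficient is computed between the k-nearest-neighbor sets of two points, that the density of a point is the sum of its k largest similarities, and that the first allocation step can be approximated by labeling the k nearest neighbors of each center (the paper uses label propagation here, which this sketch does not reproduce). All function names (`knn_sets`, `jaccard_similarity`, `local_density`, `two_step_assign`) are hypothetical.

```python
import numpy as np

def knn_sets(X, k):
    """Return each point's k-nearest-neighbor index set and the distance matrix."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)           # assumption: a point is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]    # indices of the k closest points per row
    return [set(row.tolist()) for row in idx], d

def jaccard_similarity(neigh):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| between neighbor sets (assumed form)."""
    n = len(neigh)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            inter = len(neigh[i] & neigh[j])
            union = len(neigh[i] | neigh[j])
            S[i, j] = S[j, i] = inter / union
    return S

def local_density(S, k):
    """Density as the sum of the k largest similarities of each point (assumed form)."""
    return np.sort(S, axis=1)[:, -k:].sum(axis=1)

def two_step_assign(d, neigh, centers):
    """Simplified two-step allocation; centers are indices chosen beforehand."""
    labels = np.full(d.shape[0], -1)
    # Step 1 (simplified): label each center, then its still-unlabeled neighbors.
    for c_label, c in enumerate(centers):
        labels[c] = c_label
    for c_label, c in enumerate(centers):
        for j in neigh[c]:
            if labels[j] == -1:
                labels[j] = c_label
    # Step 2: every remaining point takes the label of its nearest step-1 point.
    labeled = np.flatnonzero(labels != -1)
    for i in np.flatnonzero(labels == -1):
        nearest = labeled[np.argmin(d[i, labeled])]
        labels[i] = labels[nearest]
    return labels
```

In this sketch the cluster centers are passed in as indices; in DPC they would be read off the decision graph as points with both high density and a large distance to any denser point. Because the similarity depends only on shared neighbors, two points in different well-separated groups get a similarity of zero regardless of their Euclidean distance, which is the local-structure property the abstract emphasizes.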

Funding
This work was supported by the National Natural Science Foundation of China (21606159, 62176176).
Ethics declarations
Ethics Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Conflict of Interest
The authors declare no competing interests.
About this article
Cite this article
Qin, X., Han, X., Chu, J. et al. Density Peaks Clustering Based on Jaccard Similarity and Label Propagation. Cogn Comput 13, 1609–1626 (2021). https://doi.org/10.1007/s12559-021-09906-w
DOI: https://doi.org/10.1007/s12559-021-09906-w