Graph Based Kernel k-Means Using Representative Data Points as Initial Centers

Yang, Wuyi; Tang, Liguo

doi:10.1007/978-3-319-22180-9_29

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9225)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

The k-means algorithm is undoubtedly the most widely used data clustering algorithm due to its relative simplicity. However, it can only handle data that are linearly separable. A generalization of k-means is kernel k-means, which can handle data that are not linearly separable. Standard k-means and kernel k-means share the disadvantage of being sensitive to the initial placement of the cluster centers. A novel kernel k-means algorithm is proposed in this paper. The proposed algorithm uses a graph based kernel matrix and finds k data points as initial centers for kernel k-means. Since finding the optimal data points as initial centers is an NP-hard problem, this problem is relaxed to obtain k representative data points as initial centers. A matching pursuit algorithm for multiple vectors is used to greedily find the k representative data points. The proposed algorithm is tested on synthetic and real-world datasets and compared with kernel k-means algorithms using other initialization techniques. Our empirical study shows encouraging results for the proposed algorithm.
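
To make the role of the initial centers concrete, the following is a minimal Python sketch of plain kernel k-means seeded with k data points. It is not the authors' algorithm: the Gaussian (RBF) kernel is only a stand-in for the graph based kernel matrix, the seed indices are supplied by hand rather than selected by matching pursuit, and the names `rbf_kernel`, `kernel_kmeans`, `seeds`, and `gamma` are hypothetical.

```python
import math


def rbf_kernel(points, gamma=1.0):
    """Gaussian (RBF) kernel matrix. Assumption: a stand-in kernel,
    not the graph based kernel described in the abstract."""
    n = len(points)
    return [[math.exp(-gamma * sum((a - b) ** 2
                                   for a, b in zip(points[i], points[j])))
             for j in range(n)] for i in range(n)]


def kernel_kmeans(K, seeds, max_iter=100):
    """Kernel k-means on kernel matrix K. The data points indexed by
    `seeds` act as the initial cluster centers."""
    n, k = len(K), len(seeds)
    # Initial assignment by feature-space distance to each seed point:
    # ||phi(x_i) - phi(x_s)||^2 = K_ii - 2*K_is + K_ss.
    labels = [min(range(k),
                  key=lambda c: K[i][i] - 2 * K[i][seeds[c]]
                  + K[seeds[c]][seeds[c]])
              for i in range(n)]
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Per-cluster term (1/|C|^2) * sum over j, l in C of K_jl.
        within = [sum(K[j][l] for j in m for l in m) / len(m) ** 2
                  if m else 0.0 for m in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                # Squared distance from phi(x_i) to the mean of cluster c.
                d = K[i][i] - 2 * sum(K[i][j] for j in m) / len(m) + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Two well-separated groups of three points, one seed taken from each group.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
K = rbf_kernel(pts, gamma=0.1)
print(kernel_kmeans(K, seeds=[0, 3]))  # → [0, 0, 0, 1, 1, 1]
```

Because every distance is computed from entries of `K` alone, any positive semidefinite kernel matrix could be passed in place of the RBF one. The result still depends on which indices are given as `seeds`, which is the sensitivity to initialization that the abstract refers to.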

Acknowledgments

This work was supported by the National Natural Science Foundation of China (11374245).

Author information

Authors and Affiliations

Key Laboratory of Underwater Acoustic Communication and Marine Information Technology of the Ministry of Education, Xiamen University, Xiamen, China
Wuyi Yang & Liguo Tang

Corresponding author

Correspondence to Wuyi Yang.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
Polytechnic of Bari, Bari, Italy
Vitoantonio Bevilacqua
University of Wollongong, North Wollongong, New South Wales, Australia
Prashan Premaratne

Cite this paper

Yang, W., Tang, L. (2015). Graph Based Kernel k-Means Using Representative Data Points as Initial Centers. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theories and Methodologies. ICIC 2015. Lecture Notes in Computer Science, vol 9225. Springer, Cham. https://doi.org/10.1007/978-3-319-22180-9_29

DOI: https://doi.org/10.1007/978-3-319-22180-9_29
Published: 11 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22179-3
Online ISBN: 978-3-319-22180-9
eBook Packages: Computer Science (R0)