Abstract
The k-means algorithm is among the most widely used data clustering algorithms, largely because of its relative simplicity, but it can only handle data that are linearly separable. Kernel k-means generalizes k-means to data that are not linearly separable. Both standard k-means and kernel k-means are sensitive to the initial placement of the cluster centers. This paper proposes a novel kernel k-means algorithm that uses a graph-based kernel matrix and selects k data points as initial centers for kernel k-means. Since finding the optimal data points as initial centers is an NP-hard problem, the problem is relaxed to obtaining k representative data points as initial centers. A matching pursuit algorithm for multiple vectors is used to find the k representative data points greedily. The proposed algorithm is tested on synthetic and real-world datasets and compared with kernel k-means algorithms that use other initialization techniques. The empirical study shows encouraging results for the proposed algorithm.
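To make the pipeline in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian RBF kernel as a stand-in for the graph-based kernel, and a simple greedy deflation rule as a stand-in for the matching pursuit step; the function names (rbf_kernel, greedy_representatives, kernel_kmeans) and the parameters gamma and mu are hypothetical and chosen only for illustration.

```python
import math

def rbf_kernel(points, gamma=0.1):
    # Stand-in kernel (assumption): Gaussian RBF, not the paper's graph-based kernel.
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))
             for q in points] for p in points]

def greedy_representatives(K, k, mu=0.1):
    # Hypothetical greedy rule (assumes k <= len(K)): repeatedly pick the point
    # whose residual kernel column has the largest regularized energy, then
    # deflate the residual so similar points score lower on the next pick.
    n = len(K)
    R = [row[:] for row in K]
    chosen = []
    for _ in range(k):
        def score(j):
            return sum(R[i][j] ** 2 for i in range(n)) / (R[j][j] + mu)
        j = max((c for c in range(n) if c not in chosen), key=score)
        chosen.append(j)
        col = [R[i][j] for i in range(n)]
        denom = R[j][j] + mu
        R = [[R[a][b] - col[a] * col[b] / denom for b in range(n)]
             for a in range(n)]
    return chosen

def kernel_kmeans(K, init_centers, max_iter=100):
    n, k = len(K), len(init_centers)
    # Start from the chosen data points: the squared feature-space distance
    # from point i to point m is K[i][i] - 2*K[i][m] + K[m][m].
    labels = [min(range(k),
                  key=lambda c, i=i: K[i][i] - 2 * K[i][init_centers[c]]
                  + K[init_centers[c]][init_centers[c]])
              for i in range(n)]
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Mean pairwise similarity inside each cluster (squared norm of its mean).
        within = [sum(K[a][b] for a in m for b in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            # Distance from point i to each non-empty cluster mean in feature space.
            dists = {c: K[i][i] - 2 * sum(K[i][j] for j in m) / len(m) + within[c]
                     for c, m in enumerate(members) if m}
            new_labels.append(min(dists, key=dists.get))
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels

# Two well-separated toy groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
K = rbf_kernel(points)
centers = greedy_representatives(K, k=2)
labels = kernel_kmeans(K, centers)
print(centers, labels)
```

Because the deflation step lowers the score of points similar to an already chosen one, the second pick in this toy example falls in the other group, so kernel k-means starts with one center per group. Any symmetric positive semidefinite kernel matrix could be passed in place of the RBF matrix used here.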
Acknowledgments
This work was supported by the National Natural Science Foundation of China (11374245).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Yang, W., Tang, L. (2015). Graph Based Kernel k-Means Using Representative Data Points as Initial Centers. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theories and Methodologies. ICIC 2015. Lecture Notes in Computer Science, vol 9225. Springer, Cham. https://doi.org/10.1007/978-3-319-22180-9_29
DOI: https://doi.org/10.1007/978-3-319-22180-9_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22179-3
Online ISBN: 978-3-319-22180-9
eBook Packages: Computer Science, Computer Science (R0)