Abstract
Feature selection has been shown to improve the performance of text clustering. Global feature selection tries to identify a single subset of features that is relevant to all clusters. However, clustering may be improved by using a different subset of features to describe each cluster locally. In this work, we introduce ZOOM-IN, a method that performs local feature selection for partitional hierarchical clustering of text collections. The method exploits the diversity of the clusters generated by the hierarchical algorithm, selecting a variable number of features for each cluster according to its size. Experiments were conducted on the Reuters collection, evaluating the bisecting K-means algorithm with both global and local feature selection. The results showed that the proposed local method improved clustering performance.
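The abstract does not specify ZOOM-IN's relevance score or the exact rule that links the number of features to cluster size. The following is therefore a hypothetical sketch of the general idea, not the authors' method. It implements bisecting K-means, here a spherical (cosine-based) 2-means variant, in which each cluster is re-described by its own feature subset immediately before it is split. The following are all assumptions made for illustration: the scoring rule (document frequency inside the cluster), the size rule k = max(min_features, ⌈ratio · n⌉), the farthest-point initialization, and every function name.

```python
import math
import random
from collections import Counter


def select_local_features(docs, ratio=0.5, min_features=2):
    # Hypothetical size rule: the number of kept terms grows with cluster size.
    k = max(min_features, math.ceil(ratio * len(docs)))
    # Assumed relevance score: document frequency inside this cluster only.
    df = Counter(term for doc in docs for term in set(doc))
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def to_vector(doc, features):
    # Unit-length term-frequency vector restricted to the local features.
    counts = Counter(doc)
    return normalize([float(counts[t]) for t in features])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def two_means(vectors, rng, iters=10):
    # Assumed initialization: a random seed plus the vector least similar to it.
    first = rng.randrange(len(vectors))
    second = min(range(len(vectors)), key=lambda i: dot(vectors[i], vectors[first]))
    centroids = [vectors[first], vectors[second]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to the centroid with the higher cosine similarity.
        labels = [0 if dot(v, centroids[0]) >= dot(v, centroids[1]) else 1
                  for v in vectors]
        # Recompute unit-length centroids (spherical k-means update).
        for c in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels


def bisecting_kmeans_local(docs, n_clusters, ratio=0.5, seed=0):
    rng = random.Random(seed)
    clusters = [list(range(len(docs)))]
    while len(clusters) < n_clusters:
        # Split the largest remaining cluster, if it has at least two documents.
        clusters.sort(key=len, reverse=True)
        if len(clusters[0]) < 2:
            break
        target = clusters.pop(0)
        members = [docs[i] for i in target]
        # Local step: the feature subset is recomputed for this cluster only.
        features = select_local_features(members, ratio)
        vectors = [to_vector(d, features) for d in members]
        labels = two_means(vectors, rng)
        left = [i for i, lab in zip(target, labels) if lab == 0]
        right = [i for i, lab in zip(target, labels) if lab == 1]
        if not left or not right:
            # Fallback for a degenerate split: halve the cluster arbitrarily.
            half = len(target) // 2
            left, right = target[:half], target[half:]
        clusters.extend([left, right])
    return clusters


if __name__ == "__main__":
    docs = [["oil", "price", "crude"], ["oil", "crude", "barrel"],
            ["crude", "oil", "opec"], ["wheat", "grain", "crop"],
            ["grain", "wheat", "harvest"], ["wheat", "crop", "grain"]]
    print(sorted(sorted(c) for c in bisecting_kmeans_local(docs, 2)))
```

A global approach would call `select_local_features` once on the whole collection and reuse that subset at every split. In this sketch, the subset is recomputed per cluster. For example, the 6-document root keeps 3 terms, while a 3-document child keeps only 2, so each split is guided by terms that discriminate within that cluster.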
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ribeiro, M.N., Neto, M.J.R., Prudêncio, R.B.C. (2009). Local Feature Selection in Text Clustering. In: Köppen, M., Kasabov, N., Coghill, G. (eds) Advances in Neuro-Information Processing. ICONIP 2008. Lecture Notes in Computer Science, vol 5507. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03040-6_6
DOI: https://doi.org/10.1007/978-3-642-03040-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03039-0
Online ISBN: 978-3-642-03040-6
eBook Packages: Computer Science, Computer Science (R0)