Abstract
Clustering is a technique that groups data items together according to their similarity and separates them according to their dissimilarity. When this technique is applied to documents and to the terms they contain, retrieving similar documents becomes easier and more efficient. Document clustering has been researched and used for many years, but it is still far from optimal. To study and analyze different document clustering algorithms, a theoretical literature review and analysis was performed, and the results are presented in this paper. Of 95 papers identified, 30 were selected for review. The techniques and algorithms, as well as modifications to earlier algorithms, that various researchers have proposed for document clustering are compiled and presented here, with the intent of helping researchers identify the current and future scope of research in information retrieval systems and document clustering technologies.
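One common way to make "similarity" between documents concrete is to represent each document as a TF-IDF vector of its terms and compare vectors by cosine similarity; K-means, which several of the reviewed papers build on or modify, then groups the vectors around cluster centroids. The following is a minimal, dependency-free sketch of that pipeline (TF-IDF weighting, cosine similarity, spherical K-means). It is an illustration only, not a method proposed in this paper: the whitespace tokenizer, the smoothed IDF formula, the deterministic farthest-first seeding, the toy documents and all function names are assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    # Minimal tokenizer (assumption): lowercase + whitespace split. Real pipelines
    # usually also strip punctuation, drop stop words and stem or lemmatize.
    return text.lower().split()


def normalize(vec):
    # Scale a sparse vector (dict term -> weight) to unit L2 length so that the
    # dot product of two normalized vectors equals their cosine similarity.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def tfidf_vectors(docs):
    # One sparse TF-IDF vector per document, using the smoothed IDF
    # log((1 + n) / (1 + df)) + 1 (assumption) so no weight is zero or negative.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        vectors.append(normalize(vec))
    return vectors


def cosine(a, b):
    # Inputs are assumed unit-normalized, so the dot product is the cosine.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def mean_vector(vecs):
    # Centroid of a group of sparse vectors, re-normalized to unit length
    # (this re-normalization is what makes the K-means "spherical").
    total = Counter()
    for v in vecs:
        total.update(v)
    return normalize(dict(total))


def spherical_kmeans(vectors, k, max_iter=20):
    # Deterministic farthest-first seeding (a simplification of k-means++):
    # start from the first document, then repeatedly add the document that is
    # least similar to every centroid chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        idx = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centroids))
        centroids.append(vectors[idx])
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: recompute each centroid from its members, keeping the
        # old centroid if a cluster happens to be empty.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels


# Hypothetical toy corpus: two documents about pets, two about finance.
docs = [
    "cat kitten pet fur",
    "kitten cat purr pet",
    "stock market shares price",
    "market price shares trading",
]
labels = spherical_kmeans(tfidf_vectors(docs), k=2)
print(labels)  # [0, 0, 1, 1]
```

Normalizing every vector and centroid to unit length lets the plain dot product stand in for cosine similarity, which is why cosine-based K-means is a frequent baseline for text. Deterministic seeding is used here only to keep the example reproducible; randomized seeding such as k-means++ is more typical in practice.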
References
Handa, R., Rama Krishna, C., Aggarwal, N.: Document clustering for efficient and secure information retrieval from cloud. Concurr. Comput. Pract. Exp. e5127
Anbarasi, M.S., et al.: Ontology oriented concept-based clustering. IJRET Int. J. Res. Eng. Technol. 3(2) (2014)
Sedding, J., Kazakov, D.: WordNet-based text document clustering. In: Proceedings of the 3rd Workshop on Robust Methods in Analysis of Natural Language Data. Association for Computational Linguistics (2004)
Sarkar, S., Roy, A., Purkayastha, B.S.: A comparative analysis of particle swarm optimization and K-means algorithm for text clustering using Nepali Wordnet. Int. J. Nat. Lang. Comput. (IJNLC) 3(3) (2014)
Akter, R., Chung, Y.: An evolutionary approach for document clustering. IERI Procedia 4, 370–375 (2013)
Meena, K.Y., Singh, P.: Text documents clustering using genetic algorithm and discrete differential evolution. Int. J. Comput. Appl. 43(1) (2012). ISSN 0975-8887
Trappey, A.J.C., et al.: A fuzzy ontological knowledge document clustering methodology. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 39(3), 806–814 (2009)
Thilagavathi, G., Anitha, J.: Document clustering in forensic investigation by hybrid approach. Int. J. Comput. Appl. 91(3) (2014)
Baghel, R., Dhir, R.: A frequent concepts-based document clustering algorithm. Int. J. Comput. Appl. 4(5), 6–12 (2010)
Jing, H., et al.: Semantic naïve Bayes classifier for document classification. In: Proceedings of the Sixth International Joint Conference on Natural Language Processing (2013)
Aggarwal, C.C., Reddy, C.K. (eds.): Data Clustering: Algorithms and Applications. CRC Press, New York (2013)
Abualigah, L.M., Khader, A.T., Hanandeh, E.S.: A combination of objective functions and hybrid Krill herd algorithm for text document clustering analysis. Eng. Appl. Artif. Intell. 73, 111–125 (2018)
Lydia, E.L., et al.: Charismatic document clustering through novel K-Means non-negative matrix factorization (KNMF) algorithm using key phrase extraction. Int. J. Parallel Program. 1–19 (2018)
Altameem, T., Amoon, M.: Hybrid tolerance rough fuzzy set with improved monkey search algorithm-based document clustering. J. Ambient Intell. Humanized Comput. 1–11 (2018)
Dalal, V., Malik, L.: Data Clustering Approach for Automatic Text Summarization of Hindi Documents using Particle Swarm Optimization and Semantic Graph
Ahmad, A., Amin, M.R., Chowdhury, F.: Bengali document clustering using word movers distance. In: 2018 International Conference on Bangla Speech and Language Processing (ICBSLP). IEEE (2018)
Lakshmi, R., Baskar, S.: DIC-DOC-K-means: dissimilarity-based Initial Centroid selection for DOCument clustering using K-means for improving the effectiveness of text document clustering. J. Inf. Sci. 0165551518816302 (2018)
Megarchioti, S., Mamalis, B.: The BigKClustering approach for document clustering using Hadoop MapReduce. In: Proceedings of the 22nd Pan-Hellenic Conference on Informatics. ACM (2018)
Al-Jadir, I., et al.: Enhancing digital forensic analysis using memetic algorithm feature selection method for document clustering. In: 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE (2018)
Zhu, Y., Zhang, M., Shi, F.: Application of algorithm CARDBK in document clustering. Wuhan Univ. J. Nat. Sci. 23(6), 514–524 (2018)
Abualigah, L.M., et al.: A krill herd algorithm for efficient text documents clustering. In: 2016 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE). IEEE (2016)
Akter, R., Chung, Y.: An improved genetic algorithm for document clustering on the cloud. Int. J. Cloud Appl. Comput. (IJCAC) 8(4), 20–28 (2018)
Chen, Y., Sun, P.: An optimized K-Means algorithm based on FSTVM. In: 2018 International Conference on Virtual Reality and Intelligent Systems (ICVRIS). IEEE (2018)
Al-Jadir, I., et al.: Adaptive crossover memetic differential harmony search for optimizing document clustering. In: International Conference on Neural Information Processing. Springer, Cham (2018)
Seshadri, K., Viswanathan Iyer, K.: Design and evaluation of a parallel document clustering algorithm based on hierarchical latent semantic analysis. Concurr. Comput. Pract. Exp. e5094
Saini, N., Saha, S., Bhattacharyya, P.: Automatic scientific document clustering using self-organized multi-objective differential evolution. Cogn. Comput. 1–23 (2018)
Rani, M.S., Babu, G.C.: Efficient query clustering technique and context well-informed document clustering. In: Soft Computing and Signal Processing, pp. 261–271. Springer, Singapore (2019)
González, E., Turmo, J.: Unsupervised document clustering by weighted combination. LSI Research Report LSI-06-17-R, Departament de Llenguatges i Sistemes Informàtics, Barcelona (2006)
Gupta, A., Gautam, J., Kumar, A.: A survey on methodologies used for semantic document clustering. In: 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS). IEEE (2017)
Jain, A.K., Narasimha Murty, M., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. (CSUR) 31(3), 264–323 (1999)
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Choubey, V., Dubey, S.K. (2020). An Analytical Approach to Document Clustering Techniques. In: Tuba, M., Akashe, S., Joshi, A. (eds) ICT Systems and Sustainability. Advances in Intelligent Systems and Computing, vol 1077. Springer, Singapore. https://doi.org/10.1007/978-981-15-0936-0_3
DOI: https://doi.org/10.1007/978-981-15-0936-0_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0935-3
Online ISBN: 978-981-15-0936-0
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)