Abstract
The vocabulary problem in information retrieval arises because authors and indexers often use different terms for the same concept. A thesaurus, which defines mappings between different but related terms, is widely used in modern information retrieval systems to address this problem. Chen et al. proposed the concept space approach to automatic thesaurus construction. A concept space contains the associations between every pair of terms. Previous studies show that a concept space is a useful tool for helping searchers revise their queries to obtain better results from information retrieval systems. Constructing a concept space, however, is computationally intensive. In this paper, we propose and evaluate efficient algorithms for constructing concept spaces that include only strong associations. Since weak associations are not useful in thesaurus construction, our algorithms use various pruning techniques to avoid computing weak associations and thereby achieve efficiency.
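To make the idea of a concept space restricted to strong associations concrete, the following is a minimal sketch, not the algorithms proposed in the paper. It assumes a simple asymmetric weight, the fraction of documents containing term a that also contain term b, and a hypothetical cutoff parameter `threshold`; the abstract specifies neither the weighting function nor the pruning rules. Unlike the pruning techniques the abstract refers to, this sketch computes every co-occurring pair and filters afterwards, so it illustrates only the output, not the efficiency gain.

```python
from collections import Counter
from itertools import combinations


def build_concept_space(docs, threshold=0.5):
    """Return {(a, b): weight} for directed associations with weight >= threshold.

    Assumption: weight(a -> b) = (#docs containing both a and b) / (#docs containing a).
    This is an illustrative stand-in, not the weighting used in the paper.
    """
    doc_freq = Counter()   # number of documents containing each term
    co_freq = Counter()    # number of documents containing each unordered term pair
    for doc in docs:
        terms = sorted(set(doc))              # deduplicate terms within a document
        doc_freq.update(terms)
        co_freq.update(combinations(terms, 2))

    space = {}
    for (a, b), n_ab in co_freq.items():
        w_ab = n_ab / doc_freq[a]
        w_ba = n_ab / doc_freq[b]
        # Keep only strong associations; weak ones are dropped here after the fact.
        if w_ab >= threshold:
            space[(a, b)] = w_ab
        if w_ba >= threshold:
            space[(b, a)] = w_ba
    return space


# Toy collection (hypothetical data, for illustration only).
docs = [
    ["thesaurus", "retrieval", "query"],
    ["thesaurus", "retrieval"],
    ["retrieval", "index"],
    ["query", "index"],
]
space = build_concept_space(docs, threshold=0.6)
for (a, b), w in sorted(space.items()):
    print(f"{a} -> {b}: {w:.2f}")
```

On this toy collection only the two associations between "thesaurus" and "retrieval" survive the cutoff, and their weights differ by direction because the assumed weight is asymmetric.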
References
The SMART retrieval system. ftp://ftp.cs.cornell.edu/pub/smart/med.
R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley, 1999.
Hsinchun Chen, Joanne Martinez, Tobun D. Ng, and Bruce R. Schatz. A concept space approach to addressing the vocabulary problem in scientific information retrieval: an experiment on the Worm Community System. Journal of the American Society for Information Science, 48(1):17–31, 1997.
W.B. Frakes and R. Baeza-Yates. Information Retrieval: Data Structures and Algorithms. Prentice Hall, 1992.
G.W. Furnas et al. The vocabulary problem in human-system communication. Comm. ACM, 30(11):964–971, 1987.
H. Chen and K.J. Lynch. Automatic construction of networks of concepts characterizing document databases. IEEE Transactions on Systems, Man, and Cybernetics, 22(5):885–902, Sep/Oct 1992.
B.R. Schatz, E. Johnson, P. Cochrane, and H. Chen. Interactive term suggestion for users of digital libraries: using subject thesauri and co-occurrence lists for information retrieval. In Digital Library 96, Bethesda MD, 1996.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ng, C.Y., Lee, J., Cheung, F., Kao, B., Cheung, D. (2001). Efficient Algorithms for Concept Space Construction. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_12
DOI: https://doi.org/10.1007/3-540-45357-1_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41910-5
Online ISBN: 978-3-540-45357-4
eBook Packages: Springer Book Archive