Abstract
In this paper we propose a novel sentence retrieval method based on extracting highly frequent terms from top retrieved documents. We compare it against state-of-the-art sentence retrieval techniques, including those based on pseudo-relevance feedback, and show that the approach is robust and competitive. Our results reinforce the idea that top retrieved data is a valuable source for enhancing retrieval systems. This is especially true for short queries, because there are usually few query-sentence matching terms. Moreover, the approach is particularly promising for weak queries: we demonstrate that this novel method significantly improves precision at top ranks when handling poorly specified information needs.
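The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy stopword list, a simple additive score (query matches plus a weighted count of matches against the most frequent terms in the top retrieved documents), and hypothetical helper names (`highly_frequent_terms`, `score_sentence`, `rank_sentences`) and parameters (`k`, `weight`). It illustrates why such terms could help short queries: a sentence sharing no term with the query can still receive a non-zero score.

```python
import re
from collections import Counter

# Toy stopword list; an assumption for illustration only.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def highly_frequent_terms(top_docs, k=3):
    """Return the k most frequent terms across the top retrieved documents."""
    counts = Counter()
    for doc in top_docs:
        counts.update(tokenize(doc))
    return {term for term, _ in counts.most_common(k)}


def score_sentence(sentence, query, frequent_terms, weight=0.5):
    """Hypothetical score: query matches plus weighted frequent-term matches."""
    tokens = set(tokenize(sentence))
    query_matches = len(tokens & set(tokenize(query)))
    frequent_matches = len(tokens & frequent_terms)
    return query_matches + weight * frequent_matches


def rank_sentences(sentences, query, top_docs, k=3, weight=0.5):
    """Rank candidate sentences by the hypothetical score, best first."""
    frequent = highly_frequent_terms(top_docs, k)
    return sorted(
        sentences,
        key=lambda s: score_sentence(s, query, frequent, weight),
        reverse=True,
    )
```

With a short query such as "solar energy", a sentence about panels and electricity has no query-term match, yet it is ranked above an unrelated sentence if those terms are frequent in the top retrieved documents. How the frequent terms are actually selected and weighted in the paper is not stated in the abstract.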
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Losada, D.E., Fernández, R.T. (2007). Highly Frequent Terms and Sentence Retrieval. In: Ziviani, N., Baeza-Yates, R. (eds) String Processing and Information Retrieval. SPIRE 2007. Lecture Notes in Computer Science, vol 4726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75530-2_20
DOI: https://doi.org/10.1007/978-3-540-75530-2_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75529-6
Online ISBN: 978-3-540-75530-2
eBook Packages: Computer Science, Computer Science (R0)