TDSS: A New Word Sense Representation Framework for Information Retrieval

Chen, Liwei; Feng, Yansong; Zhao, Dongyan

doi:10.1007/978-3-319-50496-4_6


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10102)

Abstract

Word sense representation is important for information retrieval (IR) tasks. Existing lexical databases, e.g., WordNet, and automated word sense representation approaches often use only one view to represent a word, and may not work well in context-sensitive tasks such as query rewriting. In this paper, we propose a new framework that represents a word sense in two views simultaneously: an explanation view and a context view. We further propose a novel method to learn such representations automatically from large-scale query logs. Experimental results show that our new sense representations can better handle word substitutions in a query rewriting task.
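As a rough illustration of the two-view idea, the sketch below stores each sense with an explanation view and a context view, picks a sense by context overlap with the rest of the query, and substitutes words from the explanation view. It is not the authors' method: the abstract does not specify how the views are encoded or learned, so the `TwoViewSense` class, the `pick_sense` and `rewrite` helpers, the overlap scoring, and the toy `SENSES` data are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoViewSense:
    """One sense of a word in two views (hypothetical structure)."""
    word: str
    explanation: frozenset  # words that explain or could replace this sense
    context: frozenset      # words that tend to co-occur with this sense


def pick_sense(senses, query_terms):
    """Return the sense whose context view overlaps the query terms most.

    Ties, including the no-overlap case, fall back to the first sense.
    """
    terms = set(query_terms)
    return max(senses, key=lambda s: len(s.context & terms))


def rewrite(query, senses_by_word):
    """Return one rewritten query per substitute of each known term."""
    terms = query.split()
    rewrites = []
    for i, term in enumerate(terms):
        senses = senses_by_word.get(term)
        if not senses:
            continue  # no sense inventory for this term
        others = terms[:i] + terms[i + 1:]
        sense = pick_sense(senses, others)
        for substitute in sorted(sense.explanation):
            rewrites.append(" ".join(terms[:i] + [substitute] + terms[i + 1:]))
    return rewrites


# Toy, hand-written senses; a real system would learn these from query logs.
SENSES = {
    "java": [
        TwoViewSense("java", frozenset({"coffee"}),
                     frozenset({"cup", "beans", "roast"})),
        TwoViewSense("java", frozenset({"jdk"}),
                     frozenset({"install", "download", "compiler"})),
    ]
}

print(rewrite("java download", SENSES))
print(rewrite("java beans roast", SENSES))
```

Keeping the two views separate lets the context view decide which sense applies while only the explanation view supplies substitutes, which is the kind of context sensitivity the abstract argues single-view representations lack.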



Acknowledgement

We would like to thank Ben Xu, Wensong He, Shuaixiang Dai, Xiaozhao Zhao, Qiannan Lv, and the anonymous reviewers for their helpful feedback. This work is supported by the National High Technology R&D Program of China (Grant Nos. 2015AA015403, 2014AA015102) and the Natural Science Foundation of China (Grant Nos. 61202233, 61272344, 61370055). For any correspondence, please contact Liwei Chen.

Author information

Authors and Affiliations

Institute of Computer Science and Technology, Peking University, Beijing, China
Liwei Chen, Yansong Feng & Dongyan Zhao
Baidu Inc., Beijing, China
Liwei Chen

Corresponding author

Correspondence to Liwei Chen.

Editor information

Editors and Affiliations

Microsoft Research Asia, Beijing, China
Chin-Yew Lin
Brandeis University, Waltham, Massachusetts, USA
Nianwen Xue
Peking University, Beijing, China
Dongyan Zhao
Fudan University, Shanghai, China
Xuanjing Huang
Peking University, Beijing, China
Yansong Feng

About this paper

Cite this paper

Chen, L., Feng, Y., Zhao, D. (2016). TDSS: A New Word Sense Representation Framework for Information Retrieval. In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL 2016, NLPCC 2016. Lecture Notes in Computer Science, vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_6

DOI: https://doi.org/10.1007/978-3-319-50496-4_6
Published: 02 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50495-7
Online ISBN: 978-3-319-50496-4
eBook Packages: Computer Science, Computer Science (R0)
