Entity Set Expansion on Social Media: A Study for Newly-Presented Entity Classes

Zhao, He; Feng, Chong; Luo, Zhunchen; Pei, Yuxia

doi:10.1007/978-981-10-6805-8_10


Part of the book series: Communications in Computer and Information Science (CCIS, volume 774)

Included in the following conference series:

Chinese National Conference on Social Media Processing


Abstract

Online social media yields large-scale corpora that are fairly informative and sometimes contain many up-to-date entities. Entity set expansion on social media text is the challenging task of extracting previously unseen entities of a class, given a few seed entities already in hand. In this paper, we present a novel approach that discovers newly-presented entities by performing entity set expansion on social media. Starting from an initial seed set, our method first explores the use of an embedding method to compute a semantic similarity feature when generating candidate lists, and then detects connective-pattern and prefix-rule features that reflect the specific nature of social media text. A ranking model is then learned with a supervised algorithm to score each candidate term by combining these features, producing the final ranked set. Experimental results on a Twitter corpus show that our solution achieves high precision both on common class sets and on new class sets containing abundant informal and new entities that are not mentioned in conventional articles.
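The abstract does not specify the exact feature definitions or the ranking algorithm, so the following Python sketch only illustrates the general shape of such a pipeline under stated assumptions. It scores candidates with three features: mean cosine similarity to the seed embeddings, a count of connective patterns ("seed and candidate", "candidate, seed", ...), and the share of a candidate's mentions carrying a `#` or `@` prefix. The connective list, the prefix rule, the single-token entities, and the fixed weights in `DEFAULT_WEIGHTS` (standing in for weights that a supervised ranker would learn) are all hypothetical choices, not the authors' definitions.

```python
import math
import re

# Hypothetical tokenizer: words, optionally prefixed by '#' or '@', plus commas.
TOKEN_RE = re.compile(r"[#@]?\w+|,")
# Hypothetical connective tokens linking two coordinated entities.
CONNECTIVES = {"and", "or", ",", "vs"}
# Hypothetical fixed weights; a real system would learn these with a supervised ranker.
DEFAULT_WEIGHTS = (1.0, 0.5, 0.3)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_feature(candidate, seeds, emb):
    """Mean cosine similarity between the candidate and each seed with an embedding."""
    if candidate not in emb:
        return 0.0
    sims = [cosine(emb[candidate], emb[s]) for s in seeds if s in emb]
    return sum(sims) / len(sims) if sims else 0.0


def pattern_feature(candidate, seeds, tweets):
    """Count token trigrams 'seed CONN candidate' or 'candidate CONN seed'."""
    seedset = {s.lower() for s in seeds}
    cand = candidate.lower()
    count = 0
    for tweet in tweets:
        toks = TOKEN_RE.findall(tweet.lower())
        for a, mid, b in zip(toks, toks[1:], toks[2:]):
            if mid in CONNECTIVES and (
                (a in seedset and b == cand) or (a == cand and b in seedset)
            ):
                count += 1
    return count


def prefix_feature(candidate, tweets):
    """Fraction of the candidate's mentions written as a #hashtag or @mention."""
    cand = candidate.lower()
    total = prefixed = 0
    for tweet in tweets:
        for tok in TOKEN_RE.findall(tweet.lower()):
            if tok.lstrip("#@") == cand:
                total += 1
                if tok != cand:
                    prefixed += 1
    return prefixed / total if total else 0.0


def rank(candidates, seeds, emb, tweets, weights=DEFAULT_WEIGHTS):
    """Sort candidates by a weighted linear combination of the three features."""
    def score(c):
        feats = (
            similarity_feature(c, seeds, emb),
            math.log1p(pattern_feature(c, seeds, tweets)),  # dampen raw counts
            prefix_feature(c, tweets),
        )
        return sum(w * f for w, f in zip(weights, feats))

    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    emb = {
        "messi": [1.0, 0.1],
        "ronaldo": [0.9, 0.2],
        "neymar": [0.95, 0.15],
        "pizza": [0.0, 1.0],
    }
    tweets = ["Messi and Neymar scored tonight", "#Neymar is back", "pizza or pasta?"]
    print(rank(["pizza", "neymar"], ["messi", "ronaldo"], emb, tweets))  # ['neymar', 'pizza']
```

A linear score keeps the sketch short and makes each feature's contribution easy to inspect; a learning-to-rank method would replace the hand-set weights with ones fitted to labeled seed/candidate data.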



Acknowledgements

This work was supported by the National High-tech Research and Development Program (863 Program) (No. 2014AA015105) and National Natural Science Foundation of China (No. 61602490).

Author information

Authors and Affiliations

School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
He Zhao & Chong Feng
China Defense Science and Technology Information Center, Beijing, 100142, China
Zhunchen Luo
State Key Laboratory of Smart Manufacturing for Special Vehicles and Transmission System, Beijing, China
Yuxia Pei


Corresponding author

Correspondence to Chong Feng.

Editor information

Editors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Xueqi Cheng
Beijing Jinri Toutiao Technology Co. Ltd, Beijing, China
Weiying Ma
Arizona State University, Tempe, Arizona, USA
Huan Liu
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Huawei Shen
Renmin University of China, Beijing, China
Shizheng Feng
Microsoft Asia Research, Beijing, China
Xing Xie


About this paper

Cite this paper

Zhao, H., Feng, C., Luo, Z., Pei, Y. (2017). Entity Set Expansion on Social Media: A Study for Newly-Presented Entity Classes. In: Cheng, X., Ma, W., Liu, H., Shen, H., Feng, S., Xie, X. (eds) Social Media Processing. SMP 2017. Communications in Computer and Information Science, vol 774. Springer, Singapore. https://doi.org/10.1007/978-981-10-6805-8_10


DOI: https://doi.org/10.1007/978-981-10-6805-8_10
Published: 26 October 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6804-1
Online ISBN: 978-981-10-6805-8
eBook Packages: Computer Science, Computer Science (R0)
