Entity Set Expansion on Social Media: A Study for Newly-Presented Entity Classes

  • Conference paper
Social Media Processing (SMP 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 774)

Abstract

Online social media yields large-scale corpora that are fairly informative and sometimes include many up-to-date entities. The challenge of entity set expansion on social media text is to extract previously unseen entities given only a few seeds. In this paper, we present a novel approach that discovers newly presented entities by performing entity set expansion on social media. Starting from an initial seed set, our method first uses an embedding method to obtain a semantic similarity feature when generating candidate lists, and then detects connective-pattern and prefix-rule features that are specific to social media. A ranking model is then learned with a supervised algorithm to score each candidate term on these features and produce the final ranked set. Experimental results on a Twitter corpus show that our solution achieves high precision both on common class sets and on new class sets that contain many informal and new entities not mentioned in common articles.
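The pipeline outlined in the abstract (embedding similarity to the seeds, extra social-media features, a scoring model over those features) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy embeddings, the feature names `pattern` and `prefix`, the function `rank_candidates`, and the hand-set weights are all assumptions made for illustration. In particular, the fixed weights stand in for a ranking model that the paper learns with a supervised algorithm.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rank_candidates(seeds, embeddings, extra_features, weights, top_k=5):
    """Score every non-seed term and return the top_k (term, score) pairs.

    The score is a weighted sum of the embedding similarity to the seed
    centroid and any extra per-term features. The weights are hand-set
    here (an assumption); a learned ranking model would supply them.
    """
    seed_vec = centroid([embeddings[s] for s in seeds])
    scored = []
    for term, vec in embeddings.items():
        if term in seeds:
            continue
        feats = {"similarity": cosine(vec, seed_vec)}
        feats.update(extra_features.get(term, {}))
        score = sum(weights.get(name, 0.0) * value for name, value in feats.items())
        scored.append((term, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data, invented for illustration only.
embeddings = {
    "iphone": [0.90, 0.10],
    "galaxy": [0.80, 0.20],
    "pixel": [0.85, 0.15],
    "oneplus": [0.88, 0.12],
    "banana": [0.10, 0.90],
}
seeds = {"iphone", "galaxy"}
# Hypothetical binary features: 1.0 if the term co-occurred with a seed in a
# connective pattern ("pattern") or matched a prefix rule ("prefix").
extra_features = {
    "pixel": {"pattern": 1.0, "prefix": 0.0},
    "oneplus": {"pattern": 1.0, "prefix": 1.0},
    "banana": {"pattern": 0.0, "prefix": 0.0},
}
weights = {"similarity": 1.0, "pattern": 0.5, "prefix": 0.5}

ranked = rank_candidates(seeds, embeddings, extra_features, weights)
for term, score in ranked:
    print(f"{term}\t{score:.3f}")
```

In this toy setup, terms that are both close to the seed centroid and supported by the extra features rise to the top, while an unrelated term such as `banana` falls to the bottom. How the real features are extracted and how the ranking model is trained are described in the full paper, not here.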

Acknowledgements

This work was supported by the National High-tech Research and Development Program (863 Program) (No. 2014AA015105) and National Natural Science Foundation of China (No. 61602490).

Corresponding author

Correspondence to Chong Feng.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhao, H., Feng, C., Luo, Z., Pei, Y. (2017). Entity Set Expansion on Social Media: A Study for Newly-Presented Entity Classes. In: Cheng, X., Ma, W., Liu, H., Shen, H., Feng, S., Xie, X. (eds) Social Media Processing. SMP 2017. Communications in Computer and Information Science, vol 774. Springer, Singapore. https://doi.org/10.1007/978-981-10-6805-8_10

  • DOI: https://doi.org/10.1007/978-981-10-6805-8_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6804-1

  • Online ISBN: 978-981-10-6805-8

  • eBook Packages: Computer Science, Computer Science (R0)
