Task-oriented keyphrase extraction from social media

Yang, Min; Liang, Yuzhi; Zhao, Wei; Xu, Wei; Zhu, Jia; Qu, Qiang

doi:10.1007/s11042-017-5041-y

Published: 31 July 2017

Multimedia Tools and Applications, volume 77, pages 3171–3187 (2018)

Min Yang¹,² (ORCID: orcid.org/0000-0001-7345-5071),
Yuzhi Liang³,
Wei Zhao⁴,
Wei Xu⁴,
Jia Zhu¹ &
Qiang Qu²

Abstract

Keyphrase extraction from social media is a crucial and challenging task. Previous studies usually focus on extracting keyphrases that summarize a corpus, without taking users’ specific needs into consideration. In this paper, we propose a novel three-stage model to learn a keyphrase set that represents or is related to a particular topic. First, a phrase mining algorithm is applied to segment the documents into human-interpretable phrases. Second, we propose a weakly supervised model to extract candidate keyphrases, guided by a few pre-specified seed keyphrases. This guidance makes the extracted keyphrases more specific and more closely related to the seed keyphrases, which reflect the user’s needs. Finally, to further identify implicitly related phrases, the PMI-IR algorithm is employed to obtain synonyms of the extracted candidate keyphrases. We conducted experiments on two publicly available datasets from news and Twitter. The experimental results demonstrate that our approach outperforms the state-of-the-art baselines and has the potential to extract high-quality task-oriented keyphrases.
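
To make the third stage more concrete, the following is a minimal sketch of ranking phrases by pointwise mutual information (PMI) with a seed keyphrase. It is an illustration under stated assumptions, not the authors' implementation: PMI-IR as described by Turney estimates probabilities from search-engine hit counts, whereas this sketch substitutes document counts from a small local corpus. The function names `pmi_scores` and `related_phrases` and the input layout (one list of phrases per document) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(documents):
    """Return document-level PMI for every pair of phrases that co-occur.

    `documents` is a list of phrase lists (assumed to be the output of an
    earlier phrase-mining step). Local document counts stand in for the
    search-engine hit counts that PMI-IR would use.
    """
    n_docs = len(documents)
    phrase_df = Counter()  # number of documents containing each phrase
    pair_df = Counter()    # number of documents containing both phrases
    for doc in documents:
        phrases = sorted(set(doc))  # deduplicate; sorting fixes pair order
        phrase_df.update(phrases)
        pair_df.update(combinations(phrases, 2))

    scores = {}
    for (a, b), joint in pair_df.items():
        # PMI(a, b) = log2( p(a, b) / (p(a) * p(b)) )
        scores[(a, b)] = math.log2(joint * n_docs / (phrase_df[a] * phrase_df[b]))
    return scores


def related_phrases(seed, documents, top_k=3):
    """Rank phrases that co-occur with `seed` by descending PMI."""
    ranked = []
    for (a, b), score in pmi_scores(documents).items():
        if a == seed:
            ranked.append((b, score))
        elif b == seed:
            ranked.append((a, score))
    # Sort by score (highest first), breaking ties alphabetically.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]
```

Calling `related_phrases("flu", docs)` on a list of segmented documents returns the phrases most strongly associated with the seed. Pairs that never co-occur are simply absent from the ranking, which avoids taking the logarithm of zero.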

Notes

Available at https://www.google.com/advanced_search.
Available at http://qwone.com/~jason/20Newsgroups.
Available at http://www.nltk.org.
Available at http://www.ranks.nl/stopwords.
Available at http://wordnet.princeton.edu/.

Author information

Authors and Affiliations

School of Computing Science, South China Normal University, Guangzhou, China
Min Yang & Jia Zhu
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Min Yang & Qiang Qu
Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong
Yuzhi Liang
Tencent, Shenzhen, China
Wei Zhao & Wei Xu

Corresponding author

Correspondence to Jia Zhu.

About this article

Cite this article

Yang, M., Liang, Y., Zhao, W. et al. Task-oriented keyphrase extraction from social media. Multimed Tools Appl 77, 3171–3187 (2018). https://doi.org/10.1007/s11042-017-5041-y

Received: 06 April 2017
Revised: 10 July 2017
Accepted: 17 July 2017
Published: 31 July 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s11042-017-5041-y
