Abstract
Query-oriented summarization addresses the problem of information overload by helping readers grasp the main ideas of a document collection in a short time. Since summaries are composed of sentences, the core of producing a salient summary is to construct high-quality sentences that respond both to the user's query and to the content of multiple documents. Sentence embeddings have been shown to be effective in summarization tasks. However, embedding-based methods lack the latent topic structure of the content, so a summary built purely in vector space can hardly capture multi-topical content. In this paper, we propose a model that incorporates topical aspects into continuous vector representations, jointly learning semantically rich representations encoded as vectors. Leveraging a topic-filtering and embedding-ranking model, the summarizer then selects the most salient sentences. Experiments demonstrate the strong performance of our model in terms of both topic prominence and semantic coherence.
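To make the two-stage idea in the abstract concrete, the following is a minimal sketch of query-oriented sentence selection: sentences are first filtered by whether their dominant topic matches the query's topic, and the survivors are then ranked by embedding similarity to the query. This is only an illustration of the general filter-then-rank scheme, not the paper's actual model; the function names, the single-topic assignment per sentence, and the use of cosine similarity are all simplifying assumptions for the example.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors,
    # with a small epsilon to avoid division by zero.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rank_sentences(query_vec, sent_vecs, sent_topics, query_topic, top_k=3):
    """Topic filtering + embedding ranking (illustrative sketch).

    Keep only sentences whose dominant topic matches the query's topic,
    then rank the survivors by cosine similarity to the query embedding.
    Returns the indices of the top_k selected sentences.
    """
    candidates = [
        (i, cosine(query_vec, v))
        for i, (v, t) in enumerate(zip(sent_vecs, sent_topics))
        if t == query_topic  # topic filtering step
    ]
    candidates.sort(key=lambda p: p[1], reverse=True)  # embedding ranking step
    return [i for i, _ in candidates[:top_k]]
```

In a real system the sentence vectors would come from a learned embedding model and the topic assignments from a topic model such as LDA; here both are simply passed in as precomputed inputs.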
Notes
- 1. In DUC, the word-based query is also called “title”, such as “New hydroelectric projects”.
- 2. In DUC, the sentence-based query is also called “narrative”, such as “What hydroelectric projects are planned or in progress and what problems are associated with them?”.
- 3.
- 4. In DUC, the query is also called “narrative” or “topic”.
Acknowledgments
The work was supported by the National Natural Science Foundation of China (Grant No. 61602036), the National Basic Research Program of China (973 Program, Grant No. 2013CB329303), and the Beijing Advanced Innovation Center for Imaging Technology (BAICIT-2016007).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Wei, L., Huang, H., Gao, Y., Wei, X., Feng, C. (2017). Aligning Gaussian-Topic with Embedding Network for Summarization Ranking. In: Chen, L., Jensen, C., Shahabi, C., Yang, X., Lian, X. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science(), vol 10366. Springer, Cham. https://doi.org/10.1007/978-3-319-63579-8_46
DOI: https://doi.org/10.1007/978-3-319-63579-8_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63578-1
Online ISBN: 978-3-319-63579-8