Abstract
Emotion Recognition in Conversation (ERC) is the task of predicting the emotion conveyed by each utterance in a dialogue. ERC research commonly integrates intra-utterance, local contextual, and global contextual information to obtain utterance vectors. However, complex semantic dependencies exist among these three sources of information, and failing to model them accurately can degrade emotion recognition. Moreover, to enhance the semantic dependencies within the context, researchers commonly introduce external commonsense knowledge after modeling the context. Yet simply injecting commonsense knowledge into the model without considering its potential impact can introduce unexpected noise. To address these issues, we propose a dialogue emotion model based on a local–global context encoder and commonsense knowledge fusion attention. The local–global context encoder integrates intra-utterance, local contextual, and global contextual information to capture the semantic dependencies among them. To provide more accurate external commonsense information, we present a fusion module that filters the commonsense information through multi-head attention. Our proposed method achieves competitive results on four datasets and exhibits advantages over mainstream models that use commonsense knowledge.
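The idea of filtering commonsense information through multi-head attention can be illustrated with a minimal sketch. This is not the authors' fusion module: the function name `filter_commonsense`, the toy vectors, and the choice to omit the learned query, key, and value projections are all assumptions made for illustration. The sketch only shows how attention weights can down-weight commonsense candidates that agree poorly with the utterance vector.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def filter_commonsense(utterance, candidates, num_heads=2):
    """Attend from one utterance vector over candidate commonsense vectors.

    Hypothetical helper: the learned query/key/value projections of a real
    multi-head attention layer are omitted, so each head simply works on
    its own slice of the input vectors.
    """
    dim = len(utterance)
    if dim % num_heads != 0:
        raise ValueError("vector size must be divisible by num_heads")
    head_dim = dim // num_heads

    fused, all_weights = [], []
    for h in range(num_heads):
        start, stop = h * head_dim, (h + 1) * head_dim
        query = utterance[start:stop]
        keys = [c[start:stop] for c in candidates]

        # Scaled dot-product score between the utterance and each candidate.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(head_dim)
            for key in keys
        ]
        weights = softmax(scores)

        # Weighted sum of candidate slices; low-weight candidates add little.
        head_out = [
            sum(w * key[i] for w, key in zip(weights, keys))
            for i in range(head_dim)
        ]
        fused.extend(head_out)
        all_weights.append(weights)
    return fused, all_weights


# Toy data (assumed): one utterance vector and two commonsense candidates.
utterance = [1.0, 0.0, 0.5, 0.5]
candidates = [
    [2.0, 0.0, 1.0, 1.0],     # points the same way as the utterance
    [-2.0, 0.0, -1.0, -1.0],  # points the opposite way
]
fused, weights = filter_commonsense(utterance, candidates)
```

In this toy setting each head assigns the larger weight to the first candidate, so the fused vector is dominated by the commonsense feature that agrees with the utterance. A trained model would learn the projections that decide what counts as agreement.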








Data availability
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
Acknowledgements
The authors thank all reviewers for their constructive and helpful reviews. This research is funded by the National Natural Science Foundation of China (62372283, 62206163), the Science and Technology Major Project of Guangdong Province (STKJ2021005, STKJ202209002, STKJ2023076), and the Natural Science Foundation of Guangdong Province (2019A1515010943).
Funding
This research is funded by the National Natural Science Foundation of China (62372283, 62206163), the Natural Science Foundation of Guangdong Province (2019A1515010943), the Basic and Applied Basic Research of Colleges and Universities in Guangdong Province (Special Projects in Artificial Intelligence) (2019KZDZX1030), the 2020 Li Ka Shing Foundation Cross-Disciplinary Research Grant (2020LKSFG04D), the Science and Technology Major Project of Guangdong Province (STKJ2021005, STKJ202209002), and the Opening Project of Guangdong Province Key Laboratory of Information Security Technology (2020B1212060078).
Author information
Contributions
W.Y.: Original Draft. C.L.: Review & Editing. X.H.: Review & Editing. W.Z.: Validation. E.C.: Review & Editing. D.J.: Supervision.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yu, W., Li, C., Hu, X. et al. Dialogue emotion model based on local–global context encoder and commonsense knowledge fusion attention. Int. J. Mach. Learn. & Cyber. 15, 2811–2825 (2024). https://doi.org/10.1007/s13042-023-02066-3
DOI: https://doi.org/10.1007/s13042-023-02066-3