Abstract
In this paper, we present a method for identifying forum posts that express user intentions in online discussion forums. The output of this task, for example detected buying intentions, can be exploited for targeted advertising and other marketing tasks. Our method uses labeled data from other domains to help the learning task in the target domain, combining the data statistics within a Naive Bayes (NB) framework. Because data distributions vary from domain to domain, the contributions of the different data sources must be adjusted when constructing the learning model in order to achieve accurate results. We propose to adjust the parameters of the NB classifier by optimizing an objective that is equivalent to maximizing the between-class separation, using stochastic gradient descent. Experimental results show that our method outperforms several competitive baselines on a benchmark dataset consisting of forum posts from four domains: Cellphone, Electronics, Camera, and TV. In addition, we explore combining the NB posteriors computed during the optimization process with another classifier, namely Support Vector Machines (SVMs). Experimental results show that the optimized NB class posteriors are useful when used as features for SVMs in cross-domain settings.
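The idea of combining data statistics from several domains in an NB model can be illustrated with a minimal sketch. It is not the paper's optimized method: the per-domain weights here are fixed by hand rather than learned with stochastic gradient descent, and the function names (`train_weighted_nb`, `posterior`), the smoothing constant `alpha`, and the toy posts are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_weighted_nb(domains, weights, alpha=1.0):
    """Pool word and class counts from several domains.

    domains: list of datasets, each a list of (tokens, label) pairs.
    weights: one hand-picked float per domain (hypothetical values,
             not the learned parameters described in the paper).
    """
    word_counts = {}          # label -> Counter of weighted word counts
    class_counts = Counter()  # label -> weighted number of posts
    vocab = set()
    for data, w in zip(domains, weights):
        for tokens, label in data:
            class_counts[label] += w
            wc = word_counts.setdefault(label, Counter())
            for t in tokens:
                wc[t] += w
                vocab.add(t)
    return word_counts, class_counts, vocab, alpha

def posterior(model, tokens):
    """Return normalized class posteriors for a tokenized post."""
    word_counts, class_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, wc in word_counts.items():
        denom = sum(wc.values()) + alpha * len(vocab)
        score = math.log(class_counts[label] / total_docs)
        for t in tokens:
            if t in vocab:  # words never seen in training are skipped
                score += math.log((wc[t] + alpha) / denom)
        log_scores[label] = score
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    exp = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

# Toy data: label 1 = intention post, label 0 = non-intention post.
source = [(["want", "buy", "phone"], 1), (["love", "my", "phone"], 0)]
target = [(["plan", "buy", "tv"], 1), (["tv", "broke"], 0)]

# Down-weight the source domain relative to the target domain.
model = train_weighted_nb([source, target], weights=[0.5, 1.0])
p = posterior(model, ["buy", "camera"])
print(p)
```

In this toy setup the word "buy" appears only in intention posts, so the intention class receives the higher posterior. The resulting posteriors could in principle be passed as features to a second classifier, which is the kind of combination the abstract describes for SVMs.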



Notes
In experiments, we set \(\gamma = 0.01\).
Cellphone: http://www.howardforums.com/forums.php.
Electronics: http://www.avsforum.com/avs-vb/.
As shown in Chen et al. (2013), Naive Bayes is a suitable method for the task of intention detection in discussion forums.
We used LIBSVM (Chang and Lin 2011) with a linear kernel. Software available at: https://www.csie.ntu.edu.tw/~cjlin/libsvm/.
References
Bach, N.X., Phuong, T.M.: Leveraging user ratings for resource-poor sentiment classification. In: Proceedings of the 19th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (KES), pp. 322–331 (2015)
Bach, N.X., Hai, N.D., Phuong, T.M.: Personalized recommendation of stories for commenting in forum-based social media. Inf. Sci. 352–353, 48–60 (2016a)
Bach, N.X., Hai, V.T., Phuong, T.M.: Cross-domain sentiment classification with word embeddings and canonical correlation analysis. In: Proceedings of the 7th International Symposium on Information and Communication Technology (SoICT), pp. 159–166 (2016b)
Bach, N.X., Linh, L.C., Phuong, T.M.: Cross-domain intention detection in discussion forums. In: Proceedings of the Eighth International Symposium on Information and Communication Technology (SoICT), pp. 173–180 (2017)
Blitzer, J., Dredze, M., Pereira, F.: Biographies, Bollywood, boom-boxes and blenders: domain adaptation for sentiment classification. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 440–447 (2007)
Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proceedings of the Eleventh Annual Conference on Computational Learning Theory (COLT) (1998)
Chang, C., Lin, C.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 1–27 (2011)
Chen, Z., Liu, B.: Topic modeling using topics from many domains, lifelong learning and big data. In: Proceedings of the 31st International Conference on Machine Learning (ICML) (2014)
Chen, Z., Liu, B.: Lifelong Machine Learning. Morgan and Claypool, San Rafael (2017)
Chen, Z., Liu, B., Hsu, M., Castellanos, M., Ghosh, R.: Identifying intention posts in discussion forums. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pp. 1041–1050 (2013)
Chen, Z., Ma, N., Liu, B.: Lifelong learning for sentiment classification. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 750–756 (2015)
Ding, X., Liu, T., Duan, J., Nie, J.Y.: Mining user consumption intention from social media using domain adaptive convolutional neural network. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 2389–2395 (2015)
Easley, D., Kleinberg, J.: Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, Cambridge (2010)
Ghani, R.: Using error-correcting codes for text classification. In: Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pp. 303–310 (2000)
Gimpel, K., Schneider, N., O’Connor, B., Das, D., Mills, D., Eisenstein, J., Heilman, M., Yogatama, D., Flanigan, J., Smith, N.: Part-of-speech tagging for Twitter: annotation, features, and experiments. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 42–47 (2011)
Hamrouna, M., Gouider, M.S., Said, L.B.: Large scale microblogging intentions analysis with pattern based approach. In: Proceedings of International Conference on Knowledge Based and Intelligent Information and Engineering Systems (KES), pp. 1249–1257 (2016)
Hollerit, B., Kroll, M., Strohmaier, M.: Towards linking buyers and sellers: detecting commercial intent on Twitter. In: Proceedings of the World Wide Web Conference (WWW), pp. 629–632 (2013)
Jiang, J.: A literature survey on domain adaptation of statistical classifiers. Technical report, University of Illinois Urbana-Champaign (2008)
Li, Q., Wang, J., Chen, Y., Lin, Z.: User comments for news recommendation in forum-based social media. Inf. Sci. 180(24), 4929–4939 (2010)
Li, L., Wang, D., Li, T., Knox, D., Padmanabhan, B.: Scene: a scalable two-stage personalized news recommendation system. In: Proceedings of the Thirty-Fourth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 125–134 (2011)
Li, C.X., Du, Y.J., Liu, J., Zheng, H., Wang, S.D.: A novel approach of identifying user intents in microblog. In: Proceedings of International Conference on Intelligent Computing (ICIC), pp. 391–400 (2016)
Liu, B.: Sentiment Analysis and Opinion Mining: Synthesis Lectures on Human Language Technologies. Morgan and Claypool, San Rafael (2012)
Luong, T.L., Tran, T.H., Truong, Q.T., Truong, T.M.N., Phi, T.T., Phan, X.H.: Learning to filter user explicit intents in online Vietnamese social media texts. In: Proceedings of the Asian Conference on Intelligent Information and Database Systems (ACIIDS), pp. 13–24 (2016)
Nakov, P., Ritter, A., Rosenthal, S., Sebastiani, F., Stoyanov, V.: SemEval-2016 task 4: sentiment analysis in Twitter. In: Proceedings of SemEval-2016, pp. 1–18 (2016)
Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010)
Pan, S.J., Tsang, I.W., Kwok, J.T., Yang, Q.: Domain adaptation via transfer component analysis. IEEE Trans. Neural Netw. 22(2), 199–210 (2011)
Ritter, A., Clark, S., Mausam, Etzioni, O.: Named entity recognition in tweets: an experimental study. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 1524–1534 (2011)
Wang, S., Manning, C.: Baselines and bigrams: simple, good sentiment and topic classification. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL): Short Papers, vol. 2, pp. 90–94 (2012)
Wang, J., Cong, G., Zhao, W.X., Li, X.: Mining user intents in Twitter: a semi-supervised approach to inferring intent categories for tweets. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 339–345 (2015)
Zhu, X.: Semi-supervised learning literature survey. Technical report, University of Wisconsin-Madison (2008)
Cite this article
Phuong, T.M., Linh, L.C. & Bach, N.X. Identifying intentions in forum posts with cross-domain data. J Heuristics 28, 171–192 (2022). https://doi.org/10.1007/s10732-019-09410-3
DOI: https://doi.org/10.1007/s10732-019-09410-3