Abstract
In this paper, we propose an approach to intelligent, automatic keyword selection for Twitter data collection and analysis. The approach combines deep learning with evolutionary computing. To give the work an application context, we present the algorithm through a case study of public health surveillance over Twitter, a field that has attracted considerable interest. We also describe an optimization objective function specific to the keyword selection problem, together with metrics for evaluating Twitter keywords, namely reach and tweet retrieval power, in addition to traditional metrics such as precision. In our experiments, our evolutionary computing approach achieved a tweet retrieval power of 0.55, compared with 0.35 for the baseline human approach.
Supported by Public Health England.
© 2020 Springer Nature Switzerland AG
Cite this paper
Edo-Osagie, O., Iglesia, B.D.L., Lake, I., Edeghere, O. (2020). An Evolutionary Approach to Automatic Keyword Selection for Twitter Data Analysis. In: de la Cal, E.A., Villar Flecha, J.R., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2020. Lecture Notes in Computer Science, vol 12344. Springer, Cham. https://doi.org/10.1007/978-3-030-61705-9_14
DOI: https://doi.org/10.1007/978-3-030-61705-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61704-2
Online ISBN: 978-3-030-61705-9
eBook Packages: Computer Science, Computer Science (R0)