Abstract:
The use of social media platforms such as Twitter during environmental hazards and emergencies has grown recently. Yet classifying tweets for situational awareness based on what people post is a complicated process because of the high dimensionality of the features. In this empirical study, a framework using machine learning and Natural Language Processing techniques was developed for two-stage binary classification of Twitter data. The first stage consists of four models: Random Forest, Support Vector Machine, Naive Bayes, and Decision Trees. The second stage applies an ensemble learning approach. Text features (TF-IDF, i.e. term frequency-inverse document frequency; psychometric; and linguistic) were analyzed as predictors for binary classification, so that each tweet is automatically categorized as situationally relevant or irrelevant. A manually built and labeled dataset of 4,000 tweets was analyzed for situational awareness of environmental health hazards in Barbados arising from water, mosquito-borne diseases, and sewage during the period 2014-2018. In the experiment, our model achieved over 85% accuracy in classifying tweets that contribute to situational awareness. Furthermore, the results indicate that applying ensemble learning in the second stage outperformed the classification models based on combined features.
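As an illustration only, the two ideas named in the abstract (TF-IDF weighting of tweet text and a second stage that combines first-stage model outputs) can be sketched in plain Python. The abstract does not specify the TF-IDF variant or the ensemble method, so the unsmoothed `tf * log(N / df)` weighting, the majority-vote rule, the tie-breaking choice, and the function names `tfidf` and `majority_vote` are all assumptions made for this sketch, not the authors' implementation.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Assumption: whitespace tokenization and unsmoothed IDF; the paper's
    exact preprocessing and weighting are not given in the abstract.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def majority_vote(predictions):
    """Combine per-model binary labels (1 = relevant, 0 = irrelevant).

    Assumption: simple majority voting stands in for the unspecified
    ensemble method; ties are resolved as irrelevant.
    """
    return 1 if sum(predictions) * 2 > len(predictions) else 0
```

With four first-stage models a 2-2 split is possible, which is why the sketch has to pick a tie-breaking rule; a weighted vote or a trained meta-classifier would be equally plausible readings of "ensemble learning".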
Published in: 2019 IEEE International Systems Conference (SysCon)
Date of Conference: 08-11 April 2019
Date Added to IEEE Xplore: 16 September 2019