Abstract
Binary sentiment classification, or sentiment analysis, is the task of determining the sentiment of a document, i.e. whether it expresses broadly positive or negative opinions. The topic is well studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness for sentiment classification appears to be under-investigated, and results are mixed. We present an investigation of the application of the character n-gram model to text classification of corpora from online social networks, where text is known to be rich in so-called unnatural language. This is the first such documented study, and it also introduces a novel corpus of Facebook photo comments. Although we hoped that the flexibility of the character n-gram approach would be well suited to unnatural language phenomena, we find little improvement over the baseline algorithms employing the word n-gram language model.
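To make the contrast between the two feature types concrete, the following is a minimal sketch of character n-gram versus word n-gram extraction. It is an illustration only and not the authors' implementation: the function names `char_ngrams` and `word_ngrams`, the naive whitespace tokenisation, and the two example strings are all assumptions introduced here.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count all overlapping character n-grams in text (hypothetical helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count word n-grams using naive whitespace tokenisation (an assumption)."""
    tokens = text.split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Two invented comments that differ only in how letters are elongated.
a = "sooo good :-)"
b = "soooo goood :-)"

# Word unigrams: only the emoticon token is common to both strings.
shared_words = set(word_ngrams(a, 1)) & set(word_ngrams(b, 1))

# Character trigrams: the elongated spellings still share their substrings.
shared_chars = set(char_ngrams(a, 3)) & set(char_ngrams(b, 3))

print(len(shared_words), len(shared_chars))
```

In this toy case the word features overlap only on the emoticon, while the character trigrams of the two strings overlap completely. That is the kind of flexibility the abstract hopes for. The study nonetheless reports little improvement over the word n-gram baseline.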
Copyright information
© 2012 Springer-Verlag London
About this paper
Cite this paper
Blamey, B., Crick, T., Oatley, G. (2012). R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXIX. SGAI 2012. Springer, London. https://doi.org/10.1007/978-1-4471-4739-8_16
DOI: https://doi.org/10.1007/978-1-4471-4739-8_16
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4738-1
Online ISBN: 978-1-4471-4739-8
eBook Packages: Computer Science, Computer Science (R0)