R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora

Blamey, Ben; Crick, Tom; Oatley, Giles

doi:10.1007/978-1-4471-4739-8_16

Included in the following conference series:

International Conference on Innovative Techniques and Applications of Artificial Intelligence

Abstract

Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification appears to be under-investigated, and results are mixed. We present an investigation of the character n-gram model applied to text classification of corpora from online social networks, where text is known to be rich in so-called unnatural language; this is the first such documented study. We also introduce a novel corpus of Facebook photo comments. Although we hoped that the flexibility of the character n-gram approach would be well suited to unnatural language phenomena, we find little improvement over the baseline algorithms employing the word n-gram language model.
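The distinction between the two feature types can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`word_ngrams`, `char_ngrams`, `shared_count`), the whitespace tokenisation and the example strings are assumptions made purely for illustration. It shows why character n-grams might be expected to cope better with elongated or misspelled words, since such variants still share many character sequences with the standard form.

```python
from collections import Counter


def word_ngrams(text, n):
    """Count word n-grams, splitting on whitespace (an illustrative tokeniser)."""
    tokens = text.split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_ngrams(text, n):
    """Count character n-grams over the raw string, spaces included."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def shared_count(a, b):
    """Number of n-gram occurrences the two feature bags have in common."""
    return sum((a & b).values())


# Hypothetical examples: an elongated spelling and its standard form.
noisy = "soooo happy :-)"
clean = "so happy :-)"

# As word unigrams, "soooo" and "so" are unrelated features.
word_overlap = shared_count(word_ngrams(noisy, 1), word_ngrams(clean, 1))

# As character trigrams, most of the two strings still coincide.
char_overlap = shared_count(char_ngrams(noisy, 3), char_ngrams(clean, 3))

print("shared word unigrams:", word_overlap)
print("shared character trigrams:", char_overlap)
```

Either feature bag could then be passed to a standard classifier; the sketch covers feature extraction only and says nothing about which representation classifies better, which is the question the paper investigates.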

Author information

Authors and Affiliations

Cardiff Metropolitan University, Western Avenue, Cardiff, CF5 2YB, UK
Ben Blamey, Tom Crick & Giles Oatley

Corresponding author

Correspondence to Ben Blamey.

Editor information

Editors and Affiliations

School of Computing, University of Portsmouth, Whitepost Lane The Lilacs, Portsmouth, PO1 3AH, Hampshire, United Kingdom
Max Bramer
School of Computing, Engineering & Mathe, University of Brighton, Lewes Road, Brighton, BN2 4GJ, West Sussex, United Kingdom
Miltos Petridis

Cite this paper

Blamey, B., Crick, T., Oatley, G. (2012). R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXIX. SGAI 2012. Springer, London. https://doi.org/10.1007/978-1-4471-4739-8_16

DOI: https://doi.org/10.1007/978-1-4471-4739-8_16
Published: 09 October 2012
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4738-1
Online ISBN: 978-1-4471-4739-8
eBook Packages: Computer Science (R0)
