R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora

  • Conference paper
Research and Development in Intelligent Systems XXIX (SGAI 2012)

Abstract

Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification seems to be under-investigated, and results are mixed. We present the first documented study applying the character n-gram model to text classification of corpora from online social networks, where text is known to be rich in so-called unnatural language, and we introduce a novel corpus of Facebook photo comments. Although we hoped that the flexibility of the character n-gram approach would be well suited to unnatural language phenomena, we find little improvement over the baseline algorithms employing the word n-gram language model.
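To make the contrast between the two feature types concrete, the following is a minimal sketch of word n-gram and character n-gram extraction. It is an illustration only, not the authors' pipeline: the function names, the whitespace tokenization, and the sample comments are all assumptions introduced here.

```python
from collections import Counter


def word_ngrams(text, n):
    """Count word n-grams, using naive whitespace tokenization (an assumption)."""
    tokens = text.split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_ngrams(text, n):
    """Count character n-grams over the raw string, spaces included."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Hypothetical comments that differ only in how far a word is elongated.
short_form = "sooo cute :-)"
long_form = "soooooo cute :-)"

# Word unigrams treat "sooo" and "soooooo" as unrelated features.
shared_words = set(word_ngrams(short_form, 1)) & set(word_ngrams(long_form, 1))

# Character trigrams such as "soo" and "ooo" occur in both spellings.
shared_chars = set(char_ngrams(short_form, 3)) & set(char_ngrams(long_form, 3))

print(sorted(shared_words))
print("soo" in shared_chars, "ooo" in shared_chars)
```

In this sketch the elongated spelling shares no word-level feature with its shorter form, while the character trigrams overlap. That overlap is the kind of flexibility the abstract hoped would help with unnatural language, although the paper reports little improvement in practice.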

Author information

Corresponding author

Correspondence to Ben Blamey.

Copyright information

© 2012 Springer-Verlag London

About this paper

Cite this paper

Blamey, B., Crick, T., Oatley, G. (2012). R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXIX. SGAI 2012. Springer, London. https://doi.org/10.1007/978-1-4471-4739-8_16

  • DOI: https://doi.org/10.1007/978-1-4471-4739-8_16

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4738-1

  • Online ISBN: 978-1-4471-4739-8

  • eBook Packages: Computer Science, Computer Science (R0)
