Abstract
Natural language processing (NLP), or the pragmatic research perspective of computational linguistics, has become increasingly powerful thanks to data availability and the various techniques developed in the past decade. This increasing capability makes it possible to capture sentiment more accurately and semantics in a more nuanced way. Naturally, many applications are starting to seek improvements by adopting cutting-edge NLP techniques, and financial forecasting is no exception. As a result, articles that leverage NLP techniques to predict financial markets are accumulating rapidly, gradually establishing the research field of natural language based financial forecasting (NLFF), known from the application perspective as stock market prediction. This review article clarifies the scope of NLFF research by ordering and structuring techniques and applications from related work. The survey also aims to improve understanding of the progress and hotspots in NLFF and to stimulate discussion across disciplines.
Cite this article
Xing, F.Z., Cambria, E. & Welsch, R.E. Natural language based financial forecasting: a survey. Artif Intell Rev 50, 49–73 (2018). https://doi.org/10.1007/s10462-017-9588-9
DOI: https://doi.org/10.1007/s10462-017-9588-9