Big data analytics for data-driven industry: a review of data sources, tools, challenges, solutions, and research directions

Cluster Computing

Abstract

Big data analytics (BDA) methods for data-driven industries are gaining research attention and adoption in today’s industrial activities and business intelligence, and are rapidly changing the perception of industrial revolutions. The uniqueness of big data and BDA has prompted new research calls to solve challenges in data generation, storage, visualization, and processing. Researchers and practitioners still face significant knowledge gaps regarding the right information and BDA tools for extracting knowledge from large industrial datasets and for handling diverse big data formats. Although various research efforts and scholarly studies on big data analytic processes for industrial performance improvement have been proposed recently, a comprehensive review with systematic data-driven analysis, comparison, and rigorous evaluation of methods, data sources, applications, major challenges, and appropriate solutions is still lacking. To fill this gap, this paper makes the following contributions. It presents an all-inclusive survey of current trends in BDA tools and methods, together with their strengths and weaknesses. It identifies and discusses data sources and real-life applications where BDA has potential impact. It also identifies BDA challenges and solutions, as well as future research prospects that require further attention from researchers. The study provides recommendations that could assist researchers, industrial practitioners, big data providers, and governments working in BDA in understanding the challenges of current BDA methods and the solutions that would alleviate them.

Data availability

Not applicable.

Code availability

Not applicable.

References

  1. Khan, S., Shakil, K., Alam, M.: PABED a tool for big education data analysis. In: IEEE International Conference on Industrial Technology (ICIT), pp. 1808.00334 (2019)

  2. Vidhya, K., Shanmugalakshmi, R.: Modified adaptive neuro-fuzzy inference system (M-ANFIS) based multi-disease analysis of healthcare Big Data. J. Supercomput. 76, 1–22 (2020)

    Article  Google Scholar 

  3. Chiroma, H., Herawan, T.: Soft computing approach for predicting OPEC countries’ oil consumption. Int. J. Oil Gas Coal Technol. 15, 298–316 (2017). https://doi.org/10.1504/IJOGCT.2017.10005334

    Article  Google Scholar 

  4. Yang, R., Yu, L., Zhao, Y., Yu, H., Xu, G., Wu, Y.: Big data analytics for financial market volatility forecast based on support vector machine. Int. J. Inf. Manag. 50, 452–462 (2020). https://doi.org/10.1016/j.ijinfomgt.2006.01.003

    Article  Google Scholar 

  5. Limba, T.: Industry 4.0 and national security: the phenomenon of disruptive technology. Entrep. Sustain. Issues 6, 1528–1535 (2019)

    Google Scholar 

  6. Alharthi, A., Krotov, V., Bowman, M.: Addressing barriers to big data. Bus. Horiz. 60, 285–292 (2017). https://doi.org/10.1016/j.bushor.2017.01.002

    Article  Google Scholar 

  7. Pejic-Bach, M., Bertoncel, T., Meško, M., Krstić, Ž: Management text mining of industry 4.0 job advertisements. Int. J. Inf. Manag. 50, 416–431 (2020)

    Article  Google Scholar 

  8. Gröger, C.: Building an Industry 4.0 analytics platform: practical challenges, approaches and future research directions. Datenbank-Spektr. 18, 5–14 (2018). https://doi.org/10.1007/s13222-018-0273-1

    Article  Google Scholar 

  9. Oussous, A., Benjelloun, F., Ait, A., Belfkih, S.: Big Data technologies: a survey. J. King Saud Univ. Comput. Inf. Sci. 30, 431–448 (2018). https://doi.org/10.1016/j.jksuci.2017.06.001

    Article  Google Scholar 

  10. Bao, R., Chen, Z., Obaidat, M.S.: Challenges and techniques in big data security and privacy: a review. Secur. Priv. 1, e13 (2018). https://doi.org/10.1002/spy2.13

    Article  Google Scholar 

  11. Jain, P., Gyanchandani, M., Khare, N.: Big data privacy: a technological perspective and review. J. Big Data 1, 1–25 (2016). https://doi.org/10.1186/s40537-016-0059-y

    Article  Google Scholar 

  12. Andrew, C.: What will We Do When the World’s Data Hits 163 Zettabytes in 2025? (2017)

  13. Reinsel, D., Gantz, J., Rydning, J.: Data Age 2025: the evolution of data to life-critical. https://assets.ey.com/content/dam/ey-sites/ey-com/en_gl/topics/workforce/Seagate-WP-DataAge2025-March-2017.pdf

  14. Timothy, K.: Total WW data to reach 163 ZB by 2025. https://solutionsreview.com/data-management/idc-data-creation-to-reach-163-zettabytes-by-2025/

  15. Khan, N., Alsaqer, M., Shah, H., Badsha, G., Abbasi, A.A., Salehian, S.: The 10 Vs, issues and challenges of big data. In: Proceedings of the 2018 International Conference on Big Data and Education, pp. 52–56. ACM (2018)

  16. Panimalar, A., Shree, V., Kathrine, V.: The 17 V’s of big data. Int. Res. J. Eng. Technol. 04, 329–333 (2017)

    Google Scholar 

  17. Shafer, T.: The 42 V’s of Big Data and Data Science. https://www.kdnuggets.com/2017/04/42-vs-big-data-data-science.html

  18. Lv, Z., Song, H., Basanta-Val, P., Steed, A., Jo, M.: Next-generation big data analytics: state of the art, challenges, and future research topics. IEEE Trans. Ind. Inform. 13, 1891–1899 (2017). https://doi.org/10.1109/TII.2017.2650204

    Article  Google Scholar 

  19. Tsai, C.W., Lai, C.F., Chao, H.C., Vasilakos, A.: V: Big data analytics: a survey. J. Big Data (2015). https://doi.org/10.1186/s40537-015-0030-3

    Article  Google Scholar 

  20. Hashem, I.A.T., Yaqoob, I., Anuar, N.B., Mokhtar, S., Gani, A., Ullah Khan, S.: The rise of “big data” on cloud computing: review and open research issues. Inf. Syst. 47, 98–115 (2015). https://doi.org/10.1016/j.is.2014.07.006

    Article  Google Scholar 

  21. Landset, S., Khoshgoftaar, T.M., Richter, A.N., Hasanin, T.: A survey of open source tools for machine learning with big data in the Hadoop ecosystem. J. Big Data 2, 1–36 (2015). https://doi.org/10.1186/s40537-015-0032-1

    Article  Google Scholar 

  22. Sivarajah, U., Kamal, M.M., Irani, Z., Weerakkody, V.: Critical analysis of Big Data challenges and analytical methods. J. Bus. Res. 70, 263–286 (2017). https://doi.org/10.1016/j.jbusres.2016.08.001

    Article  Google Scholar 

  23. Mohamed, A., Khanian, M., Yap, N., Wah, B.: The state of the art and taxonomy of big data analytics: view from new big data framework. Artif. Intell. Rev. (2019). https://doi.org/10.1007/s10462-019-09685-9

    Article  Google Scholar 

  24. Cui, Y., Kara, S., Chan, K.C.: Manufacturing big data ecosystem: a systematic literature review. Robot. Comput. Integr. Manuf. 62, 101861 (2020). https://doi.org/10.1016/j.omega.2004.06.002

    Article  Google Scholar 

  25. Nguyen, T., Gosine, R.G., Warrian, P.: A systematic review of big data analytics for oil and gas Industry 4.0. IEEE Access 8, 61183–61201 (2020)

    Article  Google Scholar 

  26. Fournier-Viger, P., Lin, J.C.W., Vo, B., Chi, T.T., Zhang, J., Le, H.B.: A survey of itemset mining. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. e1207, 7–4 (2017)

    Google Scholar 

  27. Gan, W., Lin, J.C.W., Fournier-Viger, P., Chao, H.C., Yu, P.S.: A survey of parallel sequential pattern mining. ACM Trans. Knowl. Discov. Data 13, 1–34 (2019)

    Article  Google Scholar 

  28. Gan, W., Lin, J.C.W., Fournier-Viger, P., Chao, H.C., Tseng, V.S., Philip, S.Y.: A survey of utility-oriented pattern mining. IEEE Trans. Knowl. Data Eng. 33, 1306–1327 (2019)

    Article  Google Scholar 

  29. Gan, W., Lin, J.C.W., Chao, H.C., Zhan, J.: Data mining in distributed environment: a survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. e1216, 7–6 (2017)

    Google Scholar 

  30. Ajah, I.A., Nweke, H.F.: Big data and business analytics: trends, platforms, success factors and applications. Big Data Cogn. Comput. 3, 32 (2019). https://doi.org/10.3390/bdcc3020032

    Article  Google Scholar 

  31. Al-Sai, Z.A., Abualigah, L.M.: Big data and E-government: a review. In: ICIT 2017—8th International Conference on Information Technology, Proceedings, pp. 580–587 (2017)

  32. Azeem, M., Haleem, A., Bahl, S., Javaid, M., Suman, R., Nandan, D.: Big data applications to take up major challenges across manufacturing industries: a brief review. Mater. Today Proc. 49(2), 339–348 (2021)

    Google Scholar 

  33. Knobbe, A.J., Cunha, S.A., Torres, R.S.: Unlocking the potential of big data to support tactical performance analysis in professional soccer: a systematic review. Eur. J. Sports Sci. (2021). https://doi.org/10.1080/17461391.2020.1747552

    Article  Google Scholar 

  34. Fathi, M., Haghi, M., Seyed, K., Jameii, M., Mahdipour, E.: Big data analytics in weather forecasting: a systematic review. Arch. Comput. Methods Eng. 29, 1247–1275 (2021)

    Article  Google Scholar 

  35. Andronie, M., George, L., Iatagan, M., Hurloiu, I., Dijm, I.: Sustainable cyber–physical production systems in big data-driven smart urban economy: a systematic literature review. Sustainability 13(2), 751 (2021)

    Article  Google Scholar 

  36. Kitchenham, B.: Procedures for Performing Systematic Literature Reviews, pp. 1–26. Keele University, Keele (2004)

    Google Scholar 

  37. Verma, C., Pandey, R.: Big Data representation for grade analysis through Hadoop framework. In: Proceedings of the 2016 6th International Conference, Cloud System and Big Data Engineering (Confluence) 2016, pp. 312–315. IEEE (2016). https://doi.org/10.1109/CONFLUENCE.2016.7508134

  38. Kesden, G.: HDFS Architecture. http://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html

  39. Borthakur, D.: HDFS Architecture Guide: Hadoop Apache Project. https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html

  40. Bisht, P., Singh, K.: Big data security: a review of big data, security issues and solutions. Int. J. Comput. Sci. Mob. Comput. 5, 142–147 (2016)

    Google Scholar 

  41. Ketaki, S.R.: Big data analytics—Hadoop performance analysis. Master of Science, San Diego University (2014)

  42. Sagiroglu, S., Sinanc, D.: Big data: a review. In: 2013 International Conference on Collaboration Technologies and Systems (CTS), pp. 42–47. IEEE (2013)

  43. Fong, S.: Clustering big IoT data by metaheuristic optimized mini-batch and parallel partition-based DGC in Hadoop. Future Gener. Comput. Syst. 86, 1395–1412 (2018). https://doi.org/10.1016/j.future.2018.03.006

    Article  Google Scholar 

  44. Watson, H.J.: Tutorial: big data analytics: concepts, technologies, and applications. Commun. Assoc. Inf. Syst. 34, 1247–1268 (2014)

    Google Scholar 

  45. Anadiotis, G.: Big Data Management. https://www.zdnet.com/article/the-new-cloudera-hortonworks-hadoop-100-open-source-50-boring/

  46. Oliverio, J.: A survey of social media, big data, data mining, and analytics. J. Ind. Integr. Manag. 3, 1–13 (2018). https://doi.org/10.1142/S2424862218500033

    Article  Google Scholar 

  47. Zomaya, A., Sakr, S.: Handbook of Big Data Technologies. Springer, Berlin (2017)

    Book  Google Scholar 

  48. Storey, V.C., Song, I.: Data and knowledge engineering big data technologies and management: what conceptual modeling can do. Data Knowl. Eng. 108, 50–67 (2017). https://doi.org/10.1016/j.datak.2017.01.001

    Article  Google Scholar 

  49. Lopez, G., Seaton, D.T., Ang, A.: Google BigQuery for education: framework for parsing and analyzing edX MOOC data. In: Proceedings of the Fourth ACM Conference on Learning at Scale, pp. 181–184 (2017)

  50. Álvaro, R., Serrhini, M.: Information Systems and Technologies to Support Learning: Proceedings of EMENA-ISTL 2018. Springer, Cham (2019)

  51. Atkinson, K.: Big data real time ingestion and machine learning. In: 2018 IEEE Second International Conference on Data Stream Mining and Processing, pp. 25–31 (2018)

  52. Alhomsi, Y., Alsalemi, A., Al Disi, M., Bensaali, F., Amira, A., Alinier, G.: CouchDB based real-time wireless communication system for clinical simulation. In: Proceedings of the 20th International Conference on High Performance Computing and Communications. 16th International Conference on Smart City 4th International Conference on Data Science and Systems. HPCC/SmartCity/DSS 2018, pp. 1094–1098 (2019). https://doi.org/10.1109/HPCC/SmartCity/DSS.2018.00182

  53. Pollmann, T.R., Smith, B.: Database support of detector operation and data analysis in the DEAP-3600 Dark Matter experiment. Eur. Phys. J. C 79, 683 (2019)

    Article  Google Scholar 

  54. Garion, S.: Big data analytics Hadoop and Spark. Ph.D., IBM Research, Haifa, pp. 1–55 (2016)

  55. Gounaris, A., Torres, J.: A methodology for Spark parameter tuning ✩. Big Data Res. 11, 22–32 (2018). https://doi.org/10.1016/j.bdr.2017.05.001

    Article  Google Scholar 

  56. Oneto, L., Fumeo, E., Clerico, G., Canepa, R., Papa, F., Dambra, C., Mazzino, N., Anguita, D.: Train delay prediction systems: a big data analytics perspective. Big Data Res. 11, 54–64 (2018). https://doi.org/10.1016/j.bdr.2017.05.002

    Article  Google Scholar 

  57. Kim, H., Naveed, M., Goethe Rut, W., Roberto, V., Todo, I., Hevin, O., Minsung, H., Tharsis, T., Rajendra, A.: Big Data Methodologies, Tools and Infrastructures. Western Norway Research Institute (2018)

  58. Computing, C., Khan, Z., Anjum, A., Kiani, S.L.: Cloud based big data analytics for smart future cities. In: 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing, pp. 381–386 (2013)

  59. Acharjya, P.D., Ahmed, K.: A survey on big data analytics: challenges, open research issues and tools. Int. J. Adv. Comput. Sci. Appl. (2016). https://doi.org/10.14569/ijacsa.2016.070267

    Article  Google Scholar 

  60. Bharath Goda, S.: Recommender system for recipes. Issues Inf. Syst. 15, 321–327 (2014)

    Google Scholar 

  61. Hadi, M.S., Lawey, A.Q., El-Gorashi, T.E.H., Elmirghani, J.M.H.: Big data analytics for wireless and wired network design: a survey. Comput. Netw. 132, 180–199 (2018). https://doi.org/10.1016/j.comnet.2018.01.016

    Article  Google Scholar 

  62. Iyer, S., Lakhtaria, K.: Practical evaluation and comparative study. Int. J. Innov. Res. Comput. Commun. Eng. 5, 57–64 (2017)

    Google Scholar 

  63. Abuqabita, F., Al-Omoush, R., Alwidian, J.: A comparative study on big data analytics frameworks, data resources and challenges. Mod. Appl. Sci. 13, 1 (2019). https://doi.org/10.5539/mas.v13n7p1

    Article  Google Scholar 

  64. Pääkkönen, P., Pakkala, D.: Reference architecture and classification of technologies, products and services for big data systems. Big Data Res. 2, 166–186 (2015). https://doi.org/10.1016/j.bdr.2015.01.001

    Article  Google Scholar 

  65. Lakhe, B.: Practical Hadoop Migration. Apress, Berkeley (2016)

    Book  Google Scholar 

  66. Vohra, D., Vohra, D.: Using Apache Sqoop. Apress, Berkeley (2016)

    Book  Google Scholar 

  67. Linthicum, D.: Three Types of IoT Data Sources. https://www.rtinsights.com/three-types-of-iot-data-sources

  68. Das, S., Behera, R.K.: Real-time sentiment analysis of Twitter streaming data for stock prediction. In: International Conference on Computational Intelligence in Data Sciences, vol. 132, pp. 956–964 (2018). https://doi.org/10.1016/j.procs.2018.05.111

  69. Stevens, T.: Apache Flume. https://flume.apache.org/

  70. Acharjya, D.P.: A survey on big data analytics: challenges, open research issues and tools. Int. J. Adv. Comput. Sci. Appl. 7, 511–518 (2016)

    Google Scholar 

  71. Inoubli, W., Aridhi, S., Mezni, H., Jung, A.: An experimental survey on big data frameworks. Clin. Orthop. Relat. Res. (2016)

  72. Chen, Z., Chen, N., Gong, J., Sensing, R.: Environmental big data management with the Apache. In: 2015 Fourth International Conference on Agro-Geoinformatics (Agro-geoinformatics), pp. 32–35 (2015)

  73. Chaudhari, A.A., Mulay, P.: SCSI: real-time data analysis with Cassandra and Spark. In: Big Data Processing Using Spark in Cloud, pp. 237–264. Springer (2019). https://doi.org/10.1007/978-981-13-0550-4_11

  74. Techvidvan, T.: Spark Streaming—Architecture, Working and Operations. https://techvidvan.com/tutorials/spark-streaming/

  75. Xhafa, F., Naranjo, V., Caballé, S.: Processing and analytics of big data streams with Yahoo!S4. In: Proceedings of the International Conference on Advanced Information Networking and Applications, AINA, pp. 263–270 (2015). https://doi.org/10.1109/AINA.2015.194

  76. Kumar, A., Mozar, S.: Emerging trends in big data analytics—a study. In: ICCCE: International Conference on Communications and Cyber Physical Engineering 2018, pp. 1–775. Springer, Singapore (2019)

  77. Kejariwal, A.: Real time analytics: algorithms and systems. Proc. VLDB Endow. 8, 2040–2041 (2015). https://doi.org/10.14778/2824032.2824132

    Article  Google Scholar 

  78. Boykin, O., Ritchie, S., Connell, I.O., Lin, J.: SummingBird: a framework for integrating batch and online MapReduce computations. Proc. VLDB Endow. 7, 1441–1451 (2014)

    Article  Google Scholar 

  79. Erraissi, A., Tragha, A.: A comparative study of Hadoop-based big data architectures. Int. J. Web Appl. 9, 129–137 (2017)

    Google Scholar 

  80. Chen, C.L.P., Zhang, C.: Data-intensive applications, challenges, techniques and technologies: a survey on Big Data. Inf. Sci. (NY) 275, 314–347 (2014). https://doi.org/10.1016/j.ins.2014.01.015

    Article  Google Scholar 

  81. Iqbal, R., Doctor, F., More, B., Mahmud, S., Yousuf, U.: Big data analytics: computational intelligence techniques and application areas. Technol. Forecast. Soc. Change (2018). https://doi.org/10.1016/j.techfore.2018.03.024

    Article  Google Scholar 

  82. Anitha, A., Acharjya, D.P.: Crop suitability prediction in Vellore District using rough set on fuzzy approximation space and neural network. Neural Comput. Appl. 30, 3633–3650 (2017). https://doi.org/10.1007/s00521-017-2948-1

    Article  Google Scholar 

  83. Acharjya, D., Anitha, A.: A comparative study of statistical and rough computing models in predictive data analysis. Int. J. Ambient Comput. Intell. 8, 32–51 (2017). https://doi.org/10.4018/IJACI.2017040103

    Article  Google Scholar 

  84. Acharjya, D.P., Das, T.K.: A framework for attribute selection in marketing using rough computing and formal concept analysis. IIMB Manag. Rev. 29, 122–135 (2017). https://doi.org/10.1016/j.iimb.2017.05.002

    Article  Google Scholar 

  85. Rathi, R., Acharjya, D.P.: A rule based classification for vegetable production using rough set and genetic algorithm. Int. J. Fuzzy Syst. Appl. 7, 74–100 (2018). https://doi.org/10.4018/IJFSA.2018010106

    Article  Google Scholar 

  86. Ahmed, K.P., Acharjya, D.P.: A hybrid scheme for heart disease diagnosis using rough set and cuckoo search technique. J. Med. Syst. (2019). https://doi.org/10.1007/s10916-019-1497-9

    Article  Google Scholar 

  87. Abualigah, L., Diabat, A., Elaziz, M.A.: Intelligent workflow scheduling for Big Data applications in IoT cloud computing environments. Clust. Comput. (2021). https://doi.org/10.1007/s10586-021-03291-7

    Article  Google Scholar 

  88. Abd Elaziz, M., Abualigah, L., Attiya, I.: Advanced optimization technique for scheduling IoT tasks in cloud-fog computing environments. Future Gener. Comput. Syst. 124, 142–154 (2021). https://doi.org/10.1016/j.future.2021.05.026

    Article  Google Scholar 

  89. Iqbal, R., Doctor, F., More, B., Mahmud, S., Yousuf, U.: Big data analytics and computational intelligence for cyber–physical systems: recent trends and state of the art applications. Future Gener. Comput. Syst. 105, 766–778 (2017). https://doi.org/10.1016/j.future.2017.10.021

    Article  Google Scholar 

  90. Bello-orgaz, G., Jung, J.J., Camacho, D.: Social big data: recent achievements and new challenges. Inf. Fusion 28, 45–59 (2016). https://doi.org/10.1016/j.inffus.2015.08.005

    Article  Google Scholar 

  91. Pouyanfar, S., Yang, Y., Chen, S.: Multimedia big data analytics: a survey. ACM Comput. Surv. 51, 10–44 (2018)

    Google Scholar 

  92. Oussous, A., Benjelloun, F., Ait, A., Belfkih, S.: Big Data technologies: a survey. J. King Saud Univ. Comput. Inf. Sci. (2017). https://doi.org/10.1016/j.jksuci.2017.06.001

  93. Dumbill, E.: What is Apache Hadoop YARN? https://intellipaat.com/blog/apache-hadoop-yarn/

  94. Birjali, M., Hssane, A.B., Erritali, M.: Evaluation of high-level query languages based on MapReduce in Big Data. J. Big Data 5, 36 (2018). https://doi.org/10.1186/s40537-018-0146-3

    Article  Google Scholar 

  95. Adam, K., Adam, M., Fakharaldien, I., Zain, J.M., Majid, M.A.: Big data management and analysis. In: 3rd International Conference on Computer Engineering and Mathematical Sciences (ICCEMS 2014) (2014)

  96. Islam, M.K., Srinivasan, A.: Apache Oozie. O’Reilly Media, Inc., Sebastopol (2015)

    Google Scholar 

  97. Simpli, J.: Advantage and Disadvantage of Apache Flume. https://beyondcorner.com/learn-apache-flume/advantage-disadvantage-apache-flume/

  98. EDUCBA: Difference Between Apache Kafka and Flume (2019)

  99. Yaqoob, I., Abaker, I., Hashem, T., Gani, A., Mokhtar, S., Ahmed, E., Badrul, N., Vasilakos, A.: V: Big data: from beginning to future. Int. J. Inf. Manag. 36, 1231–1247 (2016). https://doi.org/10.1016/j.ijinfomgt.2016.07.009

    Article  Google Scholar 

  100. Rathi, R., Acharjya, D.P.: A framework for prediction using rough set and real coded genetic algorithm. Arab. J. Sci. Eng. 43, 4215–4227 (2018). https://doi.org/10.1007/s13369-017-2838-y

    Article  Google Scholar 

  101. Attaran, M., Stark, J., Stotler, D.: Opportunities and challenges for big data analytics in US higher education: a conceptual model for implementation. Ind. High. Educ. 32, 169–182 (2018). https://doi.org/10.1177/0950422218770937

    Article  Google Scholar 

  102. Buenaño-Fernández, D., Gil, D., Luján-Mora, S.: Application of machine learning in predicting performance for computer engineering students: a case study. Sustainability 11, 1–18 (2019). https://doi.org/10.3390/su11102833

    Article  Google Scholar 

  103. Pierrakeas, C., Koutsonikos, G., Lipitakis, A.D., Kotsiantis, S., Xenos, M., Gravvanis, G.A.: The Variability of the Reasons for Student Dropout in Distance Learning and the Prediction of Dropout-Prone Students. Springer, Cham (2020)

    Book  Google Scholar 

  104. Ikegwu, A.C., Nweke, H.F., Alo, U.R., Okonkwo, O.R.: HMCPAED: a new framework for students’ dropout prediction. In: ICT4NDS2021: ICT and Sustainability in the 5th Industrial Revolution, pp. 131–140. Ilorin (2021)

  105. Manco, G., Ritacco, E., Rullo, P., Gallucci, L., Astill, W., Kimber, D., Antonelli, M.: Fault detection and explanation through big data analysis on sensor streams. Expert Syst. Appl. 87, 141–156 (2017). https://doi.org/10.1016/j.eswa.2017.05.079

    Article  Google Scholar 

  106. Tortonesi, M., Govoni, M., Morelli, A., Riberto, G., Stefanelli, C., Suri, N.: Taming the IoT data deluge: an innovative information-centric service model for fog computing applications. Future Gener. Comput. Syst. 93, 888–902 (2018). https://doi.org/10.1016/j.future.2018.06.009

    Article  Google Scholar 

  107. ur Rehman, A., Fahad, M., Ullah, R., Abdullah, F.: Big data analysis and implementation in different areas using IoT. Int. J. Hyperconnect. Internet Things 1, 12–25 (2018). https://doi.org/10.4018/ijhiot.2017070102

    Article  Google Scholar 

  108. Liu, X., Shin, H., Burns, A.C.: Examining the impact of luxury brand’s social media marketing on customer engagement: using big data analytics and natural language processing. J. Bus. Res. (2019). https://doi.org/10.1016/j.jbusres.2019.04.042

    Article  Google Scholar 

  109. Kim, Y., Kim, C.K., Lee, D.K., Lee, H.W., Andrada, R.I.T.: Quantifying nature-based tourism in protected areas in developing countries by using social big data. Tour. Manag. 72, 249–256 (2019). https://doi.org/10.1016/j.tourman.2018.12.005

    Article  Google Scholar 

  110. Kaufhold, M.A., Rupp, N., Reuter, C., Habdank, M.: Mitigating information overload in social media during conflicts and crises: design and evaluation of a cross-platform alerting system. Behav. Inf. Technol. (2019). https://doi.org/10.1080/0144929X.2019.1620334

    Article  Google Scholar 

  111. Kaufhold, M.A., Gizikis, A., Reuter, C., Habdank, M., Grinko, M.: Avoiding chaotic use of social media before, during, and after emergencies: design and evaluation of citizens’ guidelines. J. Conting. Crisis Manag. 27, 198–213 (2019). https://doi.org/10.1111/1468-5973.12249

    Article  Google Scholar 

  112. Jamali, M., Nejat, A., Ghosh, S., Jin, F., Cao, G.: Management social media data and post-disaster recovery. Int. J. Inf. Manag. 44, 25–37 (2019). https://doi.org/10.1016/j.ijinfomgt.2018.09.005

    Article  Google Scholar 

  113. Nantenaina, S., Rochel, S., Luc, R.J., Victor, M.: Data Science: exploration of machine learning, data mining and big data into image recognition pattern. Int. J. Concept. Comput. Inf. Technol. 7, 6–11 (2019)

    Google Scholar 

  114. Wang, A., Yan, X., Wei, Z.: Platform-independent software package for boimage analysis. Bioinformatics 34, 3238–3240 (2018). https://doi.org/10.7717/peerj.453

    Article  Google Scholar 

  115. Rousseeuw, P.J., Raymaekers, J., Hubert, M., Rousseeuw, P.J., Raymaekers, J., Hubert, M., Measure, A., Rousseeuw, F.P.J., Raymaekers, J., Hubert, M., Rousseeuw, P.J., Raymaekers, J., Hubert, M.: A measure of directional outlyingness with applications to image data and video. J. Comput. Graph. Stat. 27, 345–359 (2018). https://doi.org/10.1080/10618600.2017.1366912

    Article  MathSciNet  MATH  Google Scholar 

  116. Tamrakar, A., Mewada, P., Gubrele, P., Prasad, R., Saurabh, P.: An ANN-based text mining approach over hash tag and blogging text data. Adv. Intell. Syst. Comput. 1057, 399–408 (2020). https://doi.org/10.1007/978-981-15-0184-5_35

    Article  Google Scholar 

  117. Wu, D., Guan, Y.: Artificial intelligence retrieval algorithm for text data from multiple data sources. Int. J. Comput. Appl. (2019). https://doi.org/10.1080/1206212X.2019.1639353

    Article  Google Scholar 

  118. Sarkar, B.K.: Big data for secure healthcare system: a conceptual design. Complex Intell. Syst. 3, 133–151 (2017). https://doi.org/10.1007/s40747-017-0040-1

    Article  Google Scholar 

  119. Forbes, H., Douglas, I., Finn, A., Breuer, J., Bhaskaran, K., Smeeth, L., Packer, S., Langan, S.M., Mansfield, K.E., Marlow, R., Whitaker, H., Warren-Gash, C.: Risk of herpes zoster after exposure to varicella to explore the exogenous boosting hypothesis: self controlled case series study using UK electronic healthcare data. Br. Med. J. (2020). https://doi.org/10.1136/bmj.l6987

    Article  Google Scholar 

  120. Nair, L.R., Shetty, S.D., Shetty, S.D.: Applying spark based machine learning model on streaming big data for health status prediction. Comput. Electr. Eng. (2017). https://doi.org/10.1016/j.compeleceng.2017.03.009

    Article  Google Scholar 

  121. Rehman, M.H.U., Ahmed, E., Yaqoob, I., Hashem, I.A.T., Imran, M., Ahmad, S.: Big data analytics in industrial IoT using a concentric computing model. IEEE Commun. Mag. 56, 37–43 (2018). https://doi.org/10.1109/MCOM.2018.1700632

    Article  Google Scholar 

  122. Rehman, M.H., Yaqoob, I., Salah, K., Imran, M., Jayaraman, P.P., Perera, C.: The role of big data analytics in industrial Internet of Things. Future Gener. Comput. Syst. 99, 247–259 (2019). https://doi.org/10.1016/j.future.2019.04.020

    Article  Google Scholar 

  123. Shi, K., Zhu, L., Zhang, C., Xu, L., Gao, F.: Blockchain-based multimedia sharing in vehicular social networks with privacy protection. Multimed. Tools Appl. (2020). https://doi.org/10.1007/s11042-019-08284-8

    Article  Google Scholar 

  124. Grazia Speranza, M.: Trends in transportation and logistics. Eur. J. Oper. Res. 264, 830–836 (2018). https://doi.org/10.1016/j.ejor.2016.08.032

    Article  MathSciNet  MATH  Google Scholar 

  125. Allam, Z., Dhunny, Z.A.: On big data, artificial intelligence and smart cities. Cities 89, 80–91 (2019)

    Article  Google Scholar 

  126. Sharma, M.M., Ali, M.S., Husain, S.: Implementation of Big Data analytics in Education Industry. IOSR J. Comput. Eng. (2018). https://doi.org/10.9790/0661-1906033639

    Article  Google Scholar 

  127. Jones, M., Collier, G., Reinkensmeyer, D.J., Deruyter, F., Dzivak, J., Zondervan, D., Morris, J.: Big data analytics and sensor-enhanced activity management to improve effectiveness and efficiency of outpatient medical rehabilitation. Int. J. Environ. Res. Public Health 17, 748 (2020)

    Article  Google Scholar 

  128. Manogaran, G., Varatharajan, R., Lopez, D., Malarvizhi, P., Sundarasekar, R., Thota, C.: A new architecture of Internet of Things and big data ecosystem for secured smart healthcare monitoring and alerting system. Future Gener. Comput. Syst. 82, 375–387 (2017). https://doi.org/10.1016/j.future.2017.10.045

    Article  Google Scholar 

  129. Firouzi, F., Rahmani, A.M., Mankodiya, K., Badaroglu, M., Merrett, G.: V: Internet-of-Things and big data for smarter healthcare: from device to architecture, applications and analytics. Future Gener. Comput. Syst. 78, 583–586 (2018). https://doi.org/10.1016/j.future.2017.09.016

    Article  Google Scholar 

  130. Wang, F., Ding, L., Yu, H., Zhao, Y.: Big data analytics on enterprise credit risk evaluation of e-Business platform. Inf. Syst. E-Bus. Manag. 18, 311–350 (2019)

    Article  Google Scholar 

  131. Anbuvizhi, R., Balakumar, V.: Credit/debit card transaction survey using MapReduce in HDFS and implementing Syferlock to prevent fraudulent. Int. J. Comput. Sci. Netw. Secur. 16, 106–110 (2016)

    Google Scholar 

  132. Gurlev, I., Yemelyanova, E., Kilmashkina, T.: Development of communication as a tool for ensuring national security in data-driven world (Russian far North case-study). Stud. Syst. Decis. Control 181, 237–248 (2019). https://doi.org/10.1007/978-3-030-01358-5_21

    Article  Google Scholar 

  133. Akhgar, B., Saathoff, G.B., Arabnia, H.R., Hill, R., Staniforth, A., Bayerl, P.S.: Application of Big Data for National Security: A Practitioner’s Guide to Emerging Technologies. Butterworth-Heinemann, New York (2015)

    Google Scholar 

  134. Al Ghamdi, A., Thomson, T.: The future of data storage: a case study with the Saudi company. J. Electr. Electron. Eng. 6, 1–11 (2018). https://doi.org/10.11648/j.jeee.20180601.11

    Article  Google Scholar 

  135. Hu, W., Lu, Z., Wu, S., Zhang, W.: Real-time transient stability assessment in power system based on improved SVM. J. Mod. Power Syst. Clean Energy 7, 26–37 (2019)

    Article  Google Scholar 

  136. Zhou, Y., Guo, Q., Sun, H., Yu, Z., Wu, J., Hao, L.: Electrical Power and Energy Systems: a novel data-driven approach for transient stability prediction of power systems considering the operational variability. Electr. Power Energy Syst. 107, 379–394 (2019)

    Article  Google Scholar 

  137. Habib, M., Yaqoob, I., Salah, K., Imran, M.: The role of big data analytics in industrial Internet of Things. Future Gener. Comput. Syst. 99, 247–259 (2019)

    Article  Google Scholar 

  138. Djenouri, Y., Srivastava, G., Belhadi, A., Lin, J.C.W.: Intelligent Blockchain management for distributed knowledge graphs in IoT 5G environments. Trans. Emerg. Telecommun. Technol. (2021). https://doi.org/10.1002/ett.4332

  139. Lin, J.C.W., Srivastava, G., Zhang, Y., Djenouri, Y., Aloqaily, M.: Privacy-preserving multi-objective sanitization model in 6G IoT environments. IEEE Internet Things J. 8, 5340–5349 (2020)

  140. Wu, J.M.T., Srivastava, G., Lin, J.C.W., Djenouri, Y., Wei, M., Parizi, R.M., Khan, M.S.: Mining of high-utility patterns in big IoT-based databases. Mob. Netw. Appl. 26, 216–233 (2021)

  141. Cheng, C.F., Chen, Y.C., Lin, J.C.W.: A carrier-based sensor deployment algorithm for perception layer in the IoT architecture. IEEE Sens. J. 20, 10295–10305 (2020)

  142. Elhoseny, H., Elhoseny, M., Riad, A.M.: A framework for big data analysis in smart cities. In: International Conference on Advanced Machine Learning Technologies and Applications, pp. 405–414. Springer, Cham (2018)

  143. Lin, J.C.W., Djenouri, Y., Srivastava, G., Fournier-Viger, P.: Mining profitable and concise patterns in large-scale Internet of Things environments. Wirel. Commun. Mob. Comput. 2021, 16 (2021)

  144. Srivastava, G., Lin, J.C.W., Zhang, X., Li, Y.: Large scale high utility sequential pattern analytics in IoT. IEEE Internet Things J. 8(16), 12669–12678 (2020)

  145. Lin, J.C.W., Djenouri, Y., Srivastava, G.: Efficient closed high-utility pattern fusion model in large-scale databases. Inf. Fusion (2021). https://doi.org/10.1016/j.inffus.2021.05.011

  146. Wu, J.M.T., Wei, M., Srivastava, G., Chen, C.M., Lin, J.C.W.: Mining large-scale high utility patterns in vehicular ad hoc network environments. Trans. Emerg. Telecommun. Technol. (2020). https://doi.org/10.1002/ett.4168

  147. Da Xu, L., Duan, L.: Big data for cyber physical systems in Industry 4.0: a survey. Enterp. Inf. Syst. (2018). https://doi.org/10.1080/17517575.2018.1442934

  148. Matsuoka, S.: Cambrian explosion of computing and big data in the post-Moore era. In: Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, pp. 105–105 (2019)

  149. Shah, N.D., Steyerberg, E.W., Kent, D.M.: Big data and predictive analytics: recalibrating expectations. JAMA Netw. 320, 27–29 (2018). https://doi.org/10.1001/jama.2018.5602

  150. Puri, G.D., Haritha, D.: Survey big data analytics, applications and privacy concerns. Indian J. Sci. Technol. 9, 1–8 (2016). https://doi.org/10.17485/ijst/2016/v9i17/93028

  151. Marjani, M., Nasaruddin, F., Gani, A., Karim, A., Hashem, I.A.T., Siddiqa, A., Yaqoob, I.: Big IoT data analytics: architecture, opportunities, and open research challenges. IEEE Access 5, 5247–5261 (2017). https://doi.org/10.1109/ACCESS.2017.2689040

  152. Liu, X., Tamminen, S., Su, X., Riekki, J.: Enhancing veracity of IoT generated big data in decision making. In: 2018 IEEE International Conference on Pervasive Computing and Communication Workshops (PerCom Work), pp. 149–154 (2018)

  153. Daniel, B.K.: Big Data and data science: a critical review of issues for educational research. Br. J. Educ. Technol. 50, 101–113 (2019). https://doi.org/10.1111/bjet.12595

  154. Chen, H., Yan, Z.: Security and privacy in big data lifetime: a review. In: International Conference on Security, Privacy and Anonymity in Computation, Communication and Storage, pp. 3–15 (2018). https://doi.org/10.1007/978-3-319-49145-5

  155. Graham-Harrison, E., Cadwalladr, C.: Revealed: 50 million Facebook profiles harvested for Cambridge Analytica in major data breach. Guard, pp. 1–5 (2018)

  156. Taylor, A.: The 5 worst big data privacy risks (and how to guard against them). https://www.csoonline.com/article/2855641/privacy/the-5-worst-big-data-privacy-risks-and-how-to-guard-against-them.html

  157. González, R.J.: Hacking the citizenry? Anthropol. Today 33, 9–12 (2017). https://doi.org/10.1111/1467-8322.12348

  158. Rekha, H.S., Prakash, C., Kavitha, G.: Understanding trust and privacy of big data in social networks: a brief review. In: 2014 3rd International Conference on Eco-friendly Computing and Communication Systems, pp. 138–143. IEEE (2014). https://doi.org/10.1109/Eco-friendly.2014.103

  159. Ballandies, M.C.: Decrypting distributed ledger design—taxonomy, classification and blockchain community evaluation. Clust. Comput. (2021). https://doi.org/10.1007/s10586-021-03256-w

  160. Chaudhary, R., Aujla, G.S., Kumar, N., Rodrigues, J.J.P.C.: Optimized big data management across multi-cloud data centers: software-defined-network-based analysis. IEEE Commun. Mag. 56, 118–126 (2018). https://doi.org/10.1109/MCOM.2018.1700211

  161. Zhan, Y., Hua, K., Li, Y., Kei, Y.: Unlocking the power of big data in new product development. Ann. Oper. Res. 270, 577–595 (2018). https://doi.org/10.1007/s10479-016-2379-x

  162. Castiglione, A., Colace, F., Moscato, V., Palmieri, F.: CHIS: a big data infrastructure to manage digital cultural items. Future Gener. Comput. Syst. 86, 1134–1145 (2018). https://doi.org/10.1016/j.future.2017.04.006

  163. Seuba, X., Geiger, C., Pénin, J.: Intellectual Property and Digital Trade in the Age of Artificial Intelligence and Big Data. International Centre for Trade and Sustainable Development Publications Series. International Centre for Trade and Sustainable Development, Geneva (2018)

  164. Maqbool, Q., Habib, A.: 5 Big data challenges. Control Eng. 66, 33 (2019). https://doi.org/10.4172/2324-9307.1000133

  165. Niu, C., Zheng, Z., Wu, F., Gao, X., Chen, G.: Achieving data truthfulness and privacy preservation in data markets. IEEE Trans. Knowl. Data Eng. 31, 105–119 (2019). https://doi.org/10.1109/TKDE.2018.2822727

  166. Bart, C., Karolina, L., Magdalena, J., Daniel, B., Michael, F., Stefania, A.: Lists of Ethical, Legal, Societal and Economic Issues of Big Data Technologies. Leiden University (2017)

  167. Jia-ke, L.V., Yang, L.I., Xuan, W.: Log data real time analysis using big data analytic framework with storm and Hadoop. MATEC Web Conf. (2018). https://doi.org/10.1051/matecconf/201824603009

  168. Ruidong, Z., Chunming, X., Junfeng, S., Yufeng, Z., Yi, P., Shiwen, C., Feng, D., Xishun, Z.: OTC-28346-MS The Research of Big Data Analysis Platform of Oil and Gas Production, pp. 1–9 (2018)

  169. Nweke, H.F., Ghulam, M., Mohammed, A., Alo, U.R., Ahmad, W.: Deep learning fusion conceptual frameworks for complex human activity recognition using mobile and wearable sensors. In: 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), pp. 1–7 (2018)

  170. Siddiqa, A., Hashem, I.A.T., Yaqoob, I., Marjani, M., Shamshirband, S., Gani, A., Nasaruddin, F.: A survey of big data management: taxonomy and state-of-the-art. J. Netw. Comput. Appl. 71, 151–166 (2016). https://doi.org/10.1016/j.jnca.2016.04.008

  171. Singh, S.P., Nayyar, A., Kumar, R., Sharma, A.: Fog computing: from architecture to edge computing and big data processing. J. Supercomput. 75, 2070–2105 (2019). https://doi.org/10.1007/s11227-018-2701-2

  172. Tariq, N., Asim, M., Al-Obeidat, F., Farooqi, M.Z., Baker, T., Hammoudeh, M., Ghafir, I.: The security of big data in fog-enabled IoT applications including blockchain: a survey. Sensors (Switz.) 19, 1–33 (2019). https://doi.org/10.3390/s19081788

  173. Hassan, M.M., Gumaei, A., Alsanad, A., Alrubaian, M., Fortino, G.: A hybrid deep learning model for efficient intrusion detection in big data environment. Inf. Sci. (NY) 513, 386–396 (2020). https://doi.org/10.1016/j.ins.2019.10.069

  174. Nweke, H.F., Teh, Y.W., Al-garadi, M.A., Alo, U.R.: Deep learning algorithms for human activity recognition using mobile and wearable sensor networks: state of the art and research challenges. Expert Syst. Appl. 105, 223–261 (2018). https://doi.org/10.1016/j.eswa.2018.03.056

  175. Salah, S., Maciá-Fernández, G., Díaz-Verdejo, J.E.: Fusing information from tickets and alerts to improve the incident resolution process. Inf. Fusion 45, 38–52 (2019). https://doi.org/10.1016/j.inffus.2018.01.011

  176. Nweke, H.F., Teh, Y.W., Mujtaba, G., Al-Garadi, M.A.: Data fusion and multiple classifier systems for human activity detection and health monitoring: review and open research directions. Inf. Fusion 46, 147–170 (2019). https://doi.org/10.1016/j.inffus.2018.06.002

Funding

The authors received no external funding.

Author information

Contributions

All authors contributed equally to conducting the research. The authors drafted, proofread, and reviewed the manuscript.

Corresponding authors

Correspondence to Anayo Chukwu Ikegwu or Henry Friday Nweke.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest for this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

1.1 Appendix A

The following abbreviations were used in this paper

Abbreviation: Definition

ABC: Anthropology-based computing
ABE: Attribute-based encryption
ASEAN: Association of Southeast Asian Nations
ASVM: Aggressive support vector machine
AUI: Address unique identifier
BA: Big data
BDA: Big data analytics
CDP: Cloudera distributed platform
CoP: Common operation picture
CSVM: Conservative support vector machine
EB: Exabyte
ECG: Electrocardiography
EMR: Electronic medical records
ETL: Extraction transformation load
FP-ANN: Feed forward artificial neural network
HC: Hybrid cloud
HDFS: Hadoop distributed file system
HDP: Hortonworks data platform
HME: Homomorphic encryption
IaaS: Infrastructure as a service
IBE: Identity-based encryption
IIoT: Industrial Internet of Things
LA: Lambda architecture
LS-SVM: Least squares support vector machine
M-ANFIS: Modified adaptive neuro-fuzzy inference system
MCE: Multi-cloud environment
MMLL: Mahout machine learning library
NoSQL: Not only structured query language
OTP: One-time password
PoI: Point of interest
PaaS: Platform as a service
PKE: Public key encryption
PPG: Photoplethysmography
RDD: Resilient distributed dataset
RFID: Radio frequency identification
SaaS: Software as a service
SC: Spark context
SCADA: Supervisory control and data acquisition
SCCs: Smart and connected communities
SSDs: Solid-state disks
SPE: Storage path encryption
TPDM: Truthfulness and privacy preservation in data markets
YARN: Yet Another Resource Negotiator
ZB: Zettabyte

1.2 Appendix B

Summary of some reviewed studies of big data analytics for data-driven industries and major features related to the studies

Each study below is described by the same fields: Objective, Problem, Tools, Data source, Application, Challenge, Solution, and Future direction.

Study [60]
Objective: Implementation of a valuable recommendation system for ranking shows
Problem: Difficulty in locating entertainment in the ambient animation domain
Tools: Cloudera Oryx
Data source: Data extracted from the MyAnimeList dataset from a website profile page
Application: Animated video
Challenge: Collaborative filtering
Solution: Improvement based on animation customer identification
Future direction: Extension of recommendation systems to e-commerce and video content providers

Study [72]
Objective: Present a real-time platform to dynamically update environmental and web sensor service data
Problem: The real-time geographical information system data model and sensor web service framework for environmental big data experience update and synchronization issues
Tools: Apache Storm
Data source: GIS data
Application: Soil moisture monitoring and air quality
Challenge: GIS data model and sensor web service
Solution: Implementation of Apache Storm to update the synchronized environmental data
Future direction: Environmental big data platform evaluation requires more features

Study [131]
Objective: Proposed a credit or debit card platform to enhance its usability in big data-driven Industry 4.0
Problem: Fraudulent transaction detection and large-scale pattern identification have become trending problems
Tools: MapReduce in HDFS
Data source: Finance transactions such as credit or debit card data
Application: Fraud detection in finance
Challenge: Security flaws
Solution: Enactment of legislative laws, Syferlock, and use of a secure multi-party computation mechanism
Future direction: Building a usable, enhanced, and integrated secure system

Study [128]
Objective: Developed a recent big data platform to manage sensor data in a more scalable manner for healthcare requests
Problem: Data generated from medical devices are complex to process and analyze
Tools: Apache Pig and Apache HBase
Data source: Data collected from sensors, such as heart rate and temperature sensors, through fog computing
Application: Healthcare and data security
Challenge: Data complexity and data security risks
Solution: Multi-party computation, cloud-based data encryption, and legislation
Future direction: Enhancement of an integrated big data computing paradigm and a well-secured system

Study [120]
Objective: Implemented a real-time remote health status prediction platform with Apache Spark that uses machine learning algorithms for big data streaming
Problem: High rate of data generation and accumulation in the healthcare system
Tools: Apache Spark-based decision tree
Data source: Training and testing data extracted from processed Twitter data and a dataset from the UCI Machine Learning Repository
Application: Twitter, healthcare
Challenge: Data capturing and processing to inform knowledge
Solution: Software-defined big data management
Future direction: Development of a complete real-time healthcare monitoring system

Study [118]
Objective: Develop a conceptual distributed framework for a secure healthcare system to protect patient data
Problem: There are privacy and security issues in the distributed system while reading patient data
Tools: Hadoop platforms such as MapReduce or NoSQL
Data source: Electronic healthcare record (EHR)
Application: Healthcare
Challenge: Lack of confidentiality, security, and privacy for medical data
Solution: Data truthfulness, privacy preservation model, multi-party computation, cloud-based data encryption
Future direction: Enhancement of security in a distributed big data analytics healthcare framework

Study [43]
Objective: Proposes recent IoT data fusion
Problem: Large storage models and high computational complexity when using K-means to cluster IoT big data
Tools: Hadoop (data fusion), K-means algorithm
Data source: IoT sensors
Application: IoT patterns and segmentation of behavioral groups
Challenge: Large-scale storage and high computational complexity
Solution: Data mining for big data approach
Future direction: Building an advanced computational model and storage framework

Study [101]
Objective: Present a conceptual approach for effective implementation of big data analytics in higher institutions
Problem: There is limited progress in rich data accumulation in higher education systems
Tools: Tableau, SPSS
Data source: Data generated from students, instructors, administrators, and the public
Application: Colleges and universities
Challenge: Student rights and privacy
Solution: Data preservation and enforcing homomorphic encryption
Future direction: Collaborative implementation of big data analytics deployment in higher institutions

Study [94]
Objective: Investigates common high-level query languages developed on the MapReduce framework
Problem: Luxurious memory and data transformations are comprehensively needed
Tools: MapReduce
Data source: Textual file
Application: JAQL, Hive, and Pig high-level query languages
Challenge: Low-level translation
Solution: Implementation of MapReduce-based high-level query languages
Future direction: Reduction of memory to ease big data platform transformation

Study [160]
Objective: Proposes an SDN-based big data management framework to reduce the consumption of network resources
Problem: High and increasing data generation from various smart devices across the network
Tools: Hadoop and Spark
Data source: Data generated from smart devices, Google, Amazon, and Microsoft
Application: Cloud data centre base
Challenge: Controller placement problem, energy consumption, network slicing, flow table management, and security
Solution: Infrastructural readiness mechanism
Future direction: Optimal integration of SDN-based big data distribution platforms

Study [113]
Objective: Develop efficient machine learning algorithms as well as big data frameworks to train different data format generation
Problem: Different data formats, such as heterogeneous and unstructured data, affect the performance of image pattern recognition
Tools: MongoDB
Data source: CSV files and datasets from image recognition domains such as Fashion-MNIST and MNIST
Application: Image pattern recognition
Challenge: Image preprocessing and analysis of large image datasets
Solution: Full implementation of big data analytics frameworks and machine learning model
Future direction: Implementing a big data framework to widen the existing platform

Study [73]
Objective: Developed "Smart Cassandra Spark Integration" as a novel approach to integrating NoSQL data stores with other big data storage devices
Problem: Integrating NoSQL data stores to manage distributed systems with different computing devices is quite challenging
Tools: NoSQL, Apache Cassandra, and Spark
Data source: Electricity smart meter data
Application: Electricity smart meter
Challenge: Lack of integration of pervasive big data computing platforms
Solution: Effective implementation of Smart Cassandra Spark Integration platforms
Future direction: Enhance the speed performance of big data stores, MPI/OpenMP with Cassandra integration is sorted

Study [108]
Objective: Investigate the effects of luxury brands' online marketing on customer engagement through collected big data
Problem: Limited coverage of luxury brands and lack of longitudinal studies
Tools: MySQL, NLP
Data source: Twitter data of 3.78 million collected across social media
Application: Social media marketing
Challenge: Inadequate collaboration of customer engagement
Solution: Proper design, delivery, and management of luxury brands' social media marketing content
Future direction: A varied sample of luxury brands to enhance customer engagement

Study [112]
Objective: Introduce a new approach to analyze Twitter data and scrutinize community reactions to the consequences of a disaster, based on social media statements
Problem: Due to the random appearance and vast extent of users and intentions, social media data has been under-utilized in post-disaster recovery studies
Tools: Machine learning
Data source: Twitter streaming data through a mobile app
Application: Disaster recovery
Challenge: Complexity in data generation and management
Solution: Implementation of a machine learning model and utilization of big data tools
Future direction: Investigate the intentions of social media users towards disaster-influenced activities

Study [102]
Objective: Presents a machine learning approach to predict students' final grades using historical grade performance
Problem: High rate of student dropout
Tools: Decision tree
Data source: Students' grades
Application: Education
Challenge: Historical grades prediction
Solution: Implementation of an early warning mechanism
Future direction: Design of a full-featured, functional big data framework that supports the processing of large volumes of academic data

Study [4]
Objective: The author implements the internal structure and trend of the financial system
Problem: High-frequency data trend issues
Tools: SVM, non-linear analysis
Data source: IoT data
Application: Finance
Challenge: Volatile financial market behavior characteristics
Solution: Development of high-frequency data
Future direction: Application of more attributes prediction in financial data models

Study [127]
Objective: Presents a sensor-enhanced approach for rehabilitation outcomes of patients with terrible neurological impairment
Problem: Inefficiency in gathering information on patient progress in an outpatient clinic
Tools: Apache Spark, NoSQL
Data source: IoT/sensor and healthcare data
Application: Healthcare, IoT
Challenge: Delay and lack of platform mechanism and intervention for family therapeutic activities
Solution: Implementation of infrastructural readiness and acquisition of the data required for therapeutic algorithms
Future direction: Enhancement and integration of more functional features of big data analytics distribution platforms

Study [103]
Objective: Identify attributes for the early prediction of student dropout
Problem: High dropout in adult education
Tools: Machine learning (decision tree, Naïve Bayes, etc.)
Data source: Students' enrollment data and academic records
Application: Education
Challenge: Various factors such as economic crises, finances, miscalculation of available time, etc.
Solution: Early detection of students at risk
Future direction: Developing cooperate synergy

Study [7]
Objective: Develop job advertisements with a profile of the organization using a text mining approach
Problem: The prerequisite knowledge needed to succeed in Industry 4.0 business intelligence is scarce
Tools: Machine learning
Data source: Job advertisements from social media data
Application: Text mining, education
Challenge: Technological knowledge and skill gaps in Industry 4.0
Solution: Development of online job advertisement analysis
Future direction: Further analysis using unsupervised text mining

Study [87]
Objective: Presents an intelligent big data task scheduling approach for IoT cloud computing applications using a hybrid dragonfly algorithm
Problem: Problem with effective task scheduling
Tools: Dragonfly algorithm
Data source: CloudSim infrastructures (toolkit)
Application: IoT cloud computing
Challenge: Task scheduling problems in IoT cloud computing
Solution: Development of intelligent workflow scheduling using the dragonfly algorithm
Future direction: Incorporation of other local search methods to improve the performance of the drag

Study [88]
Objective: Developed an alternative task scheduling technique for IoT requests in a cloud-fog environment based on a modified artificial ecosystem-based optimization
Problem: Task scheduling problem
Tools: Salp swarm algorithm
Data source: Parallel workload archive from NASA comprising HPC2N (High-Performance Computing Center North)
Application: IoT architectures
Challenge: Boils down to task scheduling
Solution: Deployment of an advanced optimization technique in a cloud-fog computing approach
Future direction: Modification of AEOSSA to handle job shop scheduling and vehicle routing. Also, considering energy consumption and cost for cloud-fog development

About this article

Cite this article

Ikegwu, A.C., Nweke, H.F., Anikwe, C.V. et al. Big data analytics for data-driven industry: a review of data sources, tools, challenges, solutions, and research directions. Cluster Comput 25, 3343–3387 (2022). https://doi.org/10.1007/s10586-022-03568-5

  • DOI: https://doi.org/10.1007/s10586-022-03568-5

Keywords