Abstract
Massive amounts of data are generated at an ever-increasing rate by sources such as social media, mobile phones, sensors, and medical imaging. An important characteristic of the data these sources generate is that it is commonly either unstructured or semi-structured. Big data analytics comprises software systems that analyze vast amounts of data to uncover information, such as patterns and correlations, that helps decision-makers make better decisions. Traditional approaches such as data warehousing and classic relational database management systems (RDBMSs) have become impractical for analyzing unstructured and semi-structured data at this scale. Machine learning (ML) algorithms, on the other hand, have proven successful in analyzing such data. In this chapter, we present some of the most widely used ML algorithms in big data analytics, as well as the distributed platforms typically employed for processing the data. We also present three important application domains where ML algorithms have been applied to perform big data analytics: healthcare, weather forecasting, and social networking. Finally, for each domain we review relevant approaches, the most commonly used ML algorithms, and domain-specific issues in big data analytics that need further research.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Duran-Limon, H.A., Chavoya, A., Hernández-Ochoa, M. (2024). The Role of Machine Learning in Big Data Analytics: Current Practices and Challenges. In: Mora, M., Wang, F., Marx Gomez, J., Duran-Limon, H. (eds) Development Methodologies for Big Data Analytics Systems. Transactions on Computational Science and Computational Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-031-40956-1_2
DOI: https://doi.org/10.1007/978-3-031-40956-1_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40955-4
Online ISBN: 978-3-031-40956-1
eBook Packages: Engineering, Engineering (R0)