Abstract
In this paper, we present a global overview of the current usage and future trends of big data ecosystems in the scientific domains of E-Health. Bioinformaticians and medical practitioners are currently generating very large amounts of data, so storing, managing, and analyzing these large-scale datasets remains a major challenge. Big data ecosystems are involved at different steps of the production chain: the acquisition of both structured and unstructured data, their storage in traditional and/or NoSQL databases, and finally their analysis using the MapReduce framework. In this smooth survey, we discuss all these parts of the ecosystem and present some use cases on real datasets from the domain.
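The abstract places MapReduce at the analytics end of this chain. As a rough illustration of the MapReduce programming model only (not the author's method or any use case from the paper), the following pure-Python sketch runs the map, shuffle, and reduce phases in a single process to count diagnosis codes across patient records. The record layout, the illustrative ICD-10-style codes, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical; a real Hadoop job would distribute these phases across a cluster and read its input from a distributed file system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input format: (patient_id, list of diagnosis codes).
Record = Tuple[str, List[str]]


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (code, 1) pair for every diagnosis code in every record."""
    for _patient_id, codes in records:
        for code in codes:
            yield code, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine all values for one key into a single result (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(records: Iterable[Record]) -> Dict[str, int]:
    """Chain the three phases in one process (a stand-in for a cluster run)."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    # Made-up toy records; not real patient data.
    records = [
        ("p1", ["E11", "I10"]),
        ("p2", ["I10"]),
        ("p3", ["E11", "I10", "J45"]),
    ]
    print(run_job(records))  # {'E11': 2, 'I10': 3, 'J45': 1}
```

Because each reduce call only needs the values grouped under a single key, the groups can be reduced independently of one another, which is what allows a MapReduce framework to spread that work across many machines.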
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Benabderrahmane, S. (2017). What Can the Big Data Eco-System and Data Analytics Do for E-Health? A Smooth Review Study. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10208. Springer, Cham. https://doi.org/10.1007/978-3-319-56148-6_56
DOI: https://doi.org/10.1007/978-3-319-56148-6_56
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56147-9
Online ISBN: 978-3-319-56148-6
eBook Packages: Computer Science, Computer Science (R0)