Abstract
The explosive growth of digital data on social media and the World Wide Web has created numerous opportunities and research activities in multimedia big data. Among them, disaster management applications have attracted considerable attention in recent years due to their impact on society and government. This study targets content analysis and mining for disaster management. Specifically, a multimedia big data framework based on advanced deep learning techniques is proposed. First, a video dataset of natural disasters is collected from YouTube. Then, two separate deep networks, a temporal audio model and a spatio-temporal visual model, are presented to effectively analyze the audio and visual modalities in video clips. Thereafter, the results of both models are integrated using the proposed fusion model based on the Multiple Correspondence Analysis (MCA) algorithm, which considers the correlations between data modalities and final classes. The proposed multimodal framework is evaluated on the collected disaster dataset and compared with several state-of-the-art single-modality and fusion techniques. The results demonstrate the effectiveness of both the visual model and the fusion model compared to the baseline approaches. Specifically, the accuracy of the final multi-class classification using the proposed MCA-based fusion reaches 73% on this challenging dataset.
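To make the fusion step concrete, the sketch below shows a minimal weighted late-fusion scheme of the general kind the abstract describes: per-class reliability weights for each modality are estimated on a validation split, then used to combine the per-class probability scores of the audio and visual networks. This is an illustrative stand-in only; the paper's actual weights come from MCA-derived correlations between modality outputs and classes, not from the simple per-class accuracy used here, and all function names are hypothetical.

```python
import numpy as np

def modality_class_weights(pred_labels, true_labels, n_classes):
    """Per-class reliability of one modality, estimated on a validation
    split as the fraction of samples of each class it labels correctly.
    (Illustrative proxy for the paper's MCA-based correlation weights.)"""
    w = np.zeros(n_classes)
    for c in range(n_classes):
        mask = true_labels == c
        w[c] = (pred_labels[mask] == c).mean() if mask.any() else 0.0
    return w

def fuse(audio_scores, visual_scores, w_audio, w_visual):
    """Weighted late fusion: scale each modality's per-class scores by its
    per-class weight, sum, and take the arg-max class per sample."""
    combined = w_audio * audio_scores + w_visual * visual_scores
    return combined.argmax(axis=1)
```

In this scheme a modality that is unreliable for a given class (low weight) contributes less to that class's fused score, which is the intuition behind correlation-aware fusion.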
Notes
Available at https://github.com/Breakthrough/PySceneDetect
Acknowledgments
This research is partially supported by NSF CNS-1461926.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: Special Issue on Big Data for Effective Disaster Management
Guest Editors: Xuan Song, Song Guo, and Haizhong Wang
Cite this article
Pouyanfar, S., Tao, Y., Tian, H. et al. Multimodal deep learning based on multiple correspondence analysis for disaster management. World Wide Web 22, 1893–1911 (2019). https://doi.org/10.1007/s11280-018-0636-4