Abstract
Machines and humans are increasingly expected to work together in social and manufacturing environments. For effective collaboration, a machine should predict a human's next move by observing the present one. Human motion modelling and prediction are fundamental and challenging problems in computer vision and graphics. To address some of these challenges, we propose a new cost function based on adaptive sampling, which is used as the objective function with the Adam optimizer to train and validate a specially configured deep learning architecture. The adaptiveness of the proposed cost function comes from a bell-shaped, locally weighted function. We observed that the area covered by the cost function plays a vital role during training, and that the width of the bell-shaped function determines the region of importance for the training samples. The proposed cost function is used to train a gated recurrent unit (GRU) based encoder-decoder architecture. The encoder takes the observed input sequence, extracts its significant variability, and passes it to the decoder. The decoder takes this representation as input, is trained with the adaptive sampling-based method, and predicts future poses. We experimented with this function in various sizes and shapes and compared the results with state-of-the-art results. As elaborated in this paper, the proposed approach gave significantly improved future pose predictions in almost all cases.
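To make the idea of a bell-shaped, locally weighted cost more concrete, the following is a minimal sketch and not the paper's actual formulation, which the abstract does not give. It assumes a Gaussian bell over the predicted time steps and a weighted mean squared error; the names `bell_weights`, `locally_weighted_loss`, `center`, and `width` are hypothetical and chosen only for illustration.

```python
import numpy as np


def bell_weights(num_steps, center, width):
    """Gaussian (bell-shaped) weight for each predicted time step.

    Assumption: the bell is a Gaussian over the step index; the paper
    may define its locally weighted function differently.
    """
    steps = np.arange(num_steps, dtype=float)
    return np.exp(-0.5 * ((steps - center) / width) ** 2)


def locally_weighted_loss(pred, target, center, width):
    """Bell-weighted mean squared error over a predicted pose sequence.

    pred, target: arrays of shape (num_steps, pose_dim).
    center, width: hypothetical parameters placing and scaling the bell.
    """
    weights = bell_weights(pred.shape[0], center, width)
    # Mean squared error of each predicted frame across pose dimensions.
    per_step_error = np.mean((pred - target) ** 2, axis=1)
    # Normalise by the total weight so the loss scale does not depend on width.
    return np.sum(weights * per_step_error) / np.sum(weights)
```

In this sketch, a small `width` concentrates the loss on frames near `center`, while a large `width` spreads importance across the whole predicted sequence, which is one way to read the abstract's remark that the width decides the region of importance.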




Data availability statement
Data sharing does not apply to this article as no datasets were generated or analyzed during the current study.
Ethics declarations
Conflict of Interest
The authors have no conflicts of interest to declare.
About this article
Cite this article
Yadav, G.K., Puig, D. & Nandi, G.C. Designing an adaptive cost function for dynamic human pose predictions. Multimed Tools Appl 83, 53201–53219 (2024). https://doi.org/10.1007/s11042-023-17736-1
DOI: https://doi.org/10.1007/s11042-023-17736-1