Abstract
In skeleton-based action recognition, treating skeleton data as pseudo-images and processing them with convolutional neural networks (CNNs) has proven effective. However, most existing CNN-based approaches model information only at the joint level and ignore the length and direction of the skeleton edges, which play an important role in action recognition, so these approaches may be suboptimal. In addition, existing approaches rarely use the directionality of human motion to describe how an action varies over time, although doing so is more natural and reasonable for action sequence modeling. In this work, we propose a novel direction-guided two-stream convolutional neural network for skeleton-based action recognition. The first stream focuses on directional edge-level information (edge and edge_motion information) in the skeleton data to explore the spatiotemporal features of the action. In the second stream, because motion is directional, we define different skeleton edge directions and extract different motion information (translation and rotation information) along each direction to better exploit the motion features of the action. We also propose a description of human motion that combines translation and rotation, and we explore how the two are integrated. We conducted extensive experiments on two challenging datasets, NTU-RGB+D 60 and NTU-RGB+D 120, to verify the superiority of our proposed method over state-of-the-art methods. The experimental results demonstrate that the proposed direction-guided edge-level information and motion information complement each other for better action recognition.
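To make the edge-level quantities named in the abstract more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give formulas, so the definitions here are assumptions chosen for illustration. An edge is taken as the vector between two connected joints, its direction is flipped by swapping the endpoints, edge_motion is taken as the frame-to-frame difference of that vector, and rotation is taken as the angle between an edge in two frames. The `EDGES` list and all function names are hypothetical.

```python
import math

# Hypothetical toy skeleton with 4 joints; these (parent, child) pairs are
# illustrative only and do not correspond to the NTU-RGB+D joint layout.
EDGES = [(0, 1), (1, 2), (1, 3)]


def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def edge_vectors(frame, edges=EDGES, reverse=False):
    """Edge vectors for one frame (assumed definition: end joint minus start joint).

    reverse=False points each edge from parent to child;
    reverse=True flips the direction (child to parent).
    """
    out = []
    for parent, child in edges:
        src, dst = (child, parent) if reverse else (parent, child)
        out.append(sub(frame[dst], frame[src]))
    return out


def edge_motion(seq, edges=EDGES, reverse=False):
    """Assumed edge_motion: difference of each edge vector between consecutive frames."""
    per_frame = [edge_vectors(f, edges, reverse) for f in seq]
    return [
        [sub(cur_e, prev_e) for prev_e, cur_e in zip(prev, cur)]
        for prev, cur in zip(per_frame, per_frame[1:])
    ]


def rotation_angle(u, v):
    """Angle in radians between two edge vectors; 0.0 if either has zero length."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    cos = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos)))


if __name__ == "__main__":
    # Two frames of the toy skeleton: joint 2 swings from the x-axis to the y-axis.
    seq = [
        [(0, 0, 0), (0, 1, 0), (1, 1, 0), (-1, 1, 0)],
        [(0, 0, 0), (0, 1, 0), (0, 2, 0), (-1, 1, 0)],
    ]
    print(edge_vectors(seq[0]))
    print(edge_vectors(seq[0], reverse=True))
    print(edge_motion(seq))
```

Under these assumptions, stacking such per-edge vectors over frames and edges would give the kind of pseudo-image a CNN stream could consume; how the paper actually arranges and fuses them is not stated in the abstract.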









Availability of data
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
Acknowledgements
This work was supported in part by the Leading Talent Team Project of Anhui Province and by the Anqing Normal University and Tongling University Joint Training Postgraduate Research Innovation Fund Project (tlaqsflhy2).
Author information
Contributions
All authors contributed to the study conception and design. Data collection and analysis were performed by Peng Zhang, Manzhen Sun and Min Sheng. The first draft of the manuscript was written by Benyue Su, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
About this article
Cite this article
Su, B., Zhang, P., Sun, M. et al. Direction-guided two-stream convolutional neural networks for skeleton-based action recognition. Soft Comput 27, 11833–11842 (2023). https://doi.org/10.1007/s00500-023-07862-1