Abstract
Existing datasets for 3D hand-object interaction are limited in data cardinality, variation of interaction scenarios, or annotation quality. In this work, we present HOGraspNet, a comprehensive new training dataset for hand-object interaction. It is the only real dataset that covers the full grasp taxonomy, providing grasp annotations and wide intra-class variation. Using grasp taxonomies as atomic actions, their spatial and temporal combinations can represent complex hand activities around objects. We select 22 rigid objects from the YCB dataset and 8 other compound objects using shape and size taxonomies, ensuring coverage of all hand grasp configurations. The dataset includes diverse hand shapes from 99 participants aged 10 to 74, continuous video frames, and 1.5M sparse RGB-Depth frames with annotations. It offers labels for 3D hand and object meshes, 3D keypoints, contact maps, and grasp labels. Accurate hand and object 3D meshes are obtained by fitting the hand parametric model (MANO) and the hand implicit function (HALO) to multi-view RGB-Depth frames, with a MoCap system used only for objects. Notably, HALO fitting requires no parameter tuning, so it scales to the dataset's size while achieving accuracy comparable to MANO. We evaluate HOGraspNet on relevant tasks: grasp classification and 3D hand pose estimation. The results show performance variation depending on grasp type and object class, indicating the potential importance of the interaction space captured by our dataset. The provided data is intended for learning universal shape priors or foundation models for 3D hand-object interaction. Our dataset and code are available at https://hograspnet2024.github.io/.
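To make the MANO-fitting step mentioned in the abstract concrete, the sketch below shows one minimal way to fit MANO pose, shape, and translation to 3D hand keypoints triangulated from multi-view cameras. This is not the authors' actual annotation pipeline (which also fits HALO and uses multi-view RGB-Depth and MoCap constraints); it only illustrates the parametric-fitting idea. It assumes the third-party manopth package, and target_joints is a hypothetical placeholder for triangulated keypoints.

    # Minimal sketch, assuming the manopth ManoLayer interface; not the authors' pipeline.
    import torch
    from manopth.manolayer import ManoLayer

    # Full axis-angle pose (no PCA); outputs are in millimetres.
    mano = ManoLayer(mano_root='mano/models', use_pca=False, flat_hand_mean=True)

    target_joints = torch.zeros(1, 21, 3)           # placeholder: triangulated 3D keypoints (mm)
    pose = torch.zeros(1, 48, requires_grad=True)   # global rotation + 15 joint rotations (axis-angle)
    shape = torch.zeros(1, 10, requires_grad=True)  # MANO shape coefficients (betas)
    trans = torch.zeros(1, 3, requires_grad=True)   # global translation (mm)

    optim = torch.optim.Adam([pose, shape, trans], lr=1e-2)
    for it in range(500):
        optim.zero_grad()
        verts, joints = mano(pose, shape)                        # (1, 778, 3) vertices, (1, 21, 3) joints
        loss = ((joints + trans - target_joints) ** 2).mean()    # 3D keypoint alignment term
        loss = loss + 1e-3 * (shape ** 2).mean()                 # shape regularizer
        loss.backward()
        optim.step()

In practice the joint ordering of the keypoint detector must be matched to MANO's, and additional terms (2D reprojection, depth, contact, temporal smoothness) would typically be added; those details are omitted here.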
References
Arapi, V., Della Santina, C., Averta, G., Bicchi, A., Bianchi, M.: Understanding human manipulation with the environment: a novel taxonomy for video labelling. IEEE Robot. Autom. Lett. 6(4), 6537–6544 (2021)
Bhatnagar, B.L., Xie, X., Petrov, I., Sminchisescu, C., Theobalt, C., Pons-Moll, G.: BEHAVE: dataset and method for tracking human object interactions. In: CVPR (2022)
Brahmbhatt, S., Ham, C., Kemp, C.C., Hays, J.: ContactDB: analyzing and predicting grasp contact via thermal imaging. In: CVPR (2019)
Brahmbhatt, S., Tang, C., Twigg, C.D., Kemp, C.C., Hays, J.: ContactPose: a dataset of grasps with object contact and hand pose. In: ECCV (2020)
Calli, B., Singh, A., Walsman, A., Srinivasa, S., Abbeel, P., Dollar, A.M.: The YCB object and model set: towards common benchmarks for manipulation research. In: ICAR (2015)
Cao, Z., Radosavovic, I., Kanazawa, A., Malik, J.: Reconstructing hand-object interactions in the wild. In: ICCV (2021)
Caramalau, R., Bhattarai, B., Kim, T.K.: Active learning for Bayesian 3D hand pose estimation. In: WACV (2021)
Chao, Y.W., et al.: DexYCB: a benchmark for capturing hand grasping of objects. In: CVPR (2021)
Chen, L.C., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017)
Chen, Y., et al.: Joint hand-object 3D reconstruction from a single image with cross-branch feature fusion. TIP (2021)
Chen, Z., Chen, S., Schmid, C., Laptev, I.: gSDF: geometry-driven signed distance functions for 3D hand-object reconstruction. In: CVPR (2023)
Chen, Z., Hasson, Y., Schmid, C., Laptev, I.: AlignSDF: pose-aligned signed distance fields for hand-object reconstruction. In: ECCV (2022)
Cho, W., Park, G., Woo, W.: Tracking an object-grabbing hand using occluded depth reconstruction. In: ISMAR-Adjunct (2018)
Cho, W., Park, G., Woo, W.: Bare-hand depth inpainting for 3D tracking of hand interacting with object. In: ISMAR (2020)
Cini, F., Ortenzi, V., Corke, P., Controzzi, M.: On the choice of grasp type and location when handing over an object. Sci. Robot. 4(27), eaau9757 (2019)
Corona, E., Pumarola, A., Alenya, G., Moreno-Noguer, F., Rogez, G.: GanHand: predicting human grasp affordances in multi-object scenes. In: CVPR (2020)
Damen, D., et al.: Rescaling egocentric vision: collection, pipeline and challenges for EPIC-KITCHENS-100. IJCV (2022)
Doosti, B., Naha, S., Mirbagheri, M., Crandall, D.J.: HOPE-Net: a graph-based model for hand-object pose estimation. In: CVPR (2020)
Fan, Z., et al.: ARCTIC: a dataset for dexterous bimanual hand-object manipulation. In: CVPR (2023)
Feix, T., Romero, J., Schmiedmayer, H.B., Dollar, A.M., Kragic, D.: The grasp taxonomy of human grasp types. IEEE Trans. Hum.-Mach. Syst. 46(1), 66–77 (2015)
Fieraru, M., Zanfir, M., Oneata, E., Popa, A.I., Olaru, V., Sminchisescu, C.: Three-dimensional reconstruction of human interactions. In: CVPR (2020)
Fu, Q., Liu, X., Xu, R., Niebles, J.C., Kitani, K.M.: Deformer: dynamic fusion transformer for robust hand pose estimation. arXiv preprint arXiv:2303.04991 (2023)
Garcia-Hernando, G., Johns, E., Kim, T.K.: Physics-based dexterous manipulations with estimated hand poses and residual reinforcement learning. In: IROS (2020)
Garcia-Hernando, G., Yuan, S., Baek, S., Kim, T.K.: First-person hand action benchmark with RGB-D videos and 3D hand pose annotations. In: CVPR (2018)
Gomez-Donoso, F., Orts-Escolano, S., Cazorla, M.: Large-scale multiview 3D hand pose dataset. IVC (2019)
Goyal, M., Modi, S., Goyal, R., Gupta, S.: Human hands as probes for interactive object understanding. In: CVPR (2022)
Grady, P., Tang, C., Twigg, C.D., Vo, M., Brahmbhatt, S., Kemp, C.C.: ContactOpt: optimizing contact to improve grasps. In: CVPR (2021)
Hampali, S., Rad, M., Oberweger, M., Lepetit, V.: HOnnotate: a method for 3D annotation of hand and object poses. In: CVPR (2020)
Hampali, S., Sarkar, S.D., Rad, M., Lepetit, V.: Keypoint transformer: solving joint identification in challenging hands and object interactions for accurate 3D pose estimation. In: CVPR (2022)
Hassan, M., Choutas, V., Tzionas, D., Black, M.J.: Resolving 3D human pose ambiguities with 3D scene constraints. In: ICCV (2019)
Hasson, Y., Tekin, B., Bogo, F., Laptev, I., Pollefeys, M., Schmid, C.: Leveraging photometric consistency over time for sparsely supervised hand-object reconstruction. In: CVPR (2020)
Hasson, Y., Varol, G., Schmid, C., Laptev, I.: Towards unconstrained joint hand-object reconstruction from RGB videos. In: 3DV (2021)
Hasson, Y., et al.: Learning joint reconstruction of hands and manipulated objects. In: CVPR (2019)
Hu, H., Yi, X., Zhang, H., Yong, J.H., Xu, F.: Physical interaction: reconstructing hand-object interactions with physics. In: SIGGRAPH Asia (2022)
Huang, C.H.P., et al.: Capturing and inferring dense full-body human-scene contact. In: CVPR (2022)
Huang, Y., Taheri, O., Black, M.J., Tzionas, D.: InterCap: joint markerless 3D tracking of humans and objects in interaction from multi-view RGB-D images. IJCV (2024)
Jiang, N., et al.: Full-body articulated human-object interaction. In: ICCV (2023)
Joo, H., et al.: Panoptic studio: a massively multiview system for social motion capture. In: ICCV (2015)
Joo, H., Neverova, N., Vedaldi, A.: Exemplar fine-tuning for 3D human pose fitting towards in-the-wild 3D human pose estimation. In: 3DV (2020)
Karunratanakul, K., Spurr, A., Fan, Z., Hilliges, O., Tang, S.: A skeleton-driven neural occupancy representation for articulated hands. In: 3DV (2021)
Kwon, T., Tekin, B., Stühmer, J., Bogo, F., Pollefeys, M.: H2O: two hands manipulating objects for first person interaction recognition. In: ICCV (2021)
Lee, J., Saito, S., Nam, G., Sung, M., Kim, T.K.: InterHandGen: two-hand interaction generation via cascaded reverse diffusion. In: CVPR (2024)
Lee, J., Sung, M., Choi, H., Kim, T.K.: Im2Hands: learning attentive implicit representation of interacting two-hand shapes. In: CVPR (2023)
Leroy, V., Weinzaepfel, P., Brégier, R., Combaluzier, H., Rogez, G.: SMPLy benchmarking 3D human pose estimation in the wild. In: 3DV (2020)
Li, M., et al.: Interacting attention graph for single image two-hand reconstruction. In: CVPR (2022)
Lin, K., Wang, L., Liu, Z.: Mesh graphormer. In: ICCV (2021)
Lin, P., et al.: HandDiffuse: generative controllers for two-hand interactions via diffusion models. arXiv preprint arXiv:2312.04867 (2023)
Lin, Z., Ding, C., Yao, H., Kuang, Z., Huang, S.: Harmonious feature learning for interactive hand-object pose estimation. In: CVPR (2023)
Liu, J., Feng, F., Nakamura, Y.C., Pollard, N.S.: A taxonomy of everyday grasps in action. In: 2014 IEEE-RAS International Conference on Humanoid Robots, pp. 573–580. IEEE (2014)
Liu, S., Jiang, H., Xu, J., Liu, S., Wang, X.: Semi-supervised 3D hand-object poses estimation with interactions in time. In: CVPR (2021)
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM TOG (2015)
Lugaresi, C., et al.: MediaPipe: a framework for building perception pipelines. arXiv preprint arXiv:1906.08172 (2019)
Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. (2008)
Mehta, D., et al.: Single-shot multi-person 3D pose estimation from monocular RGB. In: 3DV (2018)
Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: learning 3D reconstruction in function space. In: CVPR (2019)
Moon, G., et al.: A dataset of relighted 3D interacting hands. In: NeurIPS (2024)
Moon, G., Yu, S.I., Wen, H., Shiratori, T., Lee, K.M.: InterHand2.6M: a dataset and baseline for 3D interacting hand pose estimation from a single RGB image. In: ECCV (2020)
Mueller, F., Mehta, D., Sotnychenko, O., Sridhar, S., Casas, D., Theobalt, C.: Real-time hand tracking under occlusion from an egocentric RGB-D sensor. In: ICCV (2017)
Park, G., Kim, T.K., Woo, W.: 3D hand pose estimation with a single infrared camera via domain transfer learning. In: ISMAR (2020)
Patel, P., Huang, C.H.P., Tesch, J., Hoffmann, D.T., Tripathi, S., Black, M.J.: AGORA: avatars in geography optimized for regression analysis. In: CVPR (2021)
Pavlakos, G., et al.: Expressive body capture: 3D hands, face, and body from a single image. In: CVPR (2019)
Pavlakos, G., Shan, D., Radosavovic, I., Kanazawa, A., Fouhey, D., Malik, J.: Reconstructing hands in 3D with transformers. In: CVPR (2024)
Pumarola, A., Sanchez, J., Choi, G., Sanfeliu, A., Moreno-Noguer, F.: 3DPeople: modeling the geometry of dressed humans. In: ICCV (2019)
Qian, C., Sun, X., Wei, Y., Tang, X., Sun, J.: Realtime and robust hand tracking from depth. In: CVPR (2014)
Qu, W., et al.: Novel-view synthesis and pose estimation for hand-object interaction from sparse views. In: ICCV (2023)
Ranjan, A., Bolkart, T., Sanyal, S., Black, M.J.: Generating 3D faces using convolutional mesh autoencoders. In: ECCV (2018)
Romero, J., Tzionas, D., Black, M.J.: Embodied hands: modeling and capturing hands and bodies together. ACM TOG (2017)
Simon, T., Joo, H., Matthews, I., Sheikh, Y.: Hand keypoint detection in single images using multiview bootstrapping. In: CVPR (2017)
Sridhar, S., Oulasvirta, A., Theobalt, C.: Interactive markerless articulated hand motion tracking using RGB and depth data. In: ICCV (2013)
Stival, F., Michieletto, S., Cognolato, M., Pagello, E., Müller, H., Atzori, M.: A quantitative taxonomy of human hand grasps. J. Neuroeng. Rehabil. 16, 1–17 (2019)
Sun, Y., Liu, W., Bao, Q., Fu, Y., Mei, T., Black, M.J.: Putting people in their place: monocular regression of 3D people in depth. In: CVPR (2022)
Swamy, A., et al.: SHOWMe: benchmarking object-agnostic hand-object 3D reconstruction. In: ICCV (2023)
Taheri, O., Ghorbani, N., Black, M.J., Tzionas, D.: GRAB: a dataset of whole-body human grasping of objects. In: ECCV (2020)
Tang, D., Jin Chang, H., Tejani, A., Kim, T.K.: Latent regression forest: structured estimation of 3D articulated hand posture. In: CVPR (2014)
Tekin, B., Bogo, F., Pollefeys, M.: H+O: unified egocentric recognition of 3D hand-object poses and interactions. In: CVPR (2019)
Tompson, J., Stein, M., Lecun, Y., Perlin, K.: Real-time continuous pose recovery of human hands using convolutional networks. ACM TOG (2014)
Tse, T.H.E., Zhang, Z., Kim, K.I., Leonardis, A., Zheng, F., Chang, H.J.: S2Contact: graph-based network for 3D hand-object contact estimation with semi-supervised learning. In: ECCV (2022)
Tzionas, D., Ballan, L., Srikantha, A., Aponte, P., Pollefeys, M., Gall, J.: Capturing hands in action using discriminative salient points and physics simulation. IJCV (2016)
Wang, J., et al.: RGB2Hands: real-time tracking of 3D hand interactions from monocular RGB video. ACM TOG (2020)
Guo, W., Bie, X., Alameda-Pineda, X., Moreno-Noguer, F.: Multi-person extreme motion prediction. In: CVPR (2022)
Xu, C., Cheng, L.: Efficient hand pose estimation from a single depth image. In: ICCV (2013)
Xu, H., Wang, T., Tang, X., Fu, C.W.: H2ONet: hand-occlusion-and-orientation-aware network for real-time 3D hand mesh reconstruction. In: CVPR (2023)
Yang, L., et al.: OakInk: a large-scale knowledge repository for understanding hand-object interaction. In: CVPR (2022)
Yang, L., Zhan, X., Li, K., Xu, W., Li, J., Lu, C.: CPF: learning a contact potential field to model the hand-object interaction. In: ICCV (2021)
Yin, Y., Guo, C., Kaufmann, M., Zarate, J., Song, J., Hilliges, O.: Hi4D: 4D instance segmentation of close human interaction. In: CVPR (2023)
Yu, Z., Yang, L., Chen, S., Yao, A.: Local and global point cloud reconstruction for 3D hand pose estimation. In: BMVC (2021)
Yuan, S., Ye, Q., Stenger, B., Jain, S., Kim, T.K.: BigHand2.2M benchmark: hand pose dataset and state of the art analysis. In: CVPR (2017)
Zhang, B., et al.: Interacting two-hand 3D pose and shape reconstruction from single color image. In: ICCV (2021)
Zhang, J., Jiao, J., Chen, M., Qu, L., Xu, X., Yang, Q.: 3D hand pose tracking and estimation using stereo matching. In: ICIP (2017)
Zhang, S., et al.: EgoBody: human body shape and motion of interacting people from head-mounted devices. In: ECCV (2022)
Zhang, X., et al.: Hand image understanding via deep multi-task learning. In: ICCV (2021)
Zheng, X., Wen, C., Xue, Z., Ren, P., Wang, J.: HaMuCo: hand pose estimation via multiview collaborative self-supervised learning. In: ICCV (2023)
Zheng, Y., et al.: DeepMultiCap: performance capture of multiple characters using sparse multiview cameras. In: ICCV (2021)
Zimmermann, C., Argus, M., Brox, T.: Contrastive representation learning for hand shape estimation. In: GCPR (2021)
Zimmermann, C., Brox, T.: Learning to estimate 3D hand pose from single RGB images. In: ICCV (2017)
Zimmermann, C., Ceylan, D., Yang, J., Russell, B., Argus, M., Brox, T.: FreiHAND: a dataset for markerless capture of hand pose and shape from single RGB images. In: ICCV (2019)
Zuo, B., Zhao, Z., Sun, W., Xie, W., Xue, Z., Wang, Y.: Reconstructing interacting hands with interaction prior from monocular images. In: ICCV (2023)
Acknowledgement
This work was in part sponsored by an NST grant (CRC 21011, MSIT) and IITP grants (No. 2019-0-01270 and RS-2023-00228996, MSIT).
Electronic supplementary material
Below is the link to the electronic supplementary material.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cho, W. et al. (2025). Dense Hand-Object (HO) GraspNet with Full Grasping Taxonomy and Dynamics. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15140. Springer, Cham. https://doi.org/10.1007/978-3-031-73007-8_17
DOI: https://doi.org/10.1007/978-3-031-73007-8_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73006-1
Online ISBN: 978-3-031-73007-8
eBook Packages: Computer Science, Computer Science (R0)