Abstract
In this paper, we present a generalizable model-free 6-DoF object pose estimator called Gen6D. Existing generalizable pose estimators either need high-quality object models or require additional depth maps or object masks at test time, which significantly limits their application scope. In contrast, our pose estimator requires only some posed images of the unseen object and is able to accurately predict poses of the object in arbitrary environments. Gen6D consists of an object detector, a viewpoint selector and a pose refiner, none of which requires a 3D object model, and all of which generalize to unseen objects. Experiments show that Gen6D achieves state-of-the-art results on two model-free datasets: the MOPED dataset and a new GenMOP dataset. In addition, on the LINEMOD dataset, Gen6D achieves competitive results compared with instance-specific pose estimators. Project page: https://liuyuan-pal.github.io/Gen6D/.
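For readers who want a concrete picture of the three-stage pipeline named in the abstract (object detector, viewpoint selector, pose refiner), the sketch below shows how the stages could be chained. It is a minimal, hypothetical illustration, not the authors' implementation: every class, function, and parameter name here is a placeholder we introduce for exposition, and the stage bodies are stubs. The actual code and interfaces are available from the project page.

```python
"""Minimal sketch of a Gen6D-style three-stage pipeline (hypothetical API)."""
from dataclasses import dataclass
import numpy as np


@dataclass
class ReferenceDatabase:
    """Posed reference images of the unseen object; no 3D model is needed."""
    images: list       # list of H x W x 3 uint8 arrays
    poses: list        # list of 4 x 4 object-to-camera transforms (np.ndarray)
    intrinsics: list   # list of 3 x 3 camera matrices


def detect(query: np.ndarray, ref: ReferenceDatabase):
    """Stage 1 (placeholder): locate the object in the query image and
    estimate its 2D center and scale by matching against reference views."""
    center = np.array([query.shape[1] / 2.0, query.shape[0] / 2.0])
    scale = 1.0
    return center, scale


def select_viewpoint(crop: np.ndarray, ref: ReferenceDatabase) -> int:
    """Stage 2 (placeholder): score every reference view against the crop and
    return the index of the most similar one, which fixes an initial rotation."""
    return 0


def refine(pose_init: np.ndarray, crop: np.ndarray, ref: ReferenceDatabase,
           iterations: int = 3) -> np.ndarray:
    """Stage 3 (placeholder): iteratively update the 6-DoF pose by comparing
    the query crop with reference views near the current estimate."""
    pose = pose_init.copy()
    for _ in range(iterations):
        delta = np.eye(4)  # stand-in for a predicted pose update
        pose = delta @ pose
    return pose


def estimate_pose(query: np.ndarray, ref: ReferenceDatabase) -> np.ndarray:
    """Chain the stages: detection -> viewpoint selection -> refinement."""
    center, scale = detect(query, ref)
    crop = query  # a real pipeline would crop and resize around (center, scale)
    best = select_viewpoint(crop, ref)
    pose_init = ref.poses[best]  # initialize from the selected reference view
    return refine(pose_init, crop, ref)
```

The point of the sketch is the data flow: only posed RGB reference images enter the pipeline, and each stage narrows the pose estimate (2D location and scale, then a coarse rotation from the nearest reference viewpoint, then an iteratively refined 6-DoF pose).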
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, Y. et al. (2022). Gen6D: Generalizable Model-Free 6-DoF Object Pose Estimation from RGB Images. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13692. Springer, Cham. https://doi.org/10.1007/978-3-031-19824-3_18
DOI: https://doi.org/10.1007/978-3-031-19824-3_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19823-6
Online ISBN: 978-3-031-19824-3
eBook Packages: Computer Science, Computer Science (R0)