Abstract
In this paper, we present a generalizable model-free 6-DoF object pose estimator called Gen6D. Existing generalizable pose estimators either need high-quality object models or require additional depth maps or object masks at test time, which significantly limits their application scope. In contrast, our pose estimator requires only some posed images of the unseen object and is able to accurately predict poses of the object in arbitrary environments. Gen6D consists of an object detector, a viewpoint selector and a pose refiner, none of which requires a 3D object model, and all of which generalize to unseen objects. Experiments show that Gen6D achieves state-of-the-art results on two model-free datasets: the MOPED dataset and a new GenMOP dataset. In addition, on the LINEMOD dataset, Gen6D achieves competitive results compared with instance-specific pose estimators. Project page: https://liuyuan-pal.github.io/Gen6D/.
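For readers who want a concrete picture of the three-stage pipeline named in the abstract (object detector, viewpoint selector, pose refiner), the sketch below shows how the stages could be chained. It is a minimal, hypothetical illustration, not the authors' implementation: every class, function, and parameter name here is a placeholder we introduce for exposition, and the stage bodies are stubs. The actual code and interfaces are available from the project page.

```python
"""Minimal sketch of a Gen6D-style three-stage pipeline (hypothetical API)."""
from dataclasses import dataclass
import numpy as np


@dataclass
class ReferenceDatabase:
    """Posed reference images of the unseen object; no 3D model is needed."""
    images: list       # list of H x W x 3 uint8 arrays
    poses: list        # list of 4 x 4 object-to-camera transforms (np.ndarray)
    intrinsics: list   # list of 3 x 3 camera matrices


def detect(query: np.ndarray, ref: ReferenceDatabase):
    """Stage 1 (placeholder): locate the object in the query image and
    estimate its 2D center and scale by matching against reference views."""
    center = np.array([query.shape[1] / 2.0, query.shape[0] / 2.0])
    scale = 1.0
    return center, scale


def select_viewpoint(crop: np.ndarray, ref: ReferenceDatabase) -> int:
    """Stage 2 (placeholder): score every reference view against the crop and
    return the index of the most similar one, which fixes an initial rotation."""
    return 0


def refine(pose_init: np.ndarray, crop: np.ndarray, ref: ReferenceDatabase,
           iterations: int = 3) -> np.ndarray:
    """Stage 3 (placeholder): iteratively update the 6-DoF pose by comparing
    the query crop with reference views near the current estimate."""
    pose = pose_init.copy()
    for _ in range(iterations):
        delta = np.eye(4)  # stand-in for a predicted pose update
        pose = delta @ pose
    return pose


def estimate_pose(query: np.ndarray, ref: ReferenceDatabase) -> np.ndarray:
    """Chain the stages: detection -> viewpoint selection -> refinement."""
    center, scale = detect(query, ref)
    crop = query  # a real pipeline would crop and resize around (center, scale)
    best = select_viewpoint(crop, ref)
    pose_init = ref.poses[best]  # initialize from the selected reference view
    return refine(pose_init, crop, ref)
```

The point of the sketch is the data flow: only posed RGB reference images enter the pipeline, and each stage narrows the pose estimate (2D location and scale, then a coarse rotation from the nearest reference viewpoint, then an iteratively refined 6-DoF pose).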
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, Y. et al. (2022). Gen6D: Generalizable Model-Free 6-DoF Object Pose Estimation from RGB Images. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13692. Springer, Cham. https://doi.org/10.1007/978-3-031-19824-3_18
DOI: https://doi.org/10.1007/978-3-031-19824-3_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19823-6
Online ISBN: 978-3-031-19824-3
eBook Packages: Computer Science, Computer Science (R0)