Abstract
Although a wide variety of deep neural networks for robust Visual Odometry (VO) can be found in the literature, they still cannot solve the drift problem in long-term robot navigation. This paper therefore proposes a novel deep end-to-end network for the long-term 6-DoF VO task. It fuses relative and global sub-networks based on Recurrent Convolutional Neural Networks (RCNNs) to improve monocular localization accuracy: the relative sub-networks smooth the VO trajectory, while the global sub-networks are designed to counteract drift. All parameters are jointly optimized using Cross Transformation Constraints (CTC), which represent the temporal geometric consistency of consecutive frames, and the Mean Square Error (MSE) between the predicted pose and the ground truth. Experimental results on both indoor and outdoor datasets show that our method outperforms other state-of-the-art learning-based VO methods in terms of pose accuracy.
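The abstract describes a joint objective combining a pose MSE term with a CTC term that ties the predicted relative transform between consecutive frames to the composition of the predicted global poses. The paper's exact parameterization and loss weights are not reproduced here; the following NumPy sketch only illustrates the idea on 4x4 homogeneous transforms, and all function names (`se3`, `ctc_loss`, `total_loss`) and the weight `alpha` are hypothetical.

```python
import numpy as np

def se3(R, t):
    """Build a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def ctc_loss(T_rel, T_g_prev, T_g_curr):
    """Cross Transformation Constraint (sketch): the predicted relative
    transform between frames t-1 and t should agree with the composition
    of the predicted global poses, i.e. T_rel ~= inv(T_g_prev) @ T_g_curr."""
    err = np.linalg.inv(T_g_prev) @ T_g_curr - T_rel
    return float(np.mean(err ** 2))

def mse_loss(T_pred, T_gt):
    """Mean squared error between a predicted and a ground-truth pose."""
    return float(np.mean((T_pred - T_gt) ** 2))

def total_loss(T_rel, T_g_prev, T_g_curr, T_gt_prev, T_gt_curr, alpha=1.0):
    """Joint objective (sketch): global-pose MSE plus a weighted CTC term."""
    return (mse_loss(T_g_prev, T_gt_prev)
            + mse_loss(T_g_curr, T_gt_curr)
            + alpha * ctc_loss(T_rel, T_g_prev, T_g_curr))
```

In this toy form the loss vanishes exactly when the global predictions match the ground truth and the relative prediction is geometrically consistent with them; in the paper the two terms are optimized jointly over the whole sequence rather than a single frame pair.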
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Lin, Y. et al. (2019). Deep Global-Relative Networks for End-to-End 6-DoF Visual Localization and Odometry. In: Nayak, A., Sharma, A. (eds) PRICAI 2019: Trends in Artificial Intelligence. PRICAI 2019. Lecture Notes in Computer Science, vol 11671. Springer, Cham. https://doi.org/10.1007/978-3-030-29911-8_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29910-1
Online ISBN: 978-3-030-29911-8