Monocular depth map estimation based on a multi-scale deep architecture and curvilinear saliency feature boosting

Abdulwahab, Saddam; Rashwan, Hatem A.; Garcia, Miguel Angel; Masoumian, Armin; Puig, Domenec

doi:10.1007/s00521-022-07663-x

Review
Published: 04 August 2022

Volume 34, pages 16423–16440, (2022)

Neural Computing and Applications

Saddam Abdulwahab ORCID: orcid.org/0000-0003-0902-7245¹,
Hatem A. Rashwan¹^na1,
Miguel Angel Garcia²^na1,
Armin Masoumian¹^na1 &
…
Domenec Puig¹^na1

Abstract

Estimating depth from a monocular camera is essential for many applications, including scene understanding and reconstruction, robot vision, and self-driving cars. However, generating depth maps from single RGB images remains a challenge because object shapes must be inferred from intensity images that are strongly affected by viewpoint changes, texture content and lighting conditions. As a result, most current solutions produce blurry approximations of low-resolution depth maps. We propose a novel depth map estimation technique based on an autoencoder network. This network is endowed with a multi-scale architecture and a multi-level depth estimator that preserve high-level information extracted from coarse feature maps as well as detailed local information present in fine feature maps. Curvilinear saliency, which is related to curvature estimation, is exploited as a loss function to boost the depth accuracy at object boundaries and raise the performance of the estimated high-resolution depth maps. We evaluate our model on the public NYU Depth v2 and Make3D datasets. The proposed model yields superior performance on both datasets compared to the state of the art, achieving an accuracy of $\sim 86\%$ and showing exceptional performance in preserving object boundaries and small 3D structures. The code of the proposed model is publicly available at https://github.com/SaddamAbdulrhman/MDACSFB.
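
To make the two quantitative ideas in the abstract more concrete, the following NumPy sketch illustrates (a) a threshold accuracy of the kind commonly reported on NYU Depth v2 and (b) a curvature-sensitive loss term. It is only an illustration under stated assumptions and is not the authors' implementation: the abstract does not define which accuracy measure the 86% figure refers to, so the 1.25 ratio threshold is an assumption, and `hessian_ridge_measure` is a hypothetical stand-in for curvilinear saliency (the gap between the two eigenvalues of the depth Hessian), not the formulation used in the paper. All function names and the `weight` parameter are invented for this example.

```python
import numpy as np


def threshold_accuracy(pred, gt, thresh=1.25, eps=1e-8):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < thresh.

    Assumption: this is the usual "delta" accuracy; the abstract does not
    state which accuracy measure it reports.
    """
    pred = np.maximum(np.asarray(pred, dtype=np.float64), eps)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0  # skip pixels without ground-truth depth
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return float(np.mean(ratio < thresh))


def hessian_ridge_measure(depth):
    """Hypothetical curvature proxy: eigenvalue gap of the depth Hessian.

    For a symmetric 2x2 matrix [[a, b], [b, c]] the two eigenvalues differ
    by sqrt((a - c)^2 + 4 b^2). The measure is zero on planar regions and
    large along ridges and depth discontinuities.
    """
    depth = np.asarray(depth, dtype=np.float64)
    dy, dx = np.gradient(depth)   # derivatives along rows (y) and columns (x)
    dyy, _ = np.gradient(dy)      # second derivative along y
    dxy, dxx = np.gradient(dx)    # mixed and second derivative along x
    return np.sqrt((dxx - dyy) ** 2 + 4.0 * dxy ** 2)


def curvature_aware_loss(pred, gt, weight=1.0):
    """Mean absolute depth error plus a weighted curvature-mismatch term."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    depth_term = np.mean(np.abs(pred - gt))
    curv_term = np.mean(
        np.abs(hessian_ridge_measure(pred) - hessian_ridge_measure(gt))
    )
    return float(depth_term + weight * curv_term)


if __name__ == "__main__":
    ys, xs = np.mgrid[0:8, 0:8]
    gt = 1.0 + xs + 2.0 * ys   # a tilted plane with strictly positive depth
    pred = gt.copy()
    pred[4:, :] += 3.0         # introduce a spurious depth step
    print(threshold_accuracy(pred, gt))
    print(curvature_aware_loss(pred, gt))
```

In this sketch a constant depth offset is penalised only by the depth term, whereas a spurious step or ridge in the prediction also raises the curvature term, which is the qualitative behaviour one would expect from a boundary-oriented loss.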

Acknowledgements

Financial support was given by the pre-doctoral grant (FI 2020) funded by the Catalan government.

Author information

Hatem A. Rashwan, Miguel Angel Garcia, Armin Masoumian and Domenec Puig contributed equally to this work.

Authors and Affiliations

Department of Computer Engineering and Mathematics, Universitat Rovira i Virgili, Carretera de Valls, 43007, Tarragona, Tarragona, Spain
Saddam Abdulwahab, Hatem A. Rashwan, Armin Masoumian & Domenec Puig
Department of Electronic and Communications Technology, Universidad Autónoma de Madrid, Ciudad Universitaria de Cantoblanco, 28049, Madrid, Madrid, Spain
Miguel Angel Garcia

Corresponding author

Correspondence to Saddam Abdulwahab.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Abdulwahab, S., Rashwan, H.A., Garcia, M.A. et al. Monocular depth map estimation based on a multi-scale deep architecture and curvilinear saliency feature boosting. Neural Comput & Applic 34, 16423–16440 (2022). https://doi.org/10.1007/s00521-022-07663-x

Received: 22 March 2022
Accepted: 18 July 2022
Published: 04 August 2022
Issue Date: October 2022
DOI: https://doi.org/10.1007/s00521-022-07663-x
