
An improved deep network-based RGB-D semantic segmentation method for indoor scenes

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Semantic segmentation is an active research topic in image processing, and the introduction of depth images improves segmentation results. However, most existing methods do not account for the differences between RGB and depth features, which leads to poor segmentation accuracy. To make full use of both RGB and depth features, this paper proposes an asymmetric two-branch convolutional neural network. In the depth feature extraction branch, a feature enhancement module is proposed to reduce noise. In the RGB feature extraction branch, a skip connection structure is introduced to extract richer RGB features. In addition, an attention-based fusion module is proposed to make full use of the effective information from the two modalities. Finally, extensive experiments are conducted, and the results show that the proposed model can complete the semantic segmentation task for indoor scenes efficiently.
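The abstract does not specify how the attention-based fusion module is built, so the following is only a minimal sketch of one common way such a fusion can work: squeeze-and-excitation-style channel attention applied to each modality before summation. The function names (`channel_attention`, `attention_fusion`), the reduction ratio `r`, the feature shapes, and the random weights are all illustrative assumptions, not the authors' design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Per-channel weights in (0, 1) for a feature map of shape (C, H, W).

    Assumed design (not from the paper): global average pooling,
    a bottleneck layer with ReLU, then a sigmoid gate.
    """
    squeezed = feat.mean(axis=(1, 2))          # global average pooling -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)    # bottleneck + ReLU -> (C // r,)
    return sigmoid(w2 @ hidden)                # channel weights -> (C,)

def attention_fusion(rgb, depth, params):
    """Reweight each modality by its own channel attention, then sum."""
    a_rgb = channel_attention(rgb, *params["rgb"])
    a_depth = channel_attention(depth, *params["depth"])
    # Broadcast the (C,) weights over the spatial dimensions.
    return rgb * a_rgb[:, None, None] + depth * a_depth[:, None, None]

# Toy example with random features and random (untrained) weights.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2  # hypothetical sizes; r is the reduction ratio
params = {
    name: (rng.normal(size=(C // r, C)), rng.normal(size=(C, C // r)))
    for name in ("rgb", "depth")
}
rgb_feat = rng.normal(size=(C, H, W))
depth_feat = rng.normal(size=(C, H, W))
fused = attention_fusion(rgb_feat, depth_feat, params)
print(fused.shape)
```

In a trained network the gate weights would be learned, so that noisy depth channels could be suppressed while informative ones are kept. Here they are random and serve only to show the data flow.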

Data availability statement

Publicly available datasets were analyzed in this study. The data can be found at https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html and http://rgbd.cs.princeton.edu/challenge.html.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (61873086) and the Science and Technology Support Program of Changzhou (CE20215022).

Author information

Corresponding author

Correspondence to Jianjun Ni.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ni, J., Zhang, Z., Shen, K. et al. An improved deep network-based RGB-D semantic segmentation method for indoor scenes. Int. J. Mach. Learn. & Cyber. 15, 589–604 (2024). https://doi.org/10.1007/s13042-023-01927-1

  • DOI: https://doi.org/10.1007/s13042-023-01927-1