Abstract
This paper addresses RGB-D semantic segmentation of indoor scenes. We introduce a novel gravity direction detection method, based on vertical line fitting that combines 2D vision information with 3D geometric information, to improve the original HHA depth encoding. To fuse the two streams of deep convolutional networks for the RGB and depth encodings, we propose a joint modelling method that learns a weighted summing layer to fuse their prediction results. Finally, to refine the pixel-wise score maps, we adopt a fully connected CRF as post-processing and propose a pairwise potential function that incorporates a normal kernel to exploit geometric information. Experimental results show that the proposed approach achieves state-of-the-art RGB-D semantic segmentation performance on a public dataset.
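
The weighted-sum fusion of the two streams can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the layer's exact form, so the per-class weights, the function names (`fuse_scores`, `predict_labels`), and the toy scores below are all assumptions made for illustration. In the paper the weights are learned; here they are fixed by hand.

```python
from typing import List, Sequence

# scores[pixel][class]: one row of class scores per pixel
Scores = List[List[float]]


def fuse_scores(rgb: Scores, depth: Scores,
                w_rgb: Sequence[float], w_depth: Sequence[float]) -> Scores:
    """Fuse two score maps with a per-class weighted sum (hypothetical form)."""
    fused = []
    for s_rgb, s_depth in zip(rgb, depth):
        fused.append([wr * a + wd * b
                      for wr, wd, a, b in zip(w_rgb, w_depth, s_rgb, s_depth)])
    return fused


def predict_labels(scores: Scores) -> List[int]:
    """Pick the highest-scoring class index for each pixel."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Toy example: two pixels, three classes (all values are made up).
rgb_scores = [[2.0, 1.0, 0.0], [0.0, 1.0, 3.0]]
depth_scores = [[0.0, 4.0, 0.0], [0.0, 1.0, 1.0]]

# Assumed fixed weights; a learned layer would fit these during training.
w_rgb = [0.5, 0.5, 0.5]
w_depth = [0.5, 0.5, 0.5]

fused = fuse_scores(rgb_scores, depth_scores, w_rgb, w_depth)
labels = predict_labels(fused)
```

In this toy case the RGB stream alone would assign the first pixel to class 0, while the fused scores assign it to class 1, which shows how the depth stream can change the final prediction.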





Acknowledgments
This work was supported in part by the Beijing Natural Science Foundation (No. 4142051).
Cite this article
Liu, H., Wu, W., Wang, X. et al. RGB-D joint modelling with scene geometric information for indoor semantic segmentation. Multimed Tools Appl 77, 22475–22488 (2018). https://doi.org/10.1007/s11042-018-6056-8
DOI: https://doi.org/10.1007/s11042-018-6056-8