This paper addresses RGB-D semantic segmentation of indoor scenes. We introduce a novel gravity direction detection method, based on fitting vertical lines and combining 2D visual information with 3D geometric information, to improve the original HHA depth encoding. Then, to fuse the two-stream deep convolutional networks built on RGB and on the depth encoding, we propose a joint modelling method that learns a weighted summation layer to fuse their predictions. Finally, to refine the pixel-wise score maps, we adopt a fully connected CRF as post-processing and propose a pairwise potential function that incorporates a surface-normal kernel to exploit geometric information. Experimental results show that our approach achieves state-of-the-art RGB-D semantic segmentation performance on a public dataset.
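The abstract names its components without giving formulas. The NumPy sketch below shows plausible forms for them, under assumptions that are not taken from the paper. First, the gravity axis is estimated as the dominant direction of a set of 3D vertical-line directions, using the top eigenvector of their scatter matrix. This eigenvector formulation is our illustrative choice, not necessarily the paper's fitting procedure. Second, the HHA "angle" channel is computed as the angle between each surface normal and that axis. Third, the two-stream score maps are fused by a weighted sum with one weight per class and stream, which is an assumed parameterization. Fourth, a fully connected CRF pairwise kernel is extended with a Gaussian term on surface normals. The exact form of this term and all parameter values are assumptions. All function and parameter names are hypothetical.

```python
import numpy as np


def estimate_gravity(line_dirs):
    """Hypothetical: estimate the gravity axis from 3D directions of fitted vertical lines.

    Assumption: each row of `line_dirs` is the 3D direction of one (approximately)
    vertical line. The axis best aligned with all of them, up to sign, is the
    eigenvector of the scatter matrix sum(d d^T) with the largest eigenvalue.
    """
    d = np.asarray(line_dirs, dtype=float)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scatter = d.T @ d                          # 3x3 sum of outer products
    _, eigvecs = np.linalg.eigh(scatter)       # eigenvalues in ascending order
    g = eigvecs[:, -1]
    # Sign is ambiguous; assume camera y points down, so "up" has y <= 0.
    return g if g[1] <= 0 else -g


def angle_to_gravity(normals, g):
    """HHA angle channel: angle in degrees between each normal (H, W, 3) and axis g."""
    n = normals / np.linalg.norm(normals, axis=-1, keepdims=True)
    cos = np.clip(n @ g, -1.0, 1.0)
    return np.degrees(np.arccos(cos))


def fuse_scores(score_rgb, score_depth, w_rgb, w_depth):
    """Hypothetical weighted summation of two-stream score maps of shape (C, H, W).

    Assumption: one learned weight per class and stream, broadcast over pixels.
    """
    return w_rgb[:, None, None] * score_rgb + w_depth[:, None, None] * score_depth


def pairwise_kernel(p_i, p_j, I_i, I_j, n_i, n_j,
                    w_app=1.0, w_smooth=1.0, w_normal=1.0,
                    theta_alpha=80.0, theta_beta=13.0, theta_gamma=3.0,
                    theta_p=80.0, theta_n=0.1):
    """Dense-CRF pairwise kernel k(f_i, f_j) with an assumed surface-normal term.

    The appearance and smoothness terms follow the standard fully connected CRF
    (positions p, colors I); the normal term (normals n) and all parameter values
    are illustrative assumptions.
    """
    dp = np.sum((p_i - p_j) ** 2)
    dI = np.sum((I_i - I_j) ** 2)
    dn = np.sum((n_i - n_j) ** 2)
    appearance = w_app * np.exp(-dp / (2 * theta_alpha**2) - dI / (2 * theta_beta**2))
    smoothness = w_smooth * np.exp(-dp / (2 * theta_gamma**2))
    normal = w_normal * np.exp(-dp / (2 * theta_p**2) - dn / (2 * theta_n**2))
    return appearance + smoothness + normal
```

The eigenvector formulation is used here because it treats a line direction and its negation identically. Fitted line directions have no inherent sign, so this property is convenient. In a full CRF the kernel would be multiplied by a label-compatibility term such as the Potts model.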

Liu, H., Wu, W., Wang, X. et al. RGB-D joint modelling with scene geometric information for indoor semantic segmentation. Multimed Tools Appl 77, 22475–22488 (2018). https://doi.org/10.1007/s11042-018-6056-8