
RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation


🔗[arXiv]📄[PDF]

NEWS:🔥RG-SAN is accepted at NeurIPS 2024 (Oral)!🔥

We invite you to explore our series of works, including 3D-STMN (AAAI 2024) and 3D-GRES (MM 2024 Oral).

Changli Wu, Qi Chen, Jiayi Ji, Haowei Wang, Yiwei Ma, You Huang, Gen Luo, Hao Fei, Xiaoshuai Sun, Rongrong Ji

Demo

Framework:

Introduction

3D Referring Expression Segmentation (3D-RES) aims to segment 3D objects by correlating referring expressions with point clouds. However, traditional approaches frequently encounter issues like over-segmentation or mis-segmentation due to insufficient emphasis on the spatial information of instances. In this paper, we introduce the Rule-Guided Spatial Awareness Network (RG-SAN), which utilizes solely the spatial information of the target instance for supervision. This approach enables the network to accurately depict the spatial relationships among all entities described in the text, thus enhancing its reasoning capability. RG-SAN consists of the Text-driven Localization Module (TLM) and the Rule-guided Weak Supervision (RWS) strategy. The TLM initially locates all mentioned instances and then iteratively refines their positional information. The RWS strategy, acknowledging that only target objects have supervised positional information, employs dependency tree rules to precisely guide the positioning of the core instance. Extensive testing on the ScanRefer benchmark shows that RG-SAN not only establishes new performance benchmarks, with an mIoU increase of 5.1 points, but also exhibits significant improvements in robustness when processing descriptions with spatial ambiguity.
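
As a rough, self-contained sketch of the dependency-rule idea behind RWS (the repository itself uses the Stanford CoreNLP toolkit; the stanza library and the core_instance helper below are illustrative assumptions, not the repo's code), the root of a referring expression's dependency tree usually names the core instance to be localized:

# Illustrative sketch only (not RG-SAN code): the dependency-tree root of a
# ScanRefer-style description is typically the target (core) instance noun.
# Requires: pip install stanza
import stanza

stanza.download("en", verbose=False)  # fetch the English models once
nlp = stanza.Pipeline(lang="en", processors="tokenize,pos,lemma,depparse", verbose=False)

def core_instance(expression: str) -> str:
    """Return the dependency root word of a referring expression."""
    sent = nlp(expression).sentences[0]
    root = next(w for w in sent.words if w.head == 0)  # head == 0 marks the root
    return root.text

print(core_instance("the black chair that is next to the wooden table"))  # -> "chair"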

Installation

Requirements

  • Python 3.7 or higher
  • PyTorch 1.12
  • CUDA 11.3 or higher

The following installation assumes python=3.8, pytorch=1.12.1, and cuda=11.3.

  • Create a conda virtual environment

    conda create -n rg-san python=3.8
    conda activate rg-san
    
  • Clone this repository

  • Install the dependencies

    Install PyTorch 1.12.1
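
    For example, assuming the CUDA 11.3 wheels from the official PyTorch previous-versions index (verify the exact command for your platform):

    pip install torch==1.12.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113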

    pip install spconv-cu113
    pip install https://data.pyg.org/whl/torch-1.12.0%2Bcu113/torch_scatter-2.0.9-cp38-cp38-linux_x86_64.whl # please check the versions on the website
    pip install -r requirements.txt
    

    Install segmentator from this repo (we wrap the segmentator from ScanNet).

    Install the Stanford CoreNLP toolkit from the official website.
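
    If the textual preprocessing step relies on a running CoreNLP server (an assumption; check data/features/save_graph.py), the server can be started from the unpacked CoreNLP directory with the standard command:

    java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000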

  • Setup: install rg_san, pointgroup_ops, and attention_rpe_ops.

    sudo apt-get install libsparsehash-dev
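    # install rg_san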
    python setup.py develop
    # install pointgroup_ops
    cd rg_san/lib/ && python setup.py develop && cd ../../
    # install attention_rpe_ops
    cd lib/attention_rpe_ops && python setup.py install && cd ../../
    

Data Preparation

ScanNet v2 dataset

Download the ScanNet v2 dataset.

Place the downloaded scans folder as follows.

RG-SAN
├── data
│   ├── scannetv2
│   │   ├── scans

Split and preprocess point cloud data

cd data/scannetv2
bash prepare_data.sh

The script splits the data into train/val folders and preprocesses it. After running the script, the ScanNet dataset structure should look like below.

RG-SAN
├── data
│   ├── scannetv2
│   │   ├── scans
│   │   ├── train
│   │   ├── val

ScanRefer dataset

Download ScanRefer annotations following the instructions.

Place the downloaded ScanRefer folder as follows.

RG-SAN
├── data
│   ├── ScanRefer
│   │   ├── ScanRefer_filtered_train.json
│   │   ├── ScanRefer_filtered_val.json

Preprocess textual data

python data/features/save_graph.py --split train --data_root data/ --max_len 78
python data/features/save_graph.py --split val --data_root data/ --max_len 78

Pretrained Backbone

Download 3D U-Net pretrained weights from 3D-STMN.

Move the pretrained model to backbones.

mkdir backbones
mv ${Download_PATH}/sp_unet_backbone.pth backbones/

Training

bash scripts/train.sh

Inference

You can download our checkpoint to reproduce the performance:

bash scripts/test.sh

Citation

If you find this work useful in your research, please cite:

@misc{2412.02402,
  author = {Changli Wu and Qi Chen and Jiayi Ji and Haowei Wang and Yiwei Ma and You Huang and Gen Luo and Hao Fei and Xiaoshuai Sun and Rongrong Ji},
  title  = {RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation},
  year   = {2024},
  eprint = {arXiv:2412.02402},
}

Acknowledgement

Sincere thanks to the 3D-STMN, SSTNet, and SPFormer repos; this repo is built upon them.
