NEWS:🔥RG-SAN is accepted at NeurIPS 2024 (Oral)!🔥
We invite you to explore our series of works, including 3D-STMN (AAAI 2024) and 3D-GRES (MM 2024 Oral).
Changli Wu, Qi Chen, Jiayi Ji, Haowei Wang, Yiwei Ma, You Huang, Gen Luo, Hao Fei, Xiaoshuai Sun, Rongrong Ji
Framework:
3D Referring Expression Segmentation (3D-RES) aims to segment 3D objects by correlating referring expressions with point clouds. However, traditional approaches frequently encounter issues like over-segmentation or mis-segmentation due to insufficient emphasis on the spatial information of instances. In this paper, we introduce the Rule-Guided Spatial Awareness Network (RG-SAN), which uses solely the spatial information of the target instance for supervision. This approach enables the network to accurately depict the spatial relationships among all entities described in the text, thus enhancing its reasoning capability. RG-SAN consists of a Text-driven Localization Module (TLM) and a Rule-guided Weak Supervision (RWS) strategy. The TLM initially locates all mentioned instances and iteratively refines their positional information. The RWS strategy, acknowledging that only the target object has supervised positional information, employs dependency tree rules to precisely guide the positioning of the core instance. Extensive testing on the ScanRefer benchmark shows that RG-SAN not only sets new performance benchmarks, with an mIoU increase of 5.1 points, but also exhibits significant improvements in robustness when processing descriptions with spatial ambiguity.
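For intuition, here is a minimal, illustrative sketch of the RWS idea (not the repository's implementation; all tensor shapes and names below are assumptions): every entity mentioned in the text gets a predicted 3D center, but only the core instance, selected by dependency-tree rules, receives positional supervision.

```python
# Illustrative sketch only -- NOT the repository's implementation.
import torch

def rws_position_loss(pred_centers, gt_target_center, dep_root_index):
    """pred_centers: (num_mentioned_entities, 3) predicted centers for all
    entities mentioned in the expression (hypothetical TLM output).
    gt_target_center: (3,) ground-truth center of the target instance only.
    dep_root_index: index of the core instance, chosen by dependency-tree
    rules (e.g., the root noun of the parse)."""
    core_center = pred_centers[dep_root_index]
    # Supervise only the core instance; the other entities' positions are
    # shaped indirectly, through the spatial relations the network must satisfy.
    return torch.nn.functional.l1_loss(core_center, gt_target_center)

# toy usage with made-up numbers
pred = torch.randn(4, 3)            # 4 mentioned entities
gt = torch.tensor([1.0, 0.5, 0.2])  # target's ground-truth center
loss = rws_position_loss(pred, gt, dep_root_index=0)
print(loss.item())
```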
Requirements
- Python 3.7 or higher
- PyTorch 1.12
- CUDA 11.3 or higher
The following installation instructions assume python=3.8, pytorch=1.12.1, and cuda=11.3.
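As an optional sanity check, you can verify that the environment matches the versions assumed above:

```python
# Optional check that the environment matches the assumed versions.
import torch

print(torch.__version__)          # expect 1.12.1
print(torch.version.cuda)         # expect 11.3
print(torch.cuda.is_available())  # should be True on a CUDA-capable machine
```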
- Create a conda virtual environment

```bash
conda create -n rg-san python=3.8
conda activate rg-san
```
- Clone this repository
- Install the dependencies

Install PyTorch 1.12.1.

```bash
pip install spconv-cu113
pip install https://data.pyg.org/whl/torch-1.12.0%2Bcu113/torch_scatter-2.0.9-cp38-cp38-linux_x86_64.whl  # please check the versions on the website
pip install -r requirements.txt
```

Install segmentator from this repo (we wrap the segmentator in ScanNet).

Install the Stanford CoreNLP toolkit from the official website. (A quick import check for the compiled dependencies follows this list.)
- Setup: install rg_san, pointgroup_ops, and attention_rpe_ops.

```bash
sudo apt-get install libsparsehash-dev
# install rg_san
python setup.py develop
# install pointgroup_ops
cd rg_san/lib/ && python setup.py develop && cd ../../
# install attention_rpe_ops
cd lib/attention_rpe_ops && python setup.py install && cd ../../
```
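If the installs above succeeded, the pip-installed binary dependencies should import cleanly against the PyTorch build. A minimal check (spconv 2.x exposes its PyTorch API under spconv.pytorch; the locally built extensions are exercised by the training scripts themselves):

```python
# Verify the binary dependencies load against the installed PyTorch build.
import torch
import spconv.pytorch  # spconv 2.x module path
import torch_scatter

print(torch.__version__, torch.version.cuda)  # expect 1.12.1 / 11.3
```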
Download the ScanNet v2 dataset. Put the downloaded `scans` folder as follows.

```
RG-SAN
├── data
│   ├── scannetv2
│   │   ├── scans
```
Split and preprocess the point cloud data:

```bash
cd data/scannetv2
bash prepare_data.sh
```

The script splits the data into train/val folders and preprocesses it. After running the script, the ScanNet dataset structure should look as below; a quick file-count check follows the tree.
```
RG-SAN
├── data
│   ├── scannetv2
│   │   ├── scans
│   │   ├── train
│   │   ├── val
```
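As an optional sanity check (paths taken from the tree above), count the preprocessed scenes:

```python
# Run from the repo root; counts preprocessed scene files per split.
from pathlib import Path

for split in ("train", "val"):
    files = list(Path("data/scannetv2", split).iterdir())
    print(split, len(files))
```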
Download the ScanRefer annotations following the instructions. Put the downloaded ScanRefer folder as follows.

```
RG-SAN
├── data
│   ├── ScanRefer
│   │   ├── ScanRefer_filtered_train.json
│   │   ├── ScanRefer_filtered_val.json
```
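To confirm the annotations are in place, you can peek at one entry (nothing below is assumed beyond the file path from the tree above; the printed field names are whatever ScanRefer ships):

```python
import json

# ScanRefer annotations are a JSON list of per-expression records.
with open("data/ScanRefer/ScanRefer_filtered_val.json") as f:
    anns = json.load(f)

print(len(anns))        # number of referring expressions
print(sorted(anns[0]))  # field names of the first entry
```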
Preprocess the textual data:

```bash
python data/features/save_graph.py --split train --data_root data/ --max_len 78
python data/features/save_graph.py --split val --data_root data/ --max_len 78
```
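This step builds dependency graphs for the referring expressions using the CoreNLP toolkit installed earlier. For intuition, here is a minimal sketch of obtaining a dependency parse via the third-party stanfordcorenlp wrapper (the wrapper and the path below are assumptions for illustration; save_graph.py may use a different interface):

```python
# Illustration only: the dependency tree is what rules like RWS operate on.
from stanfordcorenlp import StanfordCoreNLP  # pip install stanfordcorenlp (assumed wrapper)

nlp = StanfordCoreNLP("/path/to/stanford-corenlp")  # point at the unpacked toolkit
sent = "the chair that is next to the table"
# returns (relation, head_index, dependent_index) triples; the ROOT arc marks
# the core noun that RWS treats as the target instance
print(nlp.dependency_parse(sent))
nlp.close()
```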
Download the 3D U-Net pretrained weights from 3D-STMN and move the pretrained model to backbones.

```bash
mkdir backbones
mv ${Download_PATH}/sp_unet_backbone.pth backbones/
```
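Optionally, verify the weights load before training (the checkpoint's internal layout is repo-specific, so this just inspects the top-level keys):

```python
import torch

ckpt = torch.load("backbones/sp_unet_backbone.pth", map_location="cpu")
# print a few top-level keys to confirm the file is a readable state dict
keys = list(ckpt) if isinstance(ckpt, dict) else [type(ckpt).__name__]
print(keys[:5])
```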
Training

```bash
bash scripts/train.sh
```
Inference

You can download our checkpoint to reproduce the reported performance:

```bash
bash scripts/test.sh
```
If you find this work useful in your research, please cite:
```bibtex
@misc{2412.02402,
  author = {Changli Wu and Qi Chen and Jiayi Ji and Haowei Wang and Yiwei Ma and You Huang and Gen Luo and Hao Fei and Xiaoshuai Sun and Rongrong Ji},
  title  = {RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation},
  year   = {2024},
  eprint = {arXiv:2412.02402},
}
```
Sincere thanks to the 3D-STMN, SSTNet, and SPFormer repos. This repo is built upon them.