[ICLR 2025] Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving

getterupper/PreWorld

PreWorld

Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving [paper]

ICLR 2025

TODO

  • Initial commit
  • Model zoo
  • arXiv version
  • Code for nuPlan dataset (3D Occupancy Prediction)
  • Code for LightWheelOcc dataset (3D Occupancy Prediction)

Introduction

Understanding world dynamics is crucial for planning in autonomous driving. Recent methods attempt to achieve this by learning a 3D occupancy world model that forecasts future surrounding scenes based on current observation. However, 3D occupancy labels are still required to produce promising results. Considering the high annotation cost for 3D outdoor scenes, we propose a semi-supervised vision-centric 3D occupancy world model, PreWorld, to leverage the potential of 2D labels through a novel two-stage training paradigm: the self-supervised pre-training stage and the fully-supervised fine-tuning stage. Specifically, during the pre-training stage, we utilize an attribute projection head to generate different attribute fields of a scene (e.g., RGB, density, semantic), thus enabling temporal supervision from 2D labels via volume rendering techniques. Furthermore, we introduce a simple yet effective state-conditioned forecasting module to recursively forecast future occupancy and ego trajectory in a direct manner. Extensive experiments on the nuScenes dataset validate the effectiveness and scalability of our method, and demonstrate that PreWorld achieves competitive performance across 3D occupancy prediction, 4D occupancy forecasting and motion planning tasks.
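The pre-training stage described above supervises predicted attribute fields with 2D labels through volume rendering. As a rough illustration only, the sketch below composites per-sample attributes (for example RGB or semantic scores) along a single camera ray using the standard NeRF-style quadrature. It is not taken from the PreWorld code: the function name `render_ray`, its arguments, and the uniform treatment of all attributes are assumptions made for this example.

```python
import math


def render_ray(densities, attributes, deltas):
    """Composite per-sample attributes along one ray.

    Hypothetical helper, not the PreWorld implementation.
    densities:  list of non-negative density values, one per sample
    attributes: list of per-sample attribute vectors (e.g. RGB or semantic scores)
    deltas:     list of distances between consecutive samples
    Returns the rendered attribute vector and the per-sample weights.
    """
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    rendered = [0.0] * len(attributes[0])
    weights = []
    for sigma, attr, delta in zip(densities, attributes, deltas):
        # Opacity of this sample from its density and segment length.
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        weights.append(weight)
        for channel, value in enumerate(attr):
            rendered[channel] += weight * value
        # Light remaining after passing through this sample.
        transmittance *= 1.0 - alpha
    return rendered, weights


# Usage: an empty sample, then a near-opaque one, then an occluded one.
rendered, weights = render_ray(
    densities=[0.0, 1000.0, 5.0],
    attributes=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    deltas=[1.0, 1.0, 1.0],
)
```

In this toy ray the second sample is effectively opaque, so it receives all of the weight and the sample behind it contributes nothing. A rendered value of this kind can then be compared against a 2D label at the corresponding pixel, which is the general idea behind rendering-based supervision.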

Getting Started

Model Zoo

3D Occupancy Prediction (on Occ3D-nuScenes Benchmark)

| Method | mIoU | Config | Checkpoints |
| --- | --- | --- | --- |
| PreWorld (+ Pre-training) | 34.69 | config | model |
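
For readers unfamiliar with the metric, mIoU is the per-class intersection-over-union averaged over classes. The following is a minimal sketch of that computation on flattened voxel labels. It is an illustration under simplifying assumptions, not the benchmark's official evaluation script, which may differ (for example in how visibility masks or ignored classes are handled). The name `mean_iou` and the `ignore_index` parameter are hypothetical.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over classes that appear in the prediction or the ground truth.

    Hypothetical helper for illustration.
    pred, target: flat sequences of integer class ids in [0, num_classes)
    ignore_index: optional ground-truth label to skip entirely
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A mismatch adds one voxel to the union of both classes.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Usage: class 0 has IoU 1/2 and class 1 has IoU 2/3.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
```

The result is a fraction in [0, 1]; benchmark tables usually report it multiplied by 100.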

4D Occupancy Forecasting (on Occ3D-nuScenes Benchmark)

| Method | Avg mIoU | Config | Checkpoints |
| --- | --- | --- | --- |
| PreWorld (+ Pre-training) | 9.55 | config | model |

Coming soon... 🏗️ 🚧 🔨

3D Occupancy Prediction (on OpenScene Benchmark)

| Method | mIoU | Config | Checkpoints |
| --- | --- | --- | --- |
| PreWorld (+ nuPlan Pre-training, 15000 scenes) | 19.85 | config | model |

Acknowledgement

Many thanks to these excellent open source projects:

BibTeX

If you find this work useful, please consider citing:

@article{li2025semi,
  title={Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving},
  author={Li, Xiang and Li, Pengfei and Zheng, Yupeng and Sun, Wei and Wang, Yan and Chen, Yilun},
  journal={arXiv preprint arXiv:2502.07309},
  year={2025}
}
