OpenRobotLab/Aether

Aether: Geometric-Aware Unified World Modeling

       

Aether addresses a fundamental challenge in AI: integrating geometric reconstruction with generative modeling for human-like spatial reasoning. Our framework unifies three core capabilities: (1) 🌏 4D dynamic reconstruction, (2) 🎬 action-conditioned video prediction, and (3) 🎯 goal-conditioned visual planning. Trained entirely on synthetic data, Aether achieves strong zero-shot generalization to real-world scenarios.

🥳 NEWS:

  • Mar. 31, 2025: The Gradio demo is available! You can deploy it locally or try Aether online on Hugging Face.

  • Mar. 28, 2025: AetherV1 is released! Model checkpoints, paper, website, and inference code are all available.

🔨 Installation

Note: We recommend using virtual environments such as Anaconda.

# clone project
git clone https://github.com/OpenRobotLab/Aether.git
cd Aether

# create conda environment
conda create -n aether python=3.10
conda activate aether

# install dependencies
pip install -r requirements.txt

🚀 Inference

Run inference demo locally

  • 4D reconstruction:

    python scripts/demo.py --task reconstruction --video ./assets/example_videos/moviegen.mp4
  • Action-conditioned video prediction:

    python scripts/demo.py --task prediction --image ./assets/example_obs/car.png --raymap_action assets/example_raymaps/raymap_forward_right.npy
  • Goal-conditioned visual planning:

    python scripts/demo.py --task planning --image ./assets/example_obs_goal/01_obs.png --goal ./assets/example_obs_goal/01_goal.png

Results will be saved in ./outputs/ by default.
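
To run the three demos from a script rather than by hand, the commands above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper name build_demo_command and the EXAMPLE_FLAGS table are hypothetical, and the flag sets simply mirror the example invocations shown above (the README does not say which flags are optional).

```python
import shlex

# Path of the demo script, as used in the commands above.
DEMO_SCRIPT = "scripts/demo.py"

# Flags used by each example invocation in this README.
# Assumption: these mirror the examples only; scripts/demo.py may accept
# further options or treat some of these as optional.
EXAMPLE_FLAGS = {
    "reconstruction": ("video",),
    "prediction": ("image", "raymap_action"),
    "planning": ("image", "goal"),
}


def build_demo_command(task, **inputs):
    """Return the argv list for one demo invocation (hypothetical helper)."""
    if task not in EXAMPLE_FLAGS:
        raise ValueError(f"unknown task: {task!r}")
    missing = [name for name in EXAMPLE_FLAGS[task] if name not in inputs]
    if missing:
        raise ValueError(f"task {task!r} is missing inputs: {missing}")
    command = ["python", DEMO_SCRIPT, "--task", task]
    for name in EXAMPLE_FLAGS[task]:
        command += [f"--{name}", str(inputs[name])]
    return command


if __name__ == "__main__":
    commands = [
        build_demo_command(
            "reconstruction",
            video="./assets/example_videos/moviegen.mp4",
        ),
        build_demo_command(
            "prediction",
            image="./assets/example_obs/car.png",
            raymap_action="assets/example_raymaps/raymap_forward_right.npy",
        ),
        build_demo_command(
            "planning",
            image="./assets/example_obs_goal/01_obs.png",
            goal="./assets/example_obs_goal/01_goal.png",
        ),
    ]
    # Print the commands; pass each list to subprocess.run(cmd, check=True)
    # from the repository root to actually execute them.
    for cmd in commands:
        print(shlex.join(cmd))
```

Building the argument list separately from running it keeps the sketch safe to try without a GPU: it only prints the commands unless you hand them to subprocess.run yourself.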

Run inference demo with Gradio

The Gradio demo provides an interactive web-based Aether experience.

python scripts/demo_gradio.py

We tested the demo locally on an A100 GPU with 80 GB of memory. By default, the demo is served on local port 7860.
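
If the demo page does not load, it can help to confirm that something is listening on the default port. The snippet below is a small sketch using only the Python standard library. The helper name is_port_open is hypothetical, and the host 127.0.0.1 is an assumption about a local deployment; only the port number 7860 comes from this README.

```python
import socket


def is_port_open(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on port 7860.")
    else:
        print("Nothing is reachable on port 7860 yet.")
```

A successful connection only shows that a server is bound to the port, not that the model has finished loading.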

📝 Citation

If you find this work useful in your research, please consider citing:

@article{aether,
  title     = {Aether: Geometric-Aware Unified World Modeling},
  author    = {Aether Team and Haoyi Zhu and Yifan Wang and Jianjun Zhou and Wenzheng Chang and Yang Zhou and Zizun Li and Junyi Chen and Chunhua Shen and Jiangmiao Pang and Tong He},
  journal   = {arXiv preprint arXiv:2503.18945},
  year      = {2025}
}

💡 Limitations

Aether represents an initial step in our journey, trained entirely on synthetic data. While it demonstrates promising capabilities, it is important to be aware of its current limitations:

  • 🔄 Aether struggles with highly dynamic scenarios, such as those involving significant motion or dense crowds.
  • 📸 Its camera pose estimation can be less stable in certain conditions.
  • 📐 For visual planning tasks, we recommend keeping the observation and goal images relatively close to each other for best results.

We are actively working on the next generation of Aether and are committed to addressing these limitations in future releases.

📚 License

This repository is licensed under the MIT License; see the LICENSE file for details. For any questions, please email tonghe90[at]gmail[dot]com.

✨ Acknowledgements

Our work is primarily built upon Accelerate, Diffusers, CogVideoX, Finetrainers, DepthAnyVideo, CUT3R, MonST3R, VBench, GST, SPA, DroidCalib, Grounded-SAM-2, ceres-solver, etc. We extend our gratitude to all these authors for their generously open-sourced code and their significant contributions to the community.
