Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation

TL;DR

We present Any2Caption, a novel framework for controllable video generation under any condition. The key idea is to decouple the interpretation of diverse conditions from the video synthesis step. Leveraging modern multimodal large language models (MLLMs), Any2Caption interprets diverse inputs (text, images, videos, and specialized cues such as regions, motion, and camera poses) into dense, structured captions that provide backbone video generators with better guidance.

Figure: overview of the Any2Caption framework.
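To make the decoupling concrete, below is a minimal Python sketch of the two-stage pipeline. All names here (`Conditions`, `CaptionerMLLM`, `VideoGenerator`, `any2caption`) are hypothetical placeholders, since the official code has not been released; the sketch only illustrates that the video backbone consumes a structured caption rather than the raw conditions.

```python
# Hypothetical sketch of the decoupled Any2Caption pipeline.
# All class and function names are illustrative placeholders,
# not the released API.
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Conditions:
    """Heterogeneous user inputs; any subset may be present."""
    text: str | None = None
    images: list[Any] = field(default_factory=list)
    video: Any = None
    regions: list[Any] = field(default_factory=list)  # spatial region cues
    motion: Any = None                                 # motion trajectories
    camera_poses: Any = None                           # camera pose sequence

class CaptionerMLLM:
    """Stage 1: an MLLM that interprets arbitrary conditions
    into a single dense, structured caption."""
    def interpret(self, cond: Conditions) -> str:
        # In the real system this is a multimodal LLM forward pass;
        # here we just stub out a structured caption.
        parts = []
        if cond.text:
            parts.append(f"scene: {cond.text}")
        if cond.motion is not None:
            parts.append("motion: follows the provided trajectories")
        if cond.camera_poses is not None:
            parts.append("camera: follows the provided pose sequence")
        return "; ".join(parts)

class VideoGenerator:
    """Stage 2: any text-conditioned video backbone. It only ever
    sees the structured caption, never the raw conditions."""
    def generate(self, caption: str) -> Any:
        raise NotImplementedError("plug in a video diffusion backbone")

def any2caption(cond: Conditions, captioner: CaptionerMLLM,
                generator: VideoGenerator) -> Any:
    caption = captioner.interpret(cond)  # condition interpretation step
    return generator.generate(caption)   # video synthesis step
```

Because the two stages communicate only through the caption, the same interpretation front-end can, in principle, sit in front of any text-conditioned video generator.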

Code

Stay Tuned.

Citation

If you find Any2Caption useful in your project, please kindly cite:

@inproceedings{wu2025Any2Caption,
    title={Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation},
    author={Shengqiong Wu and Weicai Ye and Jiahao Wang and Quande Liu and Xintao Wang and Pengfei Wan and Di Zhang and Kun Gai and Shuicheng Yan and Hao Fei and Tat-Seng Chua},
    booktitle={arXiv},
    year={2025}
}
