Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation

TL;DR

We present Any2Caption, a novel framework for controllable video generation under any condition. The key idea is to decouple the interpretation of diverse conditions from the video synthesis step. Leveraging modern multimodal large language models (MLLMs), Any2Caption interprets diverse inputs (text, images, videos, and specialized cues such as regions, motion, and camera poses) into dense, structured captions that provide backbone video generators with better guidance.

Figure: overview of the Any2Caption framework.
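To make the decoupling concrete, below is a minimal Python sketch of the two-stage pipeline. All names here (`Conditions`, `CaptionerMLLM`, `VideoGenerator`, `any2caption`) are hypothetical placeholders, since the official code has not been released; the sketch only illustrates that the video backbone consumes a structured caption rather than the raw conditions.

```python
# Hypothetical sketch of the decoupled Any2Caption pipeline.
# All class and function names are illustrative placeholders,
# not the released API.
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Conditions:
    """Heterogeneous user inputs; any subset may be present."""
    text: str | None = None
    images: list[Any] = field(default_factory=list)
    video: Any = None
    regions: list[Any] = field(default_factory=list)  # spatial region cues
    motion: Any = None                                 # motion trajectories
    camera_poses: Any = None                           # camera pose sequence

class CaptionerMLLM:
    """Stage 1: an MLLM that interprets arbitrary conditions
    into a single dense, structured caption."""
    def interpret(self, cond: Conditions) -> str:
        # In the real system this is a multimodal LLM forward pass;
        # here we just stub out a structured caption.
        parts = []
        if cond.text:
            parts.append(f"scene: {cond.text}")
        if cond.motion is not None:
            parts.append("motion: follows the provided trajectories")
        if cond.camera_poses is not None:
            parts.append("camera: follows the provided pose sequence")
        return "; ".join(parts)

class VideoGenerator:
    """Stage 2: any text-conditioned video backbone. It only ever
    sees the structured caption, never the raw conditions."""
    def generate(self, caption: str) -> Any:
        raise NotImplementedError("plug in a video diffusion backbone")

def any2caption(cond: Conditions, captioner: CaptionerMLLM,
                generator: VideoGenerator) -> Any:
    caption = captioner.interpret(cond)  # condition interpretation step
    return generator.generate(caption)   # video synthesis step
```

Because the two stages communicate only through the caption, the same interpretation front-end can, in principle, sit in front of any text-conditioned video generator.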

Code

Stay Tuned.

Citation

If you find Any2Caption useful in your project, please kindly cite:

@inproceedings{wu2025Any2Caption,
    title={Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation},
    author={Shengqiong Wu and Weicai Ye and Jiahao Wang and Quande Liu and Xintao Wang and Pengfei Wan and Di Zhang and Kun Gai and Shuicheng Yan and Hao Fei and Tat-Seng Chua},
    booktitle={arXiv},
    year={2025}
}
