HumanBench:
Towards General Human-centric Perception with Projector Assisted Pretraining

Shixiang Tang1,4*, Cheng Chen4*, Qingsong Xie4, Meilin Chen2,4, Yizhou Wang2,4, Yuanzheng Ci1, Lei Bai3, Feng Zhu4, Haiyang Yang4, Li Yi4, Rui Zhao4,5, Wanli Ouyang3

1The University of Sydney; 2Zhejiang University; 3Shanghai Artificial Intelligence Laboratory; 4SenseTime Research; 5Qing Yuan Research Institute, Shanghai Jiao Tong University

CVPR 2023




Human-centric perception covers a variety of vision tasks with widespread industrial applications, including surveillance, autonomous driving, and the metaverse. It is desirable to have a general pretrained model for versatile human-centric downstream tasks. This paper forges ahead along this path from the aspects of both benchmarks and pretraining methods. Specifically, we propose HumanBench, built on existing datasets, to comprehensively evaluate on common ground the generalization abilities of different pretraining methods across 19 datasets from 6 diverse downstream tasks: person ReID, pose estimation, human parsing, pedestrian attribute recognition, pedestrian detection, and crowd counting. To learn both coarse-grained and fine-grained knowledge of human bodies, we further propose a Projector AssisTed Hierarchical pretraining method (PATH) that learns diverse knowledge at different granularity levels. Comprehensive evaluations on HumanBench show that PATH achieves new state-of-the-art results on 17 downstream datasets and on-par results on the other 2.

[Paper]

Highlights

$\text{\color{#2F6EBA}{A\ Large-scale\ and\ Diverse\ Human-Centric\ Benchmark}}$

  • Collected 11,019,187 pretraining images from 37 datasets spanning 5 tasks, ranging from global to local.
  • Constructed 19 evaluation datasets from 6 tasks.
  • Defined 3 evaluation protocols to assess the generalization ability of pretrained models: in-dataset evaluation, out-of-dataset evaluation, and unseen-task evaluation.

$\text{\color{#2F6EBA}{A\ Projector\ Assisted\ Pretraining\ Method}}$

  • Designed a task-specific MLP projector to enhance the generalization ability of supervised pretraining.
  • Designed a hierarchical weight-sharing strategy to reduce task conflicts.
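The task-specific projector idea can be illustrated with a minimal PyTorch sketch (an illustrative reconstruction, not the repository's actual implementation): each task gets its own small MLP on top of the shared backbone features, so task-specific statistics are absorbed by the projector rather than by the shared encoder. The class name, task names, and dimensions below are hypothetical.

```python
import torch
import torch.nn as nn

class TaskSpecificProjector(nn.Module):
    """Sketch of per-task MLP projectors over shared ViT features.

    One small MLP per task sits between the shared backbone and the
    task head, absorbing task-specific feature statistics so the
    backbone itself stays general-purpose.
    """

    def __init__(self, task_names, embed_dim=768, hidden_dim=3072):
        super().__init__()
        self.projectors = nn.ModuleDict({
            name: nn.Sequential(
                nn.Linear(embed_dim, hidden_dim),
                nn.GELU(),
                nn.Linear(hidden_dim, embed_dim),
            )
            for name in task_names
        })

    def forward(self, features, task_name):
        # features: (batch, num_tokens, embed_dim) from the shared backbone
        return self.projectors[task_name](features)

# Hypothetical task list mirroring the benchmark's pretraining tasks
tasks = ["reid", "pose", "parsing", "attribute", "detection"]
proj = TaskSpecificProjector(tasks)
x = torch.randn(2, 196, 768)          # dummy ViT token features
out = proj(x, "pose")                  # routed through the pose projector
```

At finetuning time such projectors can be dropped, leaving only the shared backbone for the downstream task.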

$\text{\color{#2F6EBA}{Push\ the\ Limits\ of\ Human-Centric\ Tasks}}$

  • Higher performance than state-of-the-art methods on 17 datasets, and on-par performance with state-of-the-art methods on 2 datasets.
  • Strong results even on tasks that do NOT exist in the training data.

Demo

Watch the video

Installation

See installation instructions.

Data

See data instructions.

We also provide a small training config, with 10% samples of the whole pretraining dataset.

Training

Download the pre-trained MAE model from here and place the pretrained weight mae_pretrain_vit_base.pth under the core/models/backbones/pretrain_weights folder.
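Preparing the expected checkpoint location looks roughly like this (a minimal sketch; the download path is a placeholder you must adjust):

```shell
# Create the folder the training configs expect to find the MAE weight in
mkdir -p core/models/backbones/pretrain_weights

# Then move the downloaded checkpoint into it, e.g.:
# mv /path/to/mae_pretrain_vit_base.pth core/models/backbones/pretrain_weights/
```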

## train ViT-B
cd experiments/L2_full_setting_joint_v100_32g
sh train.sh

## train ViT-L
cd experiments/L2_full_setting_vit_large_a100_80g
sh train.sh

Evaluation

See evaluation instructions.

A pre-trained PATH-ViT-B is available at 🤗 hugging face, and a pre-trained PATH-ViT-L is available at 🤗 hugging face. The results on various tasks are summarized below:

Project Release

  • Hugging Face Release
  • Detailed and Convenient Methods for Data Preparation.
  • PATH-B finetune configs
  • PATH-B/L HumanBench pretrained models
  • PATH Pretraining Code

Citation

@article{tang2023humanbench,
  title={HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining},
  author={Tang, Shixiang and Chen, Cheng and Xie, Qingsong and Chen, Meilin and Wang, Yizhou and Ci, Yuanzheng and Bai, Lei and Zhu, Feng and Yang, Haiyang and Yi, Li and others},
  journal={arXiv preprint arXiv:2303.05675},
  year={2023}
}

Acknowledgement

MAE, Mask2Former, bts, mmcv, mmdetection, mmpose.

Contact

We are hiring at all levels on the 2D-3D Human-Centric Foundation Model Team, including full-time researchers, engineers, and interns. If you are interested in working with us on human-centric foundation models and human-centric AIGC driven by foundation models, please contact Shixiang Tang.