ShanghaiTech University Knowledge Management System
Self-learning Canonical Space for Multi-view 3D Human Pose Estimation | |
2024-03-29 | |
Status | Published |
Abstract | Multi-view 3D human pose estimation is naturally superior to its single-view counterpart, because images from multiple views provide more comprehensive information. This information includes camera poses, 2D/3D human poses, and 3D geometry. However, accurate annotations of this information are hard to obtain, which makes it challenging to predict accurate 3D human poses from multi-view images. To address this issue, we propose a fully self-supervised framework, named cascaded multi-view aggregating network (CMANet), which constructs a canonical parameter space to holistically integrate and exploit multi-view information. In our framework, the multi-view information is grouped into two categories: 1) intra-view information (i.e., camera pose, projected 2D human pose, and view-dependent 3D human pose), and 2) inter-view information (i.e., cross-view complement and 3D geometry constraint). Accordingly, CMANet consists of two components: an intra-view module (IRV) and an inter-view module (IEV). IRV extracts the initial camera pose and 3D human pose of each view; IEV fuses complementary pose information and cross-view 3D geometry into a final 3D human pose. To facilitate the aggregation of intra- and inter-view information, we define a canonical parameter space, described by the per-view camera pose and the human pose and shape parameters (θ and β) of the SMPL model, and propose a two-stage learning procedure. In the first stage, IRV learns to estimate the camera pose and view-dependent 3D human pose, supervised by the confident outputs of an off-the-shelf 2D keypoint detector. In the second stage, IRV is frozen and IEV further refines the camera pose and optimizes the 3D human pose by implicitly encoding the cross-view complement and 3D geometry constraint, which is achieved by jointly fitting the predicted multi-view 2D keypoints. Comprehensive experiments demonstrate the effectiveness of the proposed framework, modules, and learning strategy, and extensive quantitative and qualitative analysis shows that CMANet is superior to state-of-the-art methods. |
Keywords | Human Pose Estimation; Multi-view; Self-learning |
DOI | arXiv:2403.12440 |
Related URL | View original text |
Source | arXiv |
WOS Accession Number | PPRN:88240541 |
WOS Category | Computer Science, Software Engineering |
Document Type | Preprint |
Item Identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/372939 |
Collection | School of Information Science and Technology_Master's Students; School of Biomedical Engineering_PI Research Groups_Shen Dinggang Group |
Corresponding Author | Yang, Fan |
Author Affiliations | 1. ShanghaiTech Univ, Shanghai, Peoples R China; 2. United Imaging Intelligence, Shanghai, Peoples R China |
Recommended Citation (GB/T 7714) | Li, Xiaoben, Meng, Mancheng, Wu, Ziyan, et al. Self-learning Canonical Space for Multi-view 3D Human Pose Estimation. 2024. |
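
The abstract describes supervising the pose estimate with confident 2D keypoint detections and jointly fitting 2D keypoints across views. The sketch below illustrates that general idea only: a confidence-weighted multi-view reprojection error. It is not CMANet's actual loss. It assumes a simple pinhole camera, uses raw 3D joints as a stand-in for SMPL output, and the function names and the `conf_thresh` value are hypothetical.

```python
import numpy as np

def project(joints_3d, R, t, f=1.0):
    """Pinhole projection of (J, 3) world joints into one view; returns (J, 2)."""
    cam = joints_3d @ R.T + t              # world -> camera coordinates
    return f * cam[:, :2] / cam[:, 2:3]    # perspective divide

def multiview_reprojection_loss(joints_3d, cameras, keypoints_2d,
                                confidences, conf_thresh=0.5):
    """Confidence-weighted mean squared 2D error over all views.

    cameras:      list of (R, t) per view (assumed parameterization)
    keypoints_2d: list of (J, 2) detected keypoints per view
    confidences:  list of (J,) detector scores per view
    conf_thresh:  assumed cutoff; detections below it are ignored
    """
    total, weight = 0.0, 0.0
    for (R, t), kp, c in zip(cameras, keypoints_2d, confidences):
        w = np.where(c >= conf_thresh, c, 0.0)   # keep confident joints only
        err = np.sum((project(joints_3d, R, t) - kp) ** 2, axis=1)
        total += np.sum(w * err)
        weight += np.sum(w)
    return total / max(weight, 1e-8)
```

Because the same 3D joints are projected into every view, minimizing this quantity ties the views together through shared geometry, which is the intuition behind fitting multi-view 2D keypoints jointly.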