Self-learning Canonical Space for Multi-view 3D Human Pose Estimation
2024-03-29
Status: Published
Abstract

Multi-view 3D human pose estimation is naturally superior to its single-view counterpart because images from multiple views provide more comprehensive information, including camera poses, 2D/3D human poses, and 3D geometry. However, accurate annotations of this information are hard to obtain, making it challenging to predict accurate 3D human poses from multi-view images. To address this issue, we propose a fully self-supervised framework, named cascaded multi-view aggregating network (CMANet), which constructs a canonical parameter space to holistically integrate and exploit multi-view information. In our framework, the multi-view information is grouped into two categories: 1) intra-view information (i.e., camera pose, projected 2D human pose, and view-dependent 3D human pose) and 2) inter-view information (i.e., cross-view complement and 3D geometry constraint). Accordingly, CMANet consists of two components: an intra-view module (IRV) and an inter-view module (IEV). IRV extracts the initial camera pose and 3D human pose of each view; IEV fuses complementary pose information and cross-view 3D geometry into a final 3D human pose. To facilitate the aggregation of intra- and inter-view information, we define a canonical parameter space, described by the per-view camera pose and the human pose and shape parameters (θ and β) of the SMPL model, and propose a two-stage learning procedure. In the first stage, IRV learns to estimate the camera pose and view-dependent 3D human pose, supervised by the confident outputs of an off-the-shelf 2D keypoint detector. In the second stage, IRV is frozen and IEV further refines the camera pose and optimizes the 3D human pose by implicitly encoding the cross-view complement and 3D geometry constraint, which is achieved by jointly fitting the predicted multi-view 2D keypoints.
Comprehensive experiments demonstrate the effectiveness of the proposed framework, modules, and learning strategy, and CMANet outperforms state-of-the-art methods in extensive quantitative and qualitative analyses.
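
The idea of jointly fitting multi-view 2D keypoints can be illustrated with a minimal sketch. This is not the authors' implementation: the pinhole projection, the confidence threshold `conf_thresh`, the use of raw 3D joints instead of SMPL parameters, and the function names `project` and `multiview_reprojection_loss` are all assumptions made for illustration. It shows one shared set of 3D joints projected through per-view cameras and compared against detected 2D keypoints, with low-confidence detections ignored.

```python
import numpy as np

def project(joints3d, R, t, f=1.0):
    """Pinhole projection (an assumed camera model) of (J, 3) joints.

    R is a (3, 3) rotation, t a (3,) translation, f a focal length.
    """
    cam = joints3d @ R.T + t              # world -> camera coordinates
    return f * cam[:, :2] / cam[:, 2:3]   # perspective divide -> (J, 2)

def multiview_reprojection_loss(joints3d, cameras, keypoints2d,
                                confidences, conf_thresh=0.5):
    """Confidence-weighted squared 2D error, summed over all views.

    cameras:      list of (R, t) pairs, one per view
    keypoints2d:  list of (J, 2) detected keypoints, one per view
    confidences:  list of (J,) detector confidences, one per view
    conf_thresh:  hypothetical cutoff below which a detection is ignored
    """
    total = 0.0
    for (R, t), kp, c in zip(cameras, keypoints2d, confidences):
        w = np.where(c >= conf_thresh, c, 0.0)          # keep confident joints
        err = np.sum((project(joints3d, R, t) - kp) ** 2, axis=1)
        total += np.sum(w * err) / max(np.sum(w), 1e-8)  # per-view weighted mean
    return total
```

Because every view shares the same 3D joints, minimizing this loss over the joints and the per-view cameras couples the views through 3D geometry, which is the intuition behind the second-stage refinement described in the abstract.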

Keywords: Human Pose Estimation; Multi-view; Self-learning
DOI: arXiv:2403.12440
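Source: arXiv
WOS Accession Number: PPRN:88240541
WOS Category: Computer Science, Software Engineering
Document Type: Preprint
Item Identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/372939
Collections: School of Information Science and Technology_Master's Students
School of Biomedical Engineering_PI Research Groups_Dinggang Shen Group
Corresponding Author: Yang, Fan
Author Affiliations:
1. ShanghaiTech Univ, Shanghai, Peoples R China
2. United Imaging Intelligence, Shanghai, Peoples R China
Recommended Citation:
GB/T 7714
Li, Xiaoben, Meng, Mancheng, Wu, Ziyan, et al. Self-learning Canonical Space for Multi-view 3D Human Pose Estimation. 2024.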