Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training
2023-09-25
Proceedings title: ARXIV
Publication status: Published
DOI: arXiv:2302.14007
Abstract

Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for both 2D and 3D computer vision. However, existing MAE-style methods can only learn from the data of a single modality, i.e., either images or point clouds, which neglects the implicit semantic and geometric correlation between 2D and 3D. In this paper, we explore how the 2D modality can benefit 3D masked autoencoding, and propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training. Joint-MAE randomly masks an input 3D point cloud and its projected 2D images, and then reconstructs the masked information of the two modalities. For better cross-modal interaction, we construct Joint-MAE from two hierarchical 2D-3D embedding modules, a joint encoder, and a joint decoder with modal-shared and modal-specific decoders. On top of this, we further introduce two cross-modal strategies to boost 3D representation learning: local-aligned attention mechanisms for 2D-3D semantic cues, and a cross-reconstruction loss for 2D-3D geometric constraints. With our pre-training paradigm, Joint-MAE achieves superior performance on multiple downstream tasks, e.g., 92.4% accuracy for linear SVM on ModelNet40 and 86.07% accuracy on the hardest split of ScanObjectNN.
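The joint masking step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the helper names `random_mask` and `project_to_depth_map`, the orthographic projection along the z axis, the 0.75 mask ratio, and the token counts. It only shows the idea of projecting a point cloud to a 2D map and drawing an independent random mask for each modality.

```python
import random


def random_mask(num_tokens, mask_ratio, rng):
    """Return a list of booleans; True marks a token chosen for masking."""
    num_masked = round(num_tokens * mask_ratio)
    masked = set(rng.sample(range(num_tokens), num_masked))
    return [i in masked for i in range(num_tokens)]


def project_to_depth_map(points, size):
    """Orthographic projection along z (an assumed projection, not the paper's).

    Points are expected in [-1, 1]^3. Each pixel keeps the largest
    normalized depth value that falls into it; empty pixels stay 0.0.
    """
    depth = [[0.0] * size for _ in range(size)]
    for x, y, z in points:
        col = min(size - 1, max(0, round((x + 1.0) / 2.0 * (size - 1))))
        row = min(size - 1, max(0, round((y + 1.0) / 2.0 * (size - 1))))
        depth[row][col] = max(depth[row][col], (z + 1.0) / 2.0)
    return depth


rng = random.Random(0)

# Hypothetical input: 1024 random points standing in for a real point cloud.
points = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(1024)]
depth = project_to_depth_map(points, size=32)

# Hypothetical token counts: 64 point-group tokens and 16 image-patch tokens.
point_mask = random_mask(64, 0.75, rng)
patch_mask = random_mask(16, 0.75, rng)
```

In a full pipeline, the unmasked tokens of both modalities would be passed to the joint encoder and the masked ones reconstructed by the decoder; that part is omitted because the abstract does not specify it in enough detail.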

Conference name: 32nd International Joint Conference on Artificial Intelligence (IJCAI)
Place of publication: ALBERT-LUDWIGS UNIV FREIBURG GEORGES-KOHLER-ALLEE, INST INFORMATIK, GEB 052, FREIBURG, D-79110, GERMANY
Conference location: Macao, PEOPLES R CHINA
Conference date: AUG 19-25, 2023
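Indexed in: CPCI-S
Language: English
Funding project: National Key R&D Program of China[
WOS research area: Computer Science
WOS category: Computer Science, Software Engineering
WOS accession number: PPRN:46089399
Publisher: IJCAI-INT JOINT CONF ARTIF INTELL
Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/381389
Collection: School of Information Science and Technology_PhD Students
Corresponding author: Guo, Ziyu
Author affiliations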
1.Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
2.CUHK MMLab, Hong Kong, Peoples R China
3.Huazhong Univ Sci & Technol, Wuhan, Peoples R China
4.Chinese Univ Hong Kong, Inst Med Intelligence, Hong Kong, Peoples R China
5.ShanghaiTech Univ, Shanghai, Peoples R China
Recommended citation
GB/T 7714
Guo, Ziyu,Zhang, Renrui,Qiu, Longtian,et al. Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training[C]. ALBERT-LUDWIGS UNIV FREIBURG GEORGES-KOHLER-ALLEE, INST INFORMATIK, GEB 052, FREIBURG, D-79110, GERMANY:IJCAI-INT JOINT CONF ARTIF INTELL,2023.