Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training

doi:arXiv:2302.14007

	Guo, Ziyu 1; Zhang, Renrui 2; Qiu, Longtian 5; Li, Xianzhi 3; Heng, Pheng-Ann 1,4
	2023-09-25
Proceedings title	ARXIV
Publication status	Published
DOI	arXiv:2302.14007
Abstract	Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for both 2D and 3D computer vision. However, existing MAE-style methods can only learn from the data of a single modality, i.e., either images or point clouds, which neglects the implicit semantic and geometric correlation between 2D and 3D. In this paper, we explore how the 2D modality can benefit 3D masked autoencoding, and propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training. Joint-MAE randomly masks an input 3D point cloud and its projected 2D images, and then reconstructs the masked information of the two modalities. For better cross-modal interaction, we build Joint-MAE from two hierarchical 2D-3D embedding modules, a joint encoder, and a joint decoder with modal-shared and modal-specific decoders. On top of this, we further introduce two cross-modal strategies to boost 3D representation learning: local-aligned attention mechanisms for 2D-3D semantic cues, and a cross-reconstruction loss for 2D-3D geometric constraints. With our pre-training paradigm, Joint-MAE achieves superior performance on multiple downstream tasks, e.g., 92.4% accuracy for linear SVM on ModelNet40 and 86.07% accuracy on the hardest split of ScanObjectNN.
Conference name	32nd International Joint Conference on Artificial Intelligence (IJCAI)
Place of publication	ALBERT-LUDWIGS UNIV FREIBURG GEORGES-KOHLER-ALLEE, INST INFORMATIK, GEB 052, FREIBURG, D-79110, GERMANY
Conference location	Macao, PEOPLES R CHINA
Conference date	AUG 19-25, 2023
Indexed in	CPCI-S
Language	English
Funding	National Key R&D Program of China[
WOS research area	Computer Science
WOS category	Computer Science, Software Engineering
WOS accession number	PPRN:46089399
Publisher	IJCAI-INT JOINT CONF ARTIF INTELL
Document type	Conference paper
Item identifier	https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/381389
Collection	School of Information Science and Technology_PhD Students
Corresponding author	Guo, Ziyu
Author affiliations	1. Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China; 2. CUHK MMLab, Hong Kong, Peoples R China; 3. Huazhong Univ Sci & Technol, Wuhan, Peoples R China; 4. Chinese Univ Hong Kong, Inst Med Intelligence, Hong Kong, Peoples R China; 5. ShanghaiTech Univ, Shanghai, Peoples R China
Recommended citation (GB/T 7714)	Guo, Ziyu, Zhang, Renrui, Qiu, Longtian, et al. Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training[C]. ALBERT-LUDWIGS UNIV FREIBURG GEORGES-KOHLER-ALLEE, INST INFORMATIK, GEB 052, FREIBURG, D-79110, GERMANY: IJCAI-INT JOINT CONF ARTIF INTELL, 2023.