ShanghaiTech University Knowledge Management System
CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM
2024-11-07
Journal | IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (T-PAMI)
ISSN | 0162-8828
Publication Status | Submitted, awaiting acceptance
Abstract | This paper aims to design a unified Computer-Aided Design (CAD) generation system that can easily generate CAD models based on the user's inputs in the form of textual descriptions, images, point clouds, or even a combination of them. Towards this goal, we introduce CAD-MLLM, the first system capable of generating parametric CAD models conditioned on multimodal input. Specifically, within the CAD-MLLM framework, we leverage the command sequences of CAD models and employ advanced large language models (LLMs) to align the feature space across these diverse multimodal data and the CAD models' vectorized representations. To facilitate model training, we design a comprehensive data construction and annotation pipeline that equips each CAD model with corresponding multimodal data. Our resulting dataset, named Omni-CAD, is the first multimodal CAD dataset that contains a textual description, multi-view images, points, and a command sequence for each CAD model. It contains approximately 450K instances and their CAD construction sequences. To thoroughly evaluate the quality of our generated CAD models, we go beyond current evaluation metrics that focus on reconstruction quality by introducing additional metrics that assess topology quality and surface enclosure extent. Extensive experimental results demonstrate that CAD-MLLM significantly outperforms existing conditional generative methods and remains highly robust to noise and missing points. The project page and more visualizations can be found at: https://cad-mllm.github.io/
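The abstract describes a pipeline in which features from text, images, and point clouds are aligned in a shared space and an LLM-style decoder autoregressively produces a CAD command sequence. The toy PyTorch sketch below illustrates that general idea only; it is not the authors' implementation, and every module name, feature dimension, and the small transformer decoder are assumptions made for illustration.

```python
# Conceptual sketch (not the authors' code): multimodal features are projected
# into a shared token space and a decoder predicts CAD command tokens.
import torch
import torch.nn as nn


class ToyCADMLLM(nn.Module):
    def __init__(self, d_model=256, vocab_size=512):  # vocab = assumed CAD command tokens
        super().__init__()
        # Stand-ins for real text / image / point-cloud encoders (dims are assumptions).
        self.text_proj = nn.Linear(768, d_model)
        self.image_proj = nn.Linear(1024, d_model)
        self.point_proj = nn.Linear(3, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.cmd_embed = nn.Embedding(vocab_size, d_model)
        self.cmd_head = nn.Linear(d_model, vocab_size)

    def forward(self, text_feat, image_feat, points, cmd_tokens):
        # Project every modality into one shared feature space and concatenate
        # the tokens as conditioning context for autoregressive decoding.
        context = torch.cat(
            [self.text_proj(text_feat), self.image_proj(image_feat), self.point_proj(points)],
            dim=1,
        )
        tgt = self.cmd_embed(cmd_tokens)
        # Causal mask so each command token only attends to earlier ones.
        seq_len = tgt.size(1)
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        hidden = self.decoder(tgt, context, tgt_mask=causal)
        return self.cmd_head(hidden)  # logits over the next CAD command token


model = ToyCADMLLM()
logits = model(
    torch.randn(1, 4, 768),          # 4 text feature tokens
    torch.randn(1, 16, 1024),        # 16 multi-view image patch features
    torch.randn(1, 128, 3),          # 128 sampled xyz points
    torch.randint(0, 512, (1, 10)),  # partial CAD command sequence
)
print(logits.shape)  # torch.Size([1, 10, 512])
```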
Keywords | Computer-Aided Design Models; Multimodal Large Language Models; Multimodality Data
URL | View original
WOS Category | Computer Science, Software Engineering
WOS Record No. | PPRN:119070022
Document Type | Journal article
Item Identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/458352
Collection | School of Information Science and Technology; School of Information Science and Technology_Master's Students
Co-First Authors | Zhao, Zibo; Wang, Chenyu
Corresponding Author | Gao, Shenghua
Author Affiliations | 1. ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China; 2. Transcengram, Brookline, MA, USA; 3. DeepSeek AI, Hangzhou, Peoples R China; 4. Univ Hong Kong, Hong Kong, Peoples R China
First Author Affiliation | School of Information Science and Technology
First Author's Primary Affiliation | School of Information Science and Technology
Recommended Citation (GB/T 7714) | Xu, Jingwei, Zhao, Zibo, Wang, Chenyu, et al. CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (T-PAMI), 2024.
APA | Xu, Jingwei, Zhao, Zibo, Wang, Chenyu, Liu, Wen, Ma, Yi, & Gao, Shenghua. (2024). CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (T-PAMI).
MLA | Xu, Jingwei, et al. "CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM." IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (T-PAMI) (2024).
Files in This Item |
File Name/Size | Document Type | Version Type | Access Type | License