ShanghaiTech University Knowledge Management System
CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM
2024-11-07
Journal | IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (T-PAMI)
ISSN | 0162-8828
Publication Status | Submitted, awaiting acceptance
Abstract | This paper aims to design a unified Computer-Aided Design (CAD) generation system that can easily generate CAD models based on the user's inputs in the form of textual descriptions, images, point clouds, or even a combination of them. Towards this goal, we introduce CAD-MLLM, the first system capable of generating parametric CAD models conditioned on multimodal input. Specifically, within the CAD-MLLM framework, we leverage the command sequences of CAD models and employ advanced large language models (LLMs) to align the feature space across these diverse multimodal data and the CAD models' vectorized representations. To facilitate model training, we design a comprehensive data construction and annotation pipeline that equips each CAD model with corresponding multimodal data. Our resulting dataset, named Omni-CAD, is the first multimodal CAD dataset that contains a textual description, multi-view images, points, and a command sequence for each CAD model. It contains approximately 450K instances and their CAD construction sequences. To thoroughly evaluate the quality of our generated CAD models, we go beyond current evaluation metrics that focus on reconstruction quality by introducing additional metrics that assess topology quality and surface enclosure extent. Extensive experimental results demonstrate that CAD-MLLM significantly outperforms existing conditional generative methods and remains highly robust to noise and missing points. The project page and more visualizations can be found at: https://cad-mllm.github.io/
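The abstract describes a pipeline in which features from text, images, and point clouds are aligned in a shared space and an LLM-style decoder autoregressively produces a CAD command sequence. The toy PyTorch sketch below illustrates that general idea only; it is not the authors' implementation, and every module name, feature dimension, and the small transformer decoder are assumptions made for illustration.

```python
# Conceptual sketch (not the authors' code): multimodal features are projected
# into a shared token space and a decoder predicts CAD command tokens.
import torch
import torch.nn as nn


class ToyCADMLLM(nn.Module):
    def __init__(self, d_model=256, vocab_size=512):  # vocab = assumed CAD command tokens
        super().__init__()
        # Stand-ins for real text / image / point-cloud encoders (dims are assumptions).
        self.text_proj = nn.Linear(768, d_model)
        self.image_proj = nn.Linear(1024, d_model)
        self.point_proj = nn.Linear(3, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.cmd_embed = nn.Embedding(vocab_size, d_model)
        self.cmd_head = nn.Linear(d_model, vocab_size)

    def forward(self, text_feat, image_feat, points, cmd_tokens):
        # Project every modality into one shared feature space and concatenate
        # the tokens as conditioning context for autoregressive decoding.
        context = torch.cat(
            [self.text_proj(text_feat), self.image_proj(image_feat), self.point_proj(points)],
            dim=1,
        )
        tgt = self.cmd_embed(cmd_tokens)
        # Causal mask so each command token only attends to earlier ones.
        seq_len = tgt.size(1)
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        hidden = self.decoder(tgt, context, tgt_mask=causal)
        return self.cmd_head(hidden)  # logits over the next CAD command token


model = ToyCADMLLM()
logits = model(
    torch.randn(1, 4, 768),          # 4 text feature tokens
    torch.randn(1, 16, 1024),        # 16 multi-view image patch features
    torch.randn(1, 128, 3),          # 128 sampled xyz points
    torch.randint(0, 512, (1, 10)),  # partial CAD command sequence
)
print(logits.shape)  # torch.Size([1, 10, 512])
```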
Keywords | Computer-Aided Design Models; Multimodal Large Language Models; Multimodality Data
URL | View original
WOS Category | Computer Science, Software Engineering
WOS Record No. | PPRN:119070022
Document Type | Journal article
Item Identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/458352
Collection | School of Information Science and Technology; School of Information Science and Technology_Master's Students
Co-First Authors | Zhao, Zibo; Wang, Chenyu
Corresponding Author | Gao, Shenghua
Author Affiliations | 1. ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China; 2. Transcengram, Brookline, MA, USA; 3. DeepSeek AI, Hangzhou, Peoples R China; 4. Univ Hong Kong, Hong Kong, Peoples R China
First Author Affiliation | School of Information Science and Technology
First Author's Primary Affiliation | School of Information Science and Technology
Recommended Citation (GB/T 7714) | Xu, Jingwei, Zhao, Zibo, Wang, Chenyu, et al. CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (T-PAMI), 2024.
APA | Xu, Jingwei, Zhao, Zibo, Wang, Chenyu, Liu, Wen, Ma, Yi, & Gao, Shenghua. (2024). CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (T-PAMI).
MLA | Xu, Jingwei, et al. "CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM." IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (T-PAMI) (2024).
Files in This Item |
File Name/Size | Document Type | Version Type | Access Type | License