CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM
Date: 2024-11-07
Journal: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (T-PAMI)
ISSN: 0162-8828
Publication Status: Submitted, awaiting acceptance
Abstract

This paper aims to design a unified Computer-Aided Design (CAD) generation system that can easily generate CAD models from user inputs in the form of textual descriptions, images, point clouds, or a combination of them. Towards this goal, we introduce CAD-MLLM, the first system capable of generating parametric CAD models conditioned on multimodal input. Specifically, within the CAD-MLLM framework, we leverage the command sequences of CAD models and employ advanced large language models (LLMs) to align the feature spaces of these diverse modalities with the vectorized representations of CAD models. To facilitate model training, we design a comprehensive data construction and annotation pipeline that equips each CAD model with corresponding multimodal data. Our resulting dataset, named Omni-CAD, is the first multimodal CAD dataset that contains a textual description, multi-view images, point clouds, and a command sequence for each CAD model. It contains approximately 450K instances and their CAD construction sequences. To thoroughly evaluate the quality of our generated CAD models, we go beyond current evaluation metrics that focus on reconstruction quality by introducing additional metrics that assess topology quality and surface enclosure extent. Extensive experimental results demonstrate that CAD-MLLM significantly outperforms existing conditional generative methods and remains highly robust to noise and missing points. The project page and more visualizations can be found at: https://cad-mllm.github.io/
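The record gives no implementation details, so the following is purely a hypothetical sketch of the kind of conditioning the abstract describes: modality-specific features are projected into a shared embedding space and a decoder predicts CAD command tokens autoregressively. All module names, feature dimensions, and the choice of encoders are assumptions for illustration, not the paper's architecture.

```python
# Hypothetical sketch (not the authors' code): aligning multimodal features
# with a CAD command-sequence vocabulary via a shared embedding space.
import torch
import torch.nn as nn

class MultimodalCADGenerator(nn.Module):
    def __init__(self, d_model=512, vocab_size=256, n_layers=4, n_heads=8):
        super().__init__()
        # Modality-specific projections into the shared embedding space
        # (input dims are placeholders for text / image / point-cloud encoders).
        self.text_proj = nn.Linear(768, d_model)
        self.image_proj = nn.Linear(1024, d_model)
        self.point_proj = nn.Linear(256, d_model)
        # Discretized CAD command/parameter vocabulary
        self.cmd_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, text_feat, image_feat, point_feat, cmd_tokens):
        # Condition tokens from all modalities, followed by command embeddings
        cond = torch.cat([
            self.text_proj(text_feat),
            self.image_proj(image_feat),
            self.point_proj(point_feat),
        ], dim=1)
        seq = torch.cat([cond, self.cmd_embed(cmd_tokens)], dim=1)
        # Causal mask over the whole sequence (a simplification) so command
        # tokens are predicted autoregressively.
        mask = nn.Transformer.generate_square_subsequent_mask(seq.size(1))
        hidden = self.decoder(seq, mask=mask)
        # Logits only over the command portion of the sequence
        return self.head(hidden[:, cond.size(1):])

# Toy usage with random features standing in for real encoder outputs
model = MultimodalCADGenerator()
logits = model(torch.randn(1, 8, 768), torch.randn(1, 16, 1024),
               torch.randn(1, 32, 256), torch.randint(0, 256, (1, 20)))
print(logits.shape)  # torch.Size([1, 20, 256])
```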

Keywords: Computer-Aided Design Models; Multimodal Large Language Models; Multimodality Data
URL: View original
WOS Category: Computer Science, Software Engineering
WOS Record ID: PPRN:119070022
Document Type: Journal article
Item Identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/458352
Collection: School of Information Science and Technology
School of Information Science and Technology_Master's students
Co-first Authors: Zhao, Zibo; Wang, Chenyu
Corresponding Author: Gao, Shenghua
Author Affiliations:
1.ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China
2.Transcengram, Brookline, MA, USA
3.DeepSeek AI, Hangzhou, Peoples R China
4.Univ Hong Kong, Hong Kong, Peoples R China
First Author's Affiliation: School of Information Science and Technology
First Author's First Affiliation: School of Information Science and Technology
Recommended Citation:
GB/T 7714
Xu, Jingwei, Zhao, Zibo, Wang, Chenyu, et al. CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (T-PAMI), 2024.
APA Xu, Jingwei, Zhao, Zibo, Wang, Chenyu, Liu, Wen, Ma, Yi, & Gao, Shenghua. (2024). CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (T-PAMI).
MLA Xu, Jingwei, et al. "CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM". IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (T-PAMI) (2024).