ShanghaiTech University Knowledge Management System
ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model | |
2025 | |
Journal | IEEE TRANSACTIONS ON MULTIMEDIA (IF: 8.4 [JCR-2023], 8.0 [5-Year]) |
ISSN | 1941-0077 |
EISSN | 1941-0077 |
Volume | PP, Issue: 99 |
Publication status | Published |
DOI | 10.1109/TMM.2025.3535389 |
Abstract | The advent of large language models, which enable flexibility through instruction-driven approaches, has revolutionized many traditional generative tasks, but large models for 3D data, particularly in comprehensively handling 3D shapes with other modalities, are still under-explored. By achieving instruction-based shape generation, versatile multi-modal generative shape models can significantly benefit various fields, such as 3D virtual construction and network-aided design. In this work, we present ShapeGPT, a shape-included multi-modal framework to leverage strong pre-trained language models to address multiple shape-relevant tasks. Specifically, ShapeGPT employs a “word-sentence-paragraph” framework to discretize continuous shapes into shape words, further assembles these words into shape sentences, and integrates shape with instructional text for multi-modal paragraphs. To learn this shape-language model, we use a three-stage training scheme, including shape representation, multi-modal alignment, and instruction-based generation, to align shape-language codebooks and learn the intricate correlations among these modalities. Extensive experiments demonstrate that ShapeGPT achieves comparable performance across shape-relevant tasks, including text-to-shape, shape-to-text, shape completion, and shape editing. |
Keywords | 3D modeling; Modula (programming language); Syntactics; Three dimensional computer graphics; Unified Modeling Language; 3-D shape; Generative model; Language model; Large models; Learn+; Modal language; Multi-modal; Multimodal generative model; Shape generations; Unified framework |
Indexed by | EI |
Language | English |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
EI accession number | 20250617821977 |
EI main heading | Alignment |
EI classification codes | 1106.1.1 Computer Programming Languages ; 1106.2 Data Handling and Data Processing ; 1201.12 Modeling and Simulation ; 601.1 Mechanical Devices ; 902.1 Engineering Graphics |
Original document type | Article in Press |
Source database | IEEE |
Document type | Journal article |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/483996 |
Collection | School of Information Science and Technology, PhD Students |
Author affiliations | 1. School of Information Science and Technology, Fudan University, Shanghai, China; 2. Tencent PCG, China; 3. ShanghaiTech University, China; 4. Deepseek, China |
Recommended citation (GB/T 7714) | Fukun Yin, Xin Chen, Chi Zhang, et al. ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2025, PP(99). |
APA | Fukun Yin, Xin Chen, Chi Zhang, Biao Jiang, Zibo Zhao, ... & Tao Chen. (2025). ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model. IEEE TRANSACTIONS ON MULTIMEDIA, PP(99). |
MLA | Fukun Yin, et al. "ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model". IEEE TRANSACTIONS ON MULTIMEDIA PP.99 (2025). |
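The "word-sentence-paragraph" idea in the abstract (continuous shape features mapped to discrete shape words, words joined into a shape sentence, and the sentence combined with instruction text) can be illustrated with a toy sketch. This is not the authors' implementation: the nearest-neighbour codebook lookup, the 2-D feature vectors, the `<shape_k>` token format, and the `<shape_start>`/`<shape_end>` delimiters are all assumptions chosen only to make the three steps concrete.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def shape_to_sentence(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[str]:
    """Quantize each feature into a 'shape word' token; the token list is the 'shape sentence'."""
    return [f"<shape_{nearest_code(f, codebook)}>" for f in features]


def build_paragraph(instruction: str, sentence: Sequence[str]) -> str:
    """Combine instruction text and a shape sentence into one multi-modal 'paragraph' string."""
    return instruction + " <shape_start> " + " ".join(sentence) + " <shape_end>"


# Hypothetical 3-entry codebook and 3 local shape features (toy values, not from the paper).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
features = [(0.9, 0.1), (0.1, 0.8), (0.05, 0.05)]

sentence = shape_to_sentence(features, codebook)
paragraph = build_paragraph("Describe this shape:", sentence)
print(paragraph)
```

In a real system the features would come from a learned shape encoder and the resulting token sequence would be fed to a pre-trained language model; the sketch only shows the discretize-then-concatenate pattern.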