ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model

doi:10.1109/TMM.2025.3535389

	Fukun Yin 1; Xin Chen 2; Chi Zhang 2; Biao Jiang 1; Zibo Zhao 3; Wen Liu 4; Gang Yu 2; Tao Chen 1
	2025
Journal	IEEE TRANSACTIONS ON MULTIMEDIA (IF: 8.4 [JCR-2023], 8.0 [5-Year])
ISSN	1941-0077
EISSN	1941-0077
Volume	PP; Issue: 99
Publication status	Published
DOI	10.1109/TMM.2025.3535389
Abstract	The advent of large language models, which enable flexibility through instruction-driven approaches, has revolutionized many traditional generative tasks, but large models for 3D data, particularly in comprehensively handling 3D shapes with other modalities, are still under-explored. By achieving instruction-based shape generation, versatile multi-modal generative shape models can significantly benefit various fields, such as 3D virtual construction and network-aided design. In this work, we present ShapeGPT, a shape-included multi-modal framework to leverage strong pre-trained language models to address multiple shape-relevant tasks. Specifically, ShapeGPT employs a “word-sentence-paragraph” framework to discretize continuous shapes into shape words, further assembles these words into shape sentences, and integrates shape with instructional text for multi-modal paragraphs. To learn this shape-language model, we use a three-stage training scheme, including shape representation, multi-modal alignment, and instruction-based generation, to align shape-language codebooks and learn the intricate correlations among these modalities. Extensive experiments demonstrate that ShapeGPT achieves comparable performance across shape-relevant tasks, including text-to-shape, shape-to-text, shape completion, and shape editing.
Keywords	3D modeling; Modula (programming language); Syntactics; Three dimensional computer graphics; Unified Modeling Language; 3-D shape; Generative model; Language model; Large models; Learn+; Modal language; Multi-modal; Multimodal generative model; Shape generations; Unified framework
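The abstract's "word-sentence-paragraph" idea can be illustrated with a minimal sketch. It assumes, hypothetically, that continuous shape features are turned into shape words by nearest-neighbour lookup in a codebook (standard vector quantization), that a shape sentence is the resulting token sequence wrapped in boundary markers, and that a multi-modal paragraph is instruction text followed by that sentence. The token format (`<shape>`, `<s_i>`), the toy codebook, and all function names are illustrative assumptions, not ShapeGPT's actual encoder, vocabulary, or training code.

```python
import math
import re
from typing import List, Sequence


def quantize(features: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each continuous feature vector to the index of its nearest
    codebook entry (Euclidean distance); each index is one "shape word"."""
    words = []
    for f in features:
        dists = [math.dist(f, c) for c in codebook]
        words.append(min(range(len(codebook)), key=dists.__getitem__))
    return words


def shape_sentence(words: Sequence[int]) -> str:
    # Hypothetical token format: shape word i becomes <s_i>,
    # and the whole sequence is wrapped in <shape> ... </shape>.
    return "<shape>" + "".join(f"<s_{w}>" for w in words) + "</shape>"


def parse_sentence(sentence: str) -> List[int]:
    # Recover shape-word indices from a (hypothetical) model output,
    # e.g. before looking them up in the codebook for decoding.
    return [int(m) for m in re.findall(r"<s_(\d+)>", sentence)]


def paragraph(instruction: str, sentence: str) -> str:
    # Multi-modal "paragraph": instruction text followed by a shape sentence.
    return f"{instruction} {sentence}"


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]   # toy 2-D codebook
    features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]  # toy shape features
    words = quantize(features, codebook)
    print(paragraph("Complete this shape:", shape_sentence(words)))
```

Running the script prints `Complete this shape: <shape><s_0><s_1><s_2></shape>`. In a real system the codebook would be learned jointly with a shape encoder and decoder, and the language model's vocabulary would be extended with the shape tokens; both steps are omitted here.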
Indexed in	EI
Language	English
Publisher	Institute of Electrical and Electronics Engineers Inc.
EI accession number	20250617821977
EI main heading	Alignment
EI classification codes	1106.1.1 Computer Programming Languages; 1106.2 Data Handling and Data Processing; 1201.12 Modeling and Simulation; 601.1 Mechanical Devices; 902.1 Engineering Graphics
Original document type	Article in Press
Source database	IEEE
Document type	Journal article
Item identifier	https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/483996
Collection	School of Information Science and Technology_PhD Students
Author affiliations	1. School of Information Science and Technology, Fudan University, Shanghai, China; 2. Tencent PCG, China; 3. ShanghaiTech University, China; 4. Deepseek, China
Recommended citation (GB/T 7714)	Fukun Yin, Xin Chen, Chi Zhang, et al. ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2025, PP(99).
APA	Fukun Yin, Xin Chen, Chi Zhang, Biao Jiang, Zibo Zhao, ... & Tao Chen. (2025). ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model. IEEE TRANSACTIONS ON MULTIMEDIA, PP(99).
MLA	Fukun Yin, et al. "ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model." IEEE TRANSACTIONS ON MULTIMEDIA PP.99 (2025).