ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model
2025
Journal: IEEE TRANSACTIONS ON MULTIMEDIA (IF: 8.4 [JCR-2023], 8.0 [5-Year])
ISSN: 1941-0077
EISSN: 1941-0077
Volume: PP, Issue: 99
Publication status: Published
DOI: 10.1109/TMM.2025.3535389
Abstract: The advent of large language models, which enable flexibility through instruction-driven approaches, has revolutionized many traditional generative tasks, but large models for 3D data, particularly in comprehensively handling 3D shapes with other modalities, are still under-explored. By achieving instruction-based shape generation, versatile multi-modal generative shape models can significantly benefit various fields, such as 3D virtual construction and network-aided design. In this work, we present ShapeGPT, a shape-included multi-modal framework to leverage strong pre-trained language models to address multiple shape-relevant tasks. Specifically, ShapeGPT employs a “word-sentence-paragraph” framework to discretize continuous shapes into shape words, further assembles these words into shape sentences, and integrates shape with instructional text for multi-modal paragraphs. To learn this shape-language model, we use a three-stage training scheme, including shape representation, multi-modal alignment, and instruction-based generation, to align shape-language codebooks and learn the intricate correlations among these modalities. Extensive experiments demonstrate that ShapeGPT achieves comparable performance across shape-relevant tasks, including text-to-shape, shape-to-text, shape completion, and shape editing.
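The "word-sentence-paragraph" idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the nearest-neighbour codebook lookup, the function names (`quantize`, `shape_sentence`, `paragraph`), the token strings such as `<shape_start>`, and the tiny 2-D codebook are all assumptions chosen only to show how continuous features might become discrete shape words that sit alongside instruction text in one sequence.

```python
# Hypothetical illustration only: names, sizes, and token formats are
# assumptions and are not taken from the ShapeGPT paper.

def quantize(features, codebook):
    """Map each continuous feature vector to the index of its nearest codebook entry."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
            for f in features]

def shape_sentence(word_ids):
    """Wrap shape words in assumed boundary tokens to form one 'shape sentence'."""
    return ["<shape_start>"] + [f"<shape_{i}>" for i in word_ids] + ["<shape_end>"]

def paragraph(instruction, sentence):
    """Concatenate instruction text and a shape sentence into one token sequence."""
    return instruction.split() + sentence

codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # toy 3-entry codebook
features = [[0.9, 0.1], [0.1, 0.8], [0.05, 0.0]]  # toy continuous shape features

word_ids = quantize(features, codebook)           # discrete "shape words"
tokens = paragraph("Describe this shape:", shape_sentence(word_ids))
print(tokens)
```

In a real system the codebook would be learned and the mixed sequence would be fed to a pre-trained language model; the sketch stops at building the token sequence.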
Keywords: 3D modeling; Modula (programming language); Syntactics; Three dimensional computer graphics; Unified Modeling Language; 3-D shape; Generative model; Language model; Large models; Learn+; Modal language; Multi-modal; Multimodal generative model; Shape generations; Unified framework
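Indexed by: EI
Language: English
Publisher: Institute of Electrical and Electronics Engineers Inc.
EI accession number: 20250617821977
EI main heading: Alignment
EI classification codes: 1106.1.1 Computer Programming Languages; 1106.2 Data Handling and Data Processing; 1201.12 Modeling and Simulation; 601.1 Mechanical Devices; 902.1 Engineering Graphics
Original document type: Article in Press
Source database: IEEE
Document type: Journal article
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/483996
Collection: School of Information Science and Technology_PhD Students
Author affiliations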
1.School of Information Science and Technology, Fudan University, Shanghai, China
2.Tencent PCG, China
3.ShanghaiTech University, China
4.Deepseek, China
Recommended citation
GB/T 7714
Fukun Yin, Xin Chen, Chi Zhang, et al. ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2025, PP(99).
APA: Fukun Yin, Xin Chen, Chi Zhang, Biao Jiang, Zibo Zhao, ... & Tao Chen. (2025). ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model. IEEE TRANSACTIONS ON MULTIMEDIA, PP(99).
MLA: Fukun Yin, et al. "ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model". IEEE TRANSACTIONS ON MULTIMEDIA PP.99 (2025).