ShanghaiTech University Knowledge Management System
Title | Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator |
Year | 2023 |
Proceedings title | ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) |
ISSN | 1049-5258 |
Publication status | Published |
Abstract | Text-to-video is a rapidly growing research area that aims to generate a semantically, identically, and temporally coherent sequence of frames that accurately aligns with the input text prompt. This study focuses on zero-shot text-to-video generation for the sake of data and cost efficiency. To generate a semantically coherent video that richly portrays temporal semantics, such as the whole process of a flower blooming rather than a set of "moving images", we propose a novel Free-Bloom pipeline that harnesses large language models (LLMs) as the director to generate a semantically coherent prompt sequence and pre-trained latent diffusion models (LDMs) as the animator to generate high-fidelity frames. Furthermore, to ensure temporal and identical coherence while maintaining semantic coherence, we propose a series of annotative modifications for adapting LDMs in the reverse process, including joint noise sampling, step-aware attention shift, and dual-path interpolation. Without any video data or training, Free-Bloom generates vivid, high-quality videos and is awe-inspiring in generating complex scenes with semantically meaningful frame sequences. In addition, Free-Bloom is naturally compatible with LDM-based extensions. |
Conference name | 37th Conference on Neural Information Processing Systems (NeurIPS) |
Place of publication | 10010 NORTH TORREY PINES RD, LA JOLLA, CALIFORNIA 92037 USA |
Conference location | New Orleans, LA |
Conference dates | DEC 10-16, 2023 |
URL | View original |
Indexed in | CPCI-S |
Language | English |
Funding | National Natural Science Foundation of China[62206174] ; Shanghai Pujiang Program[21PJ1410900] |
WOS research area | Computer Science |
WOS category | Computer Science, Artificial Intelligence ; Computer Science, Information Systems |
WOS accession number | WOS:001224281500034 |
Publisher | NEURAL INFORMATION PROCESSING SYSTEMS (NIPS) |
Document type | Conference paper |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/381382 |
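Collections | School of Information Science and Technology ; School of Information Science and Technology_PI Research Groups_Jingyi Yu Group ; School of Information Science and Technology_Master's Students ; School of Information Science and Technology_Undergraduates ; School of Information Science and Technology_PhD Students ; School of Information Science and Technology_PI Research Groups_Lan Xu Group ; School of Information Science and Technology_PI Research Groups_Sibei Yang Group |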
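Corresponding author | Yang, Sibei |
Author affiliation | ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China |
First author affiliation | School of Information Science and Technology |
Corresponding author affiliation | School of Information Science and Technology |
First affiliation of first author | School of Information Science and Technology |
Recommended citation (GB/T 7714) | Huang, Hanzhuo, Feng, Yufan, Shi, Cheng, et al. Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator[C]. 10010 NORTH TORREY PINES RD, LA JOLLA, CALIFORNIA 92037 USA: NEURAL INFORMATION PROCESSING SYSTEMS (NIPS), 2023. |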