Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator
2023
Proceedings title: ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023)
ISSN: 1049-5258
Publication status: Published
Abstract: Text-to-video is a rapidly growing research area that aims to generate a semantically, identically, and temporally coherent sequence of frames that accurately aligns with the input text prompt. This study focuses on zero-shot text-to-video generation for its data and cost efficiency. To generate a semantically coherent video that exhibits a rich portrayal of temporal semantics, such as the whole process of a flower blooming rather than a set of "moving images", we propose a novel Free-Bloom pipeline that harnesses large language models (LLMs) as the director to generate a semantically coherent prompt sequence, and pre-trained latent diffusion models (LDMs) as the animator to generate high-fidelity frames. Furthermore, to ensure temporal and identical coherence while maintaining semantic coherence, we propose a series of annotative modifications for adapting LDMs in the reverse process, including joint noise sampling, step-aware attention shift, and dual-path interpolation. Without any video data or training requirements, Free-Bloom generates vivid, high-quality videos, excelling at complex scenes with semantically meaningful frame sequences. In addition, Free-Bloom is naturally compatible with LDM-based extensions.
Conference name: 37th Conference on Neural Information Processing Systems (NeurIPS)
Place of publication: 10010 NORTH TORREY PINES RD, LA JOLLA, CALIFORNIA 92037 USA
Conference location: New Orleans, LA
Conference date: DEC 10-16, 2023
Indexed by: CPCI-S
Language: English
Funding: National Natural Science Foundation of China [62206174]; Shanghai Pujiang Program [21PJ1410900]
WOS research area: Computer Science
WOS category: Computer Science, Artificial Intelligence; Computer Science, Information Systems
WOS accession number: WOS:001224281500034
Publisher: NEURAL INFORMATION PROCESSING SYSTEMS (NIPS)
Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/381382
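Collection: School of Information Science and Technology
School of Information Science and Technology_PI Research Groups_Yu Jingyi (虞晶怡) Group
School of Information Science and Technology_Master's Students
School of Information Science and Technology_Undergraduate Students
School of Information Science and Technology_PhD Students
School of Information Science and Technology_PI Research Groups_Xu Lan (许岚) Group
School of Information Science and Technology_PI Research Groups_Yang Sibei (杨思蓓) Group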
Corresponding author: Yang, Sibei
Author affiliation:
ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China
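First author affiliation: School of Information Science and Technology
Corresponding author affiliation: School of Information Science and Technology
First affiliation of first author: School of Information Science and Technology
Recommended citation: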
GB/T 7714
Huang, Hanzhuo,Feng, Yufan,Shi, Cheng,et al. Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator[C]. 10010 NORTH TORREY PINES RD, LA JOLLA, CALIFORNIA 92037 USA:NEURAL INFORMATION PROCESSING SYSTEMS (NIPS),2023.
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.