ShanghaiTech University Knowledge Management System
Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator | |
2023 | |
Proceedings title | ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS |
ISSN | 1049-5258 |
Volume | 36 |
Publication status | Published |
Abstract | Text-to-video is a rapidly growing research area that aims to generate a semantically, identically, and temporally coherent sequence of frames that accurately aligns with the input text prompt. This study focuses on zero-shot text-to-video generation with data and cost efficiency in mind. To generate a semantically coherent video that exhibits a rich portrayal of temporal semantics, such as the whole process of a flower blooming rather than a set of 'moving images', we propose a novel Free-Bloom pipeline that harnesses large language models (LLMs) as the director to generate a semantically coherent prompt sequence, and pre-trained latent diffusion models (LDMs) as the animator to generate high-fidelity frames. Furthermore, to ensure temporal and identical coherence while maintaining semantic coherence, we propose a series of annotative modifications for adapting LDMs in the reverse process, including joint noise sampling, step-aware attention shift, and dual-path interpolation. Without any video data or training requirements, Free-Bloom generates vivid, high-quality videos and is especially strong at generating complex scenes with semantically meaningful frame sequences. In addition, Free-Bloom is naturally compatible with LDM-based extensions. © 2023 Neural information processing systems foundation. All rights reserved. |
Conference name | 37th Conference on Neural Information Processing Systems, NeurIPS 2023 |
Conference location | New Orleans, LA, United States |
Conference date | December 10, 2023 - December 16, 2023 |
Indexed by | EI |
Language | English |
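Publisher | Neural information processing systems foundation |
EI accession number | 20241715985711 |
Original document type | Conference article (CA) |
Document type | Conference paper |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/370149 |
Collections | School of Information Science and Technology; School of Information Science and Technology_PI Research Groups_Yu Jingyi Group (虞晶怡组); School of Information Science and Technology_Master's Students; School of Information Science and Technology_Undergraduate Students; School of Information Science and Technology_PI Research Groups_Xu Lan Group (许岚组); School of Information Science and Technology_PI Research Groups_Yang Sibei Group (杨思蓓组) |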
通讯作者 | Yang, Sibei |
作者单位 | School of Information Science and Technology, ShanghaiTech University, China |
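First author affiliation | School of Information Science and Technology |
Corresponding author affiliation | School of Information Science and Technology |
First affiliation of first author | School of Information Science and Technology |
Recommended citation (GB/T 7714) | Huang, Hanzhuo,Feng, Yufan,Shi, Cheng,et al. Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator[C]:Neural information processing systems foundation,2023. |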