ShanghaiTech University Knowledge Management System
Chinese Title Generation for Short Videos: Dataset, Metric and Algorithm
Year | 2024 |
Journal | IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (IF: 20.8 [JCR-2023], 22.2 [5-Year]) |
ISSN | 1939-3539 |
EISSN | 1939-3539 |
Volume | PP, Issue: 99, Pages: 5192-5208 |
Publication status | Published |
DOI | 10.1109/TPAMI.2024.3365739 |
Abstract | Previous work on video captioning aims to describe video content objectively, but the resulting captions lack human interest and attractiveness, which limits their practical application. Video title generation (video titling) aims to produce attractive titles, but benchmarks for it are lacking. This work offers CREATE, the first large-scale Chinese shoRt vidEo retrievAl and Title gEneration dataset, to assist research and applications in video titling, video captioning, and video retrieval in Chinese. CREATE comprises a high-quality labeled 210K dataset and two web-scale pre-training datasets of 3M and 10M videos, covering 51 categories, 50K+ tags, 537K+ manually annotated titles and captions, and 10M+ short videos with original video information. This work presents ACTEr, a unique Attractiveness-Consensus-based Title Evaluation, to objectively evaluate the quality of video title generation. This metric measures the semantic correlation between the candidate (model-generated title) and the references (manually labeled titles) and introduces attractiveness consensus weights to assess the attractiveness and relevance of the video title. Accordingly, this work proposes a novel multi-modal ALignment WIth Generation model, ALWIG, as a strong baseline to aid future model development. With the help of a tag-driven video-text alignment module and a GPT-based generation module, this model achieves video titling, captioning, and retrieval simultaneously. We believe that the release of the CREATE dataset, ACTEr metric, and ALWIG model will encourage in-depth research on the analysis and creation of Chinese short videos. Project webpage: https://createbenchmark.github.io/. |
Keywords | Video and Language; Short Video; Multi-modal Benchmark; Video Titling; Title Evaluation; Text-Video Retrieval |
Indexed by | EI |
Language | English |
Publisher | IEEE Computer Society |
EI accession number | 20240815580915 |
EI main heading | Semantics |
EI classification codes | 723.2 Data Processing and Image Processing; 913.3 Quality Assurance and Control |
Original document type | Journal article (JA) |
Source database | IEEE |
Document type | Journal article |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/354970 |
Collection | School of Information Science and Technology |
Author affiliations | 1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences, China 2. Aerospace Information Research Institute, CAS, China 3. ARC Lab at Tencent PCG, China 4. Huake Xingsheng Electric Power Engineering Technology, China 5. School of Information Science and Technology, ShanghaiTech University, China |
Recommended citation (GB/T 7714) | Ziqi Zhang, Zongyang Ma, Chunfeng Yuan, et al. Chinese Title Generation for Short Videos: Dataset, Metric and Algorithm[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, PP(99): 5192-5208. |
APA | Ziqi Zhang, Zongyang Ma, Chunfeng Yuan, Yuxin Chen, Peijin Wang, ... & Stephen Maybank. (2024). Chinese Title Generation for Short Videos: Dataset, Metric and Algorithm. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, PP(99), 5192-5208. |
MLA | Ziqi Zhang, et al. "Chinese Title Generation for Short Videos: Dataset, Metric and Algorithm." IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE PP.99 (2024): 5192-5208. |
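Illustrative note: the abstract describes ACTEr as combining the semantic correlation between a candidate title and the reference titles with attractiveness consensus weights. This record does not give the actual formula, so the following Python is only a minimal sketch of the general idea of a consensus-weighted similarity score. The function names, the use of cosine similarity over pre-computed embedding vectors, and the weighted-average form are all assumptions made for illustration, not the paper's definition.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def weighted_title_score(
    candidate: Sequence[float],
    references: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Hypothetical consensus-weighted score (NOT the published ACTEr formula).

    candidate  -- embedding of the model-generated title (assumed given)
    references -- embeddings of the manually labeled titles (assumed given)
    weights    -- one non-negative weight per reference, standing in for
                  how strongly annotators agree that the reference is attractive
    """
    if len(references) != len(weights):
        raise ValueError("need exactly one weight per reference")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted average of the candidate's similarity to each reference.
    weighted_sum = sum(w * cosine(candidate, r) for r, w in zip(references, weights))
    return weighted_sum / total


if __name__ == "__main__":
    cand = [1.0, 0.0]
    refs = [[1.0, 0.0], [0.0, 1.0]]
    # The first reference is weighted three times as heavily as the second.
    print(weighted_title_score(cand, refs, [3.0, 1.0]))
```

Under this sketch, a candidate that resembles a highly weighted reference scores higher than one that resembles a lightly weighted reference, which is the intuition the abstract attributes to the consensus weights.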
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.