Chinese Title Generation for Short Videos: Dataset, Metric and Algorithm
2024
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence (IF: 20.8 [JCR-2023], 22.2 [5-Year])
ISSN: 1939-3539
EISSN: 1939-3539
Volume: PP; Issue: 99; Pages: 5192-5208
Publication status: Published
DOI: 10.1109/TPAMI.2024.3365739
Abstract: Previous work on video captioning aims to describe video content objectively, but the resulting captions lack human interest and attractiveness, which limits their practical application. Video title generation (video titling) aims to produce attractive titles, but benchmarks for it are lacking. This work offers CREATE, the first large-scale Chinese shoRt vidEo retrievAl and Title gEneration dataset, to assist research and applications in video titling, video captioning, and video retrieval in Chinese. CREATE comprises a high-quality labeled 210K dataset and two web-scale pre-training datasets of 3M and 10M videos, covering 51 categories, 50K+ tags, 537K+ manually annotated titles and captions, and 10M+ short videos with original video information. This work also presents ACTEr, a unique Attractiveness-Consensus-based Title Evaluation, to objectively evaluate the quality of video title generation. This metric measures the semantic correlation between the candidate (model-generated title) and the references (manually labeled titles) and introduces attractiveness consensus weights to assess the attractiveness and relevance of the video title. Accordingly, this work proposes a novel multi-modal ALignment WIth Generation model, ALWIG, as a strong baseline to aid future model development. With the help of a tag-driven video-text alignment module and a GPT-based generation module, this model achieves video titling, captioning, and retrieval simultaneously. We believe that the release of the CREATE dataset, ACTEr metric, and ALWIG model will encourage in-depth research on the analysis and creation of Chinese short videos. Project webpage: https://createbenchmark.github.io/.
Keywords: Video and Language Short Video Multi-modal Benchmark Video Titling Title Evaluation Text-Video Retrieval
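Indexed by: EI
Language: English
Publisher: IEEE Computer Society
EI accession number: 20240815580915
EI main heading: Semantics
EI classification codes: 723.2 Data Processing and Image Processing; 913.3 Quality Assurance and Control
Original document type: Journal article (JA)
Source database: IEEE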
Document type: Journal article
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/354970
Collection: School of Information Science and Technology
Author affiliations:
1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences, China
2. Aerospace Information Research Institute, CAS, China
3. ARC Lab at Tencent PCG, China
4. Huake Xingsheng Electric Power Engineering Technology, China
5. School of Information Science and Technology, ShanghaiTech University, China
Recommended citation:
GB/T 7714
Ziqi Zhang, Zongyang Ma, Chunfeng Yuan, et al. Chinese Title Generation for Short Videos: Dataset, Metric and Algorithm[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, PP(99): 5192-5208.
APA: Ziqi Zhang, Zongyang Ma, Chunfeng Yuan, Yuxin Chen, Peijin Wang, ... & Stephen Maybank. (2024). Chinese Title Generation for Short Videos: Dataset, Metric and Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, PP(99), 5192-5208.
MLA: Ziqi Zhang, et al. "Chinese Title Generation for Short Videos: Dataset, Metric and Algorithm." IEEE Transactions on Pattern Analysis and Machine Intelligence PP.99 (2024): 5192-5208.