UniGen: Unified Generative Pre-training for Multilingual Multimodal Representation
2024-03-16
Conference Proceedings: ACM INTERNATIONAL CONFERENCE PROCEEDING SERIES
Pages: 25-31
Publication Status: Published
DOI: 10.1145/3655497.3655509
Abstract

Multilingual multimodal pre-training has garnered significant attention, but it faces challenges due to the substantial need for diverse multilingual text-image data, especially for minor languages. This article introduces UniGen, a unified strategy for efficient multilingual multimodal pre-training inspired by observations of how such data are distributed on the internet. Leveraging the richer availability and higher quality of multilingual text-English text and English text-image data, UniGen aligns the latent representations of multilingual text and visual information into a unified semantic space. This alignment, with English as a reference, proves effective in enhancing cross-modal understanding. UniGen reduces reliance on multilingual text-image data and surpasses comparable models on the multilingual multimodal benchmark IGLUE by a notable 7%. Notably, UniGen is the first multilingual multimodal model to unify all pre-training tasks within a generative pre-training framework. © 2024 ACM.
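The English-as-pivot alignment described in the abstract can be illustrated schematically. The sketch below is a toy illustration only, not UniGen's actual architecture or training objective (UniGen formulates its pre-training tasks generatively): it assumes hypothetical encoder modules and contrastive (InfoNCE-style) losses, and merely shows how multilingual text-English text pairs and English text-image pairs can pull all three modalities into one shared space without any direct multilingual text-image supervision.

```python
import torch
import torch.nn.functional as F

# Toy illustration of English-as-pivot alignment (NOT UniGen's actual
# generative objective): hypothetical encoders map multilingual text,
# English text, and images into one shared embedding space, and two
# contrastive losses align (multilingual, English) and (English, image)
# pairs. No multilingual text-image pairs are needed.

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired embeddings."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                   # pairwise similarities
    targets = torch.arange(a.size(0), device=a.device)  # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

class PivotAligner(torch.nn.Module):
    """Hypothetical module: three projections into a shared semantic space."""
    def __init__(self, dim: int = 256):
        super().__init__()
        # Stand-ins for real multilingual text, English text, and image encoders.
        self.multi_proj = torch.nn.Linear(768, dim)
        self.en_proj = torch.nn.Linear(768, dim)
        self.img_proj = torch.nn.Linear(1024, dim)

    def forward(self, multi_feat, en_feat_for_multi, en_feat_for_img, img_feat):
        # Loss 1: multilingual text <-> its English translation (text-text pairs).
        l_text = info_nce(self.multi_proj(multi_feat), self.en_proj(en_feat_for_multi))
        # Loss 2: English caption <-> image (English text-image pairs).
        l_img = info_nce(self.en_proj(en_feat_for_img), self.img_proj(img_feat))
        # English serves as the pivot tying both losses to one shared space.
        return l_text + l_img

# Usage with random stand-in features (batch of 8 paired examples):
model = PivotAligner()
loss = model(torch.randn(8, 768), torch.randn(8, 768),
             torch.randn(8, 768), torch.randn(8, 1024))
loss.backward()
```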

Keywords: Autoregressive modelling; Generative model; Image data; Internet data; Multi-modal; Multilingual model; Multilingual texts; Multimodal pre-training; Pre-training; Text images
Conference: 8th International Conference on Innovation in Artificial Intelligence, ICIAI 2024
Conference Location: Tokyo, Japan
Conference Dates: March 16, 2024 - March 18, 2024
Indexed By: EI
Language: English
Publisher: Association for Computing Machinery
EI Accession Number: 20243416908767
EI Controlled Terms: Generative adversarial networks
EI Classification Code: 1101.2
Original Document Type: Conference article (CA)
Document Type: Conference Paper
Identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/415588
Collection: School of Information Science and Technology
Corresponding Author: Luo, Guan
Affiliations:
1.State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
2.School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
3.School of Information Science and Technology, ShanghaiTech University, Shanghai, China
Recommended Citation
GB/T 7714
Tian, Zheyuan, Luo, Guan, Wang, Bo, et al. UniGen: Unified Generative Pre-training for Multilingual Multimodal Representation[C]. Association for Computing Machinery, 2024: 25-31.