Learning Cross-Modal Context Graph for Visual Grounding
2020
Proceedings title: THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE
ISSN: 2159-5399
Volume: 34
Pages: 11645-11652
Publication status: Published
DOI: 10.1609/aaai.v34i07.6833
Abstract

Visual grounding is a ubiquitous building block in many vision-language tasks, yet it remains challenging due to large variations in the visual and linguistic features of grounding entities, strong context effects, and the resulting semantic ambiguities. Prior works typically focus on learning representations of individual phrases with limited context information. To address their limitations, this paper proposes a language-guided graph representation to capture the global context of grounding entities and their relations, and develops a cross-modal graph matching strategy for the multiple-phrase visual grounding task. In particular, we introduce a modular graph neural network to compute context-aware representations of phrases and object proposals, respectively, via message propagation, followed by a graph-based matching module to generate globally consistent localization of grounding phrases. We train the entire graph neural network jointly in a two-stage strategy and evaluate it on the Flickr30K Entities benchmark. Extensive experiments show that our method outperforms the prior state of the art by a sizable margin, evidencing the efficacy of our grounding framework. Code is available at https://github.com/youngfly11/LCMCG-PyTorch.
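
The two ideas named in the abstract, message propagation over a graph and matching phrases to object proposals, can be illustrated with a toy sketch. This is not the authors' model (see the linked repository for that): the function names `propagate`, `cosine`, and `ground`, the mixing weight, and the 2-D feature vectors are all hypothetical, and the matching step here is a plain per-phrase nearest neighbour by cosine similarity rather than the globally consistent graph matching the paper describes.

```python
import math


def propagate(features, edges, weight=0.25):
    """One round of message passing (illustrative only).

    Each node keeps (1 - weight) of its own feature and takes `weight`
    of the mean feature of its neighbours. Edges are undirected pairs.
    """
    neighbours = {i: [] for i in range(len(features))}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    updated = []
    for i, feat in enumerate(features):
        if not neighbours[i]:
            # Isolated node: nothing to aggregate, keep the feature as is.
            updated.append(list(feat))
            continue
        mean = [
            sum(features[j][d] for j in neighbours[i]) / len(neighbours[i])
            for d in range(len(feat))
        ]
        updated.append(
            [(1 - weight) * feat[d] + weight * mean[d] for d in range(len(feat))]
        )
    return updated


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ground(phrase_feats, proposal_feats):
    """Assign each phrase the index of its most similar proposal.

    Assumption: independent argmax per phrase, a stand-in for the
    paper's graph-based matching module.
    """
    return [
        max(range(len(proposal_feats)), key=lambda j: cosine(p, proposal_feats[j]))
        for p in phrase_feats
    ]


# Hypothetical toy data: two phrases and three object proposals.
phrases = [[1.0, 0.0], [0.0, 1.0]]
phrase_edges = [(0, 1)]
proposals = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
proposal_edges = [(0, 2), (1, 2)]

assignment = ground(
    propagate(phrases, phrase_edges),
    propagate(proposals, proposal_edges),
)
```

In this sketch, `assignment[i]` is the index of the proposal chosen for phrase `i`. A real system would use learned features and a matching step that scores phrase pairs and proposal pairs jointly.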

Proceedings editor / Conference organizer: Assoc Advancement Artificial Intelligence ; Association for the Advancement of Artificial Intelligence
Keywords: Visual languages ; Graph neural networks ; Graphic methods ; Backpropagation ; Building blocks ; Context information ; Graph representation ; Linguistic features ; Message propagation ; Semantic ambiguities ; State of the art
Conference name: 34th AAAI Conference on Artificial Intelligence / 32nd Innovative Applications of Artificial Intelligence Conference / 10th AAAI Symposium on Educational Advances in Artificial Intelligence
Conference location: New York, NY
Conference dates: FEB 07-12, 2020
Indexed in: CPCI-S ; CPCI ; EI
Language: English
WOS accession number: WOS:000668126804012
Publisher: ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE
EI accession number: 20212210421387
EI main heading: Semantics
EI classification codes: 723.1.1 Computer Programming Languages ; 723.4 Artificial Intelligence
Original document type: Proceedings Paper
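Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/127951
Collections: School of Information Science and Technology_PhD Students
School of Information Science and Technology_PI Research Groups_Xuming He Group
School of Information Science and Technology_Master's Students
Co-first author: Wan, Bo
Corresponding authors: Liu, Yongfei; He, Xuming
Author affiliations:
1. ShanghaiTech Univ, Shanghai, Peoples R China;
2. Queens Univ, Kingston, ON, Canada
First author affiliation: ShanghaiTech University
Corresponding author affiliation: ShanghaiTech University
First affiliation of the first author: ShanghaiTech University
Recommended citation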
GB/T 7714
Liu, Yongfei,Wan, Bo,Zhu, Xiaodan,et al. Learning Cross-Modal Context Graph for Visual Grounding[C]//Assoc Advancement Artificial Intelligence, Association for the Advancement of Artificial Intelligence:ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE,2020:11645-11652.