ShanghaiTech University Knowledge Management System
Title | Learning Cross-Modal Context Graph for Visual Grounding |
Year | 2020 |
Proceedings title | THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE |
ISSN | 2159-5399 |
Volume | 34 |
Pages | 11645-11652 |
Publication status | Published |
DOI | 10.1609/aaai.v34i07.6833 |
Abstract | Visual grounding is a ubiquitous building block in many vision-language tasks, yet it remains challenging due to large variations in the visual and linguistic features of grounding entities, strong context effects, and the resulting semantic ambiguities. Prior works typically focus on learning representations of individual phrases with limited context information. To address these limitations, this paper proposes a language-guided graph representation that captures the global context of grounding entities and their relations, and develops a cross-modal graph matching strategy for the multiple-phrase visual grounding task. In particular, we introduce a modular graph neural network that computes context-aware representations of phrases and object proposals, respectively, via message propagation, followed by a graph-based matching module that generates globally consistent localizations of the grounding phrases. We train the entire graph neural network jointly with a two-stage strategy and evaluate it on the Flickr30K Entities benchmark. Extensive experiments show that our method outperforms the prior state of the art by a sizable margin, evidencing the efficacy of our grounding framework. Code is available at https://github.com/youngfly11/LCMCG-PyTorch. |
Proceedings editor / Conference organizer | Assoc Advancement Artificial Intelligence ; Association for the Advancement of Artificial Intelligence |
Keywords | Visual languages; Graph neural networks; Graphic methods; Backpropagation; Building blocks; Context information; Graph representation; Linguistic features; Message propagation; Semantic ambiguities; State of the art |
Conference name | 34th AAAI Conference on Artificial Intelligence / 32nd Innovative Applications of Artificial Intelligence Conference / 10th AAAI Symposium on Educational Advances in Artificial Intelligence |
Conference location | New York, NY |
Conference dates | FEB 07-12, 2020 |
Indexed in | CPCI-S ; CPCI ; EI |
Language | English |
WOS accession number | WOS:000668126804012 |
Publisher | ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE |
EI accession number | 20212210421387 |
EI main heading | Semantics |
EI classification codes | 723.1.1 Computer Programming Languages ; 723.4 Artificial Intelligence |
Original document type | Proceedings Paper |
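Document type | Conference paper |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/127951 |
Collections | School of Information Science and Technology_PhD Students; School of Information Science and Technology_PI Research Groups_Xuming He Group; School of Information Science and Technology_Master's Students |
Co-first author | Wan, Bo |
Corresponding authors | Liu, Yongfei; He, Xuming |
Author affiliations | 1.ShanghaiTech Univ, Shanghai, Peoples R China; 2.Queens Univ, Kingston, ON, Canada |
First author's affiliation | ShanghaiTech University |
Corresponding authors' affiliation | ShanghaiTech University |
First author's primary affiliation | ShanghaiTech University |
Recommended citation (GB/T 7714) | Liu, Yongfei,Wan, Bo,Zhu, Xiaodan,et al. Learning Cross-Modal Context Graph for Visual Grounding[C]//Assoc Advancement Artificial Intelligence, Association for the Advancement of Artificial Intelligence:ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE,2020:11645-11652. |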
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.