Learning Cross-Modal Context Graph for Visual Grounding

doi:10.1609/aaai.v34i07.6833

	Liu, Yongfei¹; Wan, Bo¹; Zhu, Xiaodan²; He, Xuming¹
	2020
Proceedings title	THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE
ISSN	2159-5399
Volume	34
Pages	11645-11652
Publication status	Published
DOI	10.1609/aaai.v34i07.6833
Abstract	Visual grounding is a ubiquitous building block in many vision-language tasks and yet remains challenging due to large variations in visual and linguistic features of grounding entities, strong context effects and the resulting semantic ambiguities. Prior works typically focus on learning representations of individual phrases with limited context information. To address their limitations, this paper proposes a language-guided graph representation to capture the global context of grounding entities and their relations, and develops a cross-modal graph matching strategy for the multiple-phrase visual grounding task. In particular, we introduce a modular graph neural network to compute context-aware representations of phrases and object proposals respectively via message propagation, followed by a graph-based matching module to generate globally consistent localization of grounding phrases. We train the entire graph neural network jointly in a two-stage strategy and evaluate it on the Flickr30K Entities benchmark. Extensive experiments show that our method outperforms the prior state of the art by a sizable margin, evidencing the efficacy of our grounding framework. Code is available at https://github.com/youngfly11/LCMCG-PyTorch.
Proceedings editor / Conference organizer	Assoc Advancement Artificial Intelligence ; Association for the Advancement of Artificial Intelligence
Keywords	Visual languages ; Graph neural networks ; Graphic methods ; Backpropagation ; Building blocks ; Context information ; Graph representation ; Linguistic features ; Message propagation ; Semantic ambiguities ; State of the art
Conference name	34th AAAI Conference on Artificial Intelligence / 32nd Innovative Applications of Artificial Intelligence Conference / 10th AAAI Symposium on Educational Advances in Artificial Intelligence
Conference location	New York, NY
Conference date	FEB 07-12, 2020
Indexed in	CPCI-S ; CPCI ; EI
Language	English
WOS accession number	WOS:000668126804012
Publisher	ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE
EI accession number	20212210421387
EI main heading	Semantics
EI classification codes	723.1.1 Computer Programming Languages ; 723.4 Artificial Intelligence
Original document type	Proceedings Paper
Document type	Conference paper
Item identifier	https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/127951
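Collections	School of Information Science and Technology_PhD Students ; School of Information Science and Technology_PI Research Groups_Xuming He Group ; School of Information Science and Technology_Master's Students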
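Co-first author	Wan, Bo
Corresponding authors	Liu, Yongfei; He, Xuming
Author affiliations	1. ShanghaiTech Univ, Shanghai, Peoples R China; 2. Queens Univ, Kingston, ON, Canada
First author affiliation	ShanghaiTech University
Corresponding author affiliation	ShanghaiTech University
First affiliation of first author	ShanghaiTech University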
Recommended citation (GB/T 7714)	Liu, Yongfei, Wan, Bo, Zhu, Xiaodan, et al. Learning Cross-Modal Context Graph for Visual Grounding[C]//Assoc Advancement Artificial Intelligence, Association for the Advancement of Artificial Intelligence: ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE, 2020: 11645-11652.