Relation-aware Instance Refinement for Weakly Supervised Visual Grounding

doi:10.1109/CVPR46437.2021.00556

Title	Relation-aware Instance Refinement for Weakly Supervised Visual Grounding
Authors	Yongfei Liu1 ; Bo Wan1 ; Lin Ma2 ; Xuming He1
Publication year	2021
Proceedings title	PROCEEDINGS OF THE IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION
ISSN	1063-6919
Pages	5608-5617
Publication status	Published
DOI	10.1109/CVPR46437.2021.00556
Abstract	Visual grounding, which aims to build a correspondence between visual objects and their language entities, plays a key role in cross-modal scene understanding. One promising and scalable strategy for learning visual grounding is to utilize weak supervision from only image-caption pairs. Previous methods typically rely on matching query phrases directly to a precomputed, fixed object candidate pool, which leads to inaccurate localization and ambiguous matching due to the lack of semantic relation constraints. In this paper, we propose a novel context-aware weakly supervised learning method that incorporates coarse-to-fine object refinement and entity relation modeling into a two-stage deep network, capable of producing more accurate object representation and matching. To effectively train our network, we introduce a self-taught regression loss for the proposal locations and a classification loss based on parsed entity relations. Extensive experiments on two public benchmarks, Flickr30K Entities and ReferItGame, demonstrate the efficacy of our weakly supervised grounding framework. The results show that we outperform the previous methods by a considerable margin, achieving 59.27% top-1 accuracy on Flickr30K Entities and 37.68% on the ReferItGame dataset, respectively. © 2021 IEEE
Keywords	Computer vision ; Learning systems ; Visual languages ; Cross-modal ; Fixed-objects ; Image caption ; Language entities ; Localisation ; Matching query ; Matchings ; Scene understanding ; Semantic relations ; Visual objects
Conference name	2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
Conference location	Virtual, Online, United States
Conference date	June 19, 2021 - June 25, 2021
Indexed by	EI ; CPCI ; CPCI-S
Language	English
WOS accession number	WOS:000739917305080
Publisher	IEEE Computer Society
EI accession number	20220411509337
EI main heading	Semantics
EI classification codes	723.1.1 Computer Programming Languages ; 723.5 Computer Applications ; 741.2 Vision
Original document type	Conference article (CA)
Source database	IEEE
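Document type	Conference paper
Item identifier	https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/195144
Collections	School of Information Science and Technology ; School of Information Science and Technology_PI Research Groups_Xuming He Group ; School of Information Science and Technology_Master's Students ; School of Information Science and Technology_PhD Students
Author affiliations	1. School of Information Science and Technology, ShanghaiTech University ; 2. Meituan
First author affiliation	School of Information Science and Technology
First author's primary affiliation	School of Information Science and Technology
Recommended citation (GB/T 7714)	Yongfei Liu, Bo Wan, Lin Ma, et al. Relation-aware Instance Refinement for Weakly Supervised Visual Grounding[C]: IEEE Computer Society, 2021: 5608-5617.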