Relation-aware Instance Refinement for Weakly Supervised Visual Grounding
2021
Proceedings title: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN: 1063-6919
Pages: 5608-5617
Publication status: Published
DOI: 10.1109/CVPR46437.2021.00556
Abstract

Visual grounding, which aims to build a correspondence between visual objects and their language entities, plays a key role in cross-modal scene understanding. One promising and scalable strategy for learning visual grounding is to utilize weak supervision from only image-caption pairs. Previous methods typically rely on matching query phrases directly to a precomputed, fixed object candidate pool, which leads to inaccurate localization and ambiguous matching due to the lack of semantic relation constraints. In this paper, we propose a novel context-aware weakly-supervised learning method that incorporates coarse-to-fine object refinement and entity relation modeling into a two-stage deep network, capable of producing more accurate object representations and matching. To effectively train our network, we introduce a self-taught regression loss for the proposal locations and a classification loss based on parsed entity relations. Extensive experiments on two public benchmarks, Flickr30K Entities and ReferItGame, demonstrate the efficacy of our weakly supervised grounding framework. The results show that we outperform the previous methods by a considerable margin, achieving 59.27% top-1 accuracy on Flickr30K Entities and 37.68% on ReferItGame. © 2021 IEEE
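
The baseline setting the abstract contrasts against, matching each query phrase directly to a fixed pool of precomputed proposals and scoring the result by top-1 accuracy, can be sketched as below. This is an illustrative sketch, not the authors' model: the cosine-similarity matcher, the feature vectors, the function names, and the IoU >= 0.5 hit criterion are all assumptions introduced here (the abstract does not state the threshold).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def ground_phrase(phrase_vec, proposals):
    """Pick the box of the proposal whose feature best matches the phrase.

    The proposal pool is fixed: boxes are never refined, which is the
    limitation the abstract attributes to previous methods.
    """
    best = max(proposals, key=lambda p: cosine(phrase_vec, p["feat"]))
    return best["box"]

def top1_accuracy(predicted, ground_truth, iou_threshold=0.5):
    """Fraction of phrases whose top-1 box overlaps the ground truth enough.

    The 0.5 threshold is an assumed, commonly used default.
    """
    hits = sum(iou(p, g) >= iou_threshold for p, g in zip(predicted, ground_truth))
    return hits / len(ground_truth)

# Toy data (hypothetical): two proposals with 2-D features, two query phrases.
proposals = [
    {"box": (0, 0, 10, 10), "feat": [1.0, 0.0]},
    {"box": (20, 20, 40, 40), "feat": [0.0, 1.0]},
]
phrase_vecs = [[0.9, 0.1], [0.2, 0.8]]
gt_boxes = [(1, 1, 10, 10), (0, 0, 5, 5)]

preds = [ground_phrase(v, proposals) for v in phrase_vecs]
accuracy = top1_accuracy(preds, gt_boxes)  # 0.5 for this toy data
```

In this toy case the second phrase is matched to a proposal that does not overlap its ground-truth box, so the error cannot be corrected afterwards; that is the kind of failure the abstract's box refinement and relation constraints are meant to address.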

Keywords: Computer vision; Learning systems; Visual languages; Cross-modal; Fixed-objects; Image caption; Language entities; Localisation; Matching query; Matchings; Scene understanding; Semantic relations; Visual objects
Conference name: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
Conference location: Virtual, Online, United States
Conference date: June 19, 2021 - June 25, 2021
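Indexed by: EI; CPCI; CPCI-S
Language: English
WOS accession number: WOS:000739917305080
Publisher: IEEE Computer Society
EI accession number: 20220411509337
EI main heading: Semantics
EI classification codes: 723.1.1 Computer Programming Languages; 723.5 Computer Applications; 741.2 Vision
Original document type: Conference article (CA)
Source database: IEEE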
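Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/195144
Collections:
School of Information Science and Technology
School of Information Science and Technology / PI Research Groups / He Xuming (何旭明) Group
School of Information Science and Technology / Master's Students
School of Information Science and Technology / PhD Students
Author affiliations: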
1.School of Information Science and Technology, ShanghaiTech University
2.Meituan
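First author affiliation: School of Information Science and Technology
First author's primary affiliation: School of Information Science and Technology
Recommended citation: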
GB/T 7714
Yongfei Liu,Bo Wan,Lin Ma,et al. Relation-aware Instance Refinement for Weakly Supervised Visual Grounding[C]:IEEE Computer Society,2021:5608-5617.
File name: 10.1109@CVPR46437.2021.00556.pdf
Format: Adobe PDF