ShanghaiTech University Knowledge Management System
Relation-aware Instance Refinement for Weakly Supervised Visual Grounding | |
2021 | |
Proceedings title | PROCEEDINGS OF THE IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION |
ISSN | 1063-6919 |
Pages | 5608-5617 |
Publication status | Published |
DOI | 10.1109/CVPR46437.2021.00556 |
Abstract | Visual grounding, which aims to build a correspondence between visual objects and their language entities, plays a key role in cross-modal scene understanding. One promising and scalable strategy for learning visual grounding is to utilize weak supervision from only image-caption pairs. Previous methods typically rely on matching query phrases directly to a precomputed, fixed object candidate pool, which leads to inaccurate localization and ambiguous matching due to the lack of semantic relation constraints. In this paper, we propose a novel context-aware weakly supervised learning method that incorporates coarse-to-fine object refinement and entity relation modeling into a two-stage deep network, capable of producing more accurate object representation and matching. To effectively train our network, we introduce a self-taught regression loss for the proposal locations and a classification loss based on parsed entity relations. Extensive experiments on two public benchmarks, Flickr30K Entities and ReferItGame, demonstrate the efficacy of our weakly supervised grounding framework. The results show that we outperform previous methods by a considerable margin, achieving 59.27% top-1 accuracy on Flickr30K Entities and 37.68% on ReferItGame. © 2021 IEEE |
Keywords | Computer vision; Learning systems; Visual languages; Cross-modal; Fixed-objects; Image caption; Language entities; Localisation; Matching query; Matchings; Scene understanding; Semantic relations; Visual objects |
Conference name | 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 |
Conference location | Virtual, Online, United States |
Conference date | June 19, 2021 - June 25, 2021 |
Indexed by | EI ; CPCI ; CPCI-S |
Language | English |
WOS accession number | WOS:000739917305080 |
Publisher | IEEE Computer Society |
EI accession number | 20220411509337 |
EI main heading | Semantics |
EI classification codes | 723.1.1 Computer Programming Languages ; 723.5 Computer Applications ; 741.2 Vision |
Original document type | Conference article (CA) |
Source database | IEEE |
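Document type | Conference paper |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/195144 |
Collections | School of Information Science and Technology ; School of Information Science and Technology_PI Research Groups_He Xuming (何旭明) Group ; School of Information Science and Technology_Master's Students ; School of Information Science and Technology_PhD Students |
Author affiliations | 1. School of Information Science and Technology, ShanghaiTech University ; 2. Meituan |
First author affiliation | School of Information Science and Technology |
First affiliation of first author | School of Information Science and Technology |
Recommended citation (GB/T 7714) | Yongfei Liu,Bo Wan,Lin Ma,et al. Relation-aware Instance Refinement for Weakly Supervised Visual Grounding[C]:IEEE Computer Society,2021:5608-5617. |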