Structured Attentions for Visual Question Answering
2017
Proceedings title: 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)
ISSN: 2380-7504
Volume: 2017-October
Pages: 1300-1309
Publication status: Published
DOI: 10.1109/ICCV.2017.145
Abstract: Visual attention, which assigns weights to image regions according to their relevance to a question, is considered an indispensable part of most Visual Question Answering models. Although the questions may involve complex relations among multiple regions, few attention models can effectively encode such cross-region relations. In this paper, we demonstrate the importance of encoding such relations by showing the limited effective receptive field of ResNet on two datasets, and propose to model the visual attention as a multivariate distribution over a grid-structured Conditional Random Field on image regions. We demonstrate how to convert the iterative inference algorithms, Mean Field and Loopy Belief Propagation, into recurrent layers of an end-to-end neural network. We empirically evaluated our model on three datasets, where it surpasses the best baseline model on the newly released CLEVR dataset [13] by 9.5%, and the best published model on the VQA dataset [3] by 1.25%. Source code is available at https://github.com/zhuchen03/vqa-sva.
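The abstract's idea of unrolling Mean Field inference over a grid-structured CRF can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that); it assumes a simplified binary model in which each region has one latent "attended" variable, a unary logit, and a single shared coupling weight between 4-connected neighbours. The names `mean_field_grid`, `unary`, `coupling`, and `num_iters` are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mean_field_grid(unary, coupling, num_iters=3):
    """Unrolled mean-field updates on a 4-connected grid (illustrative only).

    unary     -- 2D list of per-region logits (assumed question relevance)
    coupling  -- assumed scalar weight shared by all neighbouring pairs
    num_iters -- number of unrolled iterations (the "recurrent layers")
    Returns a 2D list of approximate marginals q[r][c] = P(region attended).
    """
    rows, cols = len(unary), len(unary[0])
    # Initialise the marginals from the unary logits alone.
    q = [[sigmoid(u) for u in row] for row in unary]
    for _ in range(num_iters):
        new_q = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                # Sum the current marginals of the in-bounds grid neighbours.
                msg = 0.0
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        msg += q[nr][nc]
                # Mean-field update for a binary pairwise model.
                new_q[r][c] = sigmoid(unary[r][c] + coupling * msg)
        q = new_q  # synchronous update: every region uses the previous step
    return q


if __name__ == "__main__":
    logits = [[2.0, -1.0], [-1.0, -1.0]]
    for row in mean_field_grid(logits, coupling=1.0, num_iters=3):
        print([round(v, 3) for v in row])
```

Because every iteration applies the same update with the same weights, a fixed number of iterations behaves like a recurrent layer; in a differentiable framework the unary logits and coupling weight could then be learned end to end, which is the general idea the abstract describes.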
Place of publication: 345 E 47TH ST, NEW YORK, NY 10017 USA
Conference location: Venice, Italy
Conference dates: 22-29 Oct. 2017
Indexed in: CPCI; EI
Language: English
WOS research areas: Computer Science; Engineering
WOS categories: Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
WOS accession number: WOS:000425498401038
Publisher: IEEE
EI accession number: 20180704804048
EI main headings: Behavioral research; Encoding (symbols); Inference engines; Iterative methods
EI classification codes: Computer Software, Data Handling and Applications: 723; Numerical Methods: 921.6; Social Sciences: 971
Original document type: Proceedings Paper
Source database: IEEE
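Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/16306
Collections: School of Information Science and Technology
School of Information Science and Technology_PI Research Groups_Yi Ma Group
School of Information Science and Technology_PI Research Groups_Kewei Tu Group
School of Information Science and Technology_Master's Students
Corresponding author: Zhu, Chen
Author affiliation:
ShanghaiTech Univ, Shanghai, Peoples R China
First author affiliation: ShanghaiTech University
Corresponding author affiliation: ShanghaiTech University
First affiliation of first author: ShanghaiTech University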
Recommended citation
GB/T 7714
Zhu, Chen,Zhao, Yanpeng,Huang, Shuaiyi,et al. Structured Attentions for Visual Question Answering[C]. 345 E 47TH ST, NEW YORK, NY 10017 USA:IEEE,2017:1300-1309.
File name: 10.1109@ICCV.2017.145.pdf
Format: Adobe PDF
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.