ShanghaiTech University Knowledge Management System
Structured Attentions for Visual Question Answering | |
2017 | |
Proceedings title | 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)
ISSN | 2380-7504 |
Volume | 2017-October |
Pages | 1300-1309 |
Publication status | Published |
DOI | 10.1109/ICCV.2017.145 |
Abstract | Visual attention, which assigns weights to image regions according to their relevance to a question, is considered as an indispensable part by most Visual Question Answering models. Although the questions may involve complex relations among multiple regions, few attention models can effectively encode such cross-region relations. In this paper, we demonstrate the importance of encoding such relations by showing the limited effective receptive field of ResNet on two datasets, and propose to model the visual attention as a multivariate distribution over a grid-structured Conditional Random Field on image regions. We demonstrate how to convert the iterative inference algorithms, Mean Field and Loopy Belief Propagation, as recurrent layers of an end-to-end neural network. We empirically evaluated our model on 3 datasets, in which it surpasses the best baseline model of the newly released CLEVR dataset [13] by 9.5%, and the best published model on the VQA dataset [3] by 1.25%. Source code is available at https://github.com/zhuchen03/vqa-sva. |
Place of publication | 345 E 47TH ST, NEW YORK, NY 10017 USA |
Conference location | Venice, Italy |
Conference dates | 22-29 Oct. 2017 |
Indexed by | CPCI ; EI |
Language | English |
WOS research areas | Computer Science ; Engineering |
WOS categories | Computer Science, Artificial Intelligence ; Engineering, Electrical & Electronic |
WOS accession number | WOS:000425498401038 |
Publisher | IEEE |
EI accession number | 20180704804048 |
EI main headings | Behavioral research ; Encoding (symbols) ; Inference engines ; Iterative methods |
EI classification codes | Computer Software, Data Handling and Applications:723 ; Numerical Methods:921.6 ; Social Sciences:971 |
Original document type | Proceedings Paper |
Source database | IEEE |
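Document type | Conference paper |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/16306 |
Collections | School of Information Science and Technology ; School of Information Science and Technology_PI Research Groups_Ma Yi Group ; School of Information Science and Technology_PI Research Groups_Tu Kewei Group ; School of Information Science and Technology_Master's Students |
Corresponding author | Zhu, Chen |
Author affiliation | ShanghaiTech Univ, Shanghai, Peoples R China |
First author affiliation | ShanghaiTech University |
Corresponding author affiliation | ShanghaiTech University |
First author's primary affiliation | ShanghaiTech University |
Recommended citation (GB/T 7714) | Zhu, Chen,Zhao, Yanpeng,Huang, Shuaiyi,et al. Structured Attentions for Visual Question Answering[C]. 345 E 47TH ST, NEW YORK, NY 10017 USA:IEEE,2017:1300-1309. |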
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.