Structured Attentions for Visual Question Answering

doi:10.1109/ICCV.2017.145

	Zhu, Chen; Zhao, Yanpeng; Huang, Shuaiyi; Tu, Kewei; Ma, Yi
	2017
Proceedings title	2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)
ISSN	2380-7504
Volume	2017-October
Pages	1300-1309
Publication status	Published
DOI	10.1109/ICCV.2017.145
Abstract	Visual attention, which assigns weights to image regions according to their relevance to a question, is considered an indispensable part of most Visual Question Answering models. Although questions may involve complex relations among multiple regions, few attention models can effectively encode such cross-region relations. In this paper, we demonstrate the importance of encoding such relations by showing the limited effective receptive field of ResNet on two datasets, and propose to model the visual attention as a multivariate distribution over a grid-structured Conditional Random Field on image regions. We demonstrate how to convert the iterative inference algorithms, Mean Field and Loopy Belief Propagation, into recurrent layers of an end-to-end neural network. We empirically evaluated our model on three datasets; it surpasses the best baseline model on the newly released CLEVR dataset [13] by 9.5%, and the best published model on the VQA dataset [3] by 1.25%. Source code is available at https://github.com/zhuchen03/vqa-sva.
Place of publication	345 E 47TH ST, NEW YORK, NY 10017 USA
Conference location	Venice, Italy
Conference dates	22-29 Oct. 2017
Indexed in	CPCI ; EI
Language	English
WOS research areas	Computer Science ; Engineering
WOS categories	Computer Science, Artificial Intelligence ; Engineering, Electrical & Electronic
WOS accession number	WOS:000425498401038
Publisher	IEEE
EI accession number	20180704804048
EI subject terms	Behavioral research ; Encoding (symbols) ; Inference engines ; Iterative methods
EI classification codes	Computer Software, Data Handling and Applications:723 ; Numerical Methods:921.6 ; Social Sciences:971
Original document type	Proceedings Paper
Source database	IEEE
Document type	Conference paper
Item identifier	https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/16306
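Collections	School of Information Science and Technology; School of Information Science and Technology_PI Research Groups_Yi Ma Group; School of Information Science and Technology_PI Research Groups_Kewei Tu Group; School of Information Science and Technology_Master's Students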
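Corresponding author	Zhu, Chen
Author affiliation	ShanghaiTech Univ, Shanghai, Peoples R China
First author affiliation	ShanghaiTech University
Corresponding author affiliation	ShanghaiTech University
First affiliation of the first author	ShanghaiTech University
Recommended citation (GB/T 7714)	Zhu, Chen,Zhao, Yanpeng,Huang, Shuaiyi,et al. Structured Attentions for Visual Question Answering[C]. 345 E 47TH ST, NEW YORK, NY 10017 USA:IEEE,2017:1300-1309.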