Structured Attentions for Visual Question Answering
Zhu, Chen; Zhao, Yanpeng; Huang, Shuaiyi; Tu, Kewei; Ma, Yi
2017
Source Publication: 2017 IEEE International Conference on Computer Vision (ICCV)
Volume: 2017-October
Pages: 1300-1309
Status: Published
DOI: 10.1109/ICCV.2017.145
Abstract: Visual attention, which assigns weights to image regions according to their relevance to a question, is considered as an indispensable part by most Visual Question Answering models. Although the questions may involve complex relations among multiple regions, few attention models can effectively encode such cross-region relations. In this paper, we demonstrate the importance of encoding such relations by showing the limited effective receptive field of ResNet on two datasets, and propose to model the visual attention as a multivariate distribution over a grid-structured Conditional Random Field on image regions. We demonstrate how to convert the iterative inference algorithms, Mean Field and Loopy Belief Propagation, as recurrent layers of an end-to-end neural network. We empirically evaluated our model on 3 datasets, in which it surpasses the best baseline model of the newly released CLEVR dataset [13] by 9.5%, and the best published model on the VQA dataset [3] by 1.25%. Source code is available at https://github.com/zhuchen03/vqa-sva.
Publication Place: 345 E 47TH ST, NEW YORK, NY 10017 USA
Conference Place: Venice, Italy
Indexed By: CPCI; EI
Language: English
WOS Research Area: Computer Science; Engineering
WOS Subject: Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
WOS ID: WOS:000425498401038
Publisher: IEEE
EI Accession Number: 20180704804048
EI Keywords: Behavioral research; Encoding (symbols); Inference engines; Iterative methods
EI Classification Number: Computer Software, Data Handling and Applications: 723; Numerical Methods: 921.6; Social Sciences: 971
Original Document Type: Proceedings Paper
Citation statistics
Cited Times: 22 [WOS]
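Document Type: Conference paper
Identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/16306
Collection: School of Information Science and Technology
School of Information Science and Technology_PI Research Groups_Ma Yi Group
School of Information Science and Technology_PI Research Groups_Tu Kewei Group
School of Information Science and Technology_Master's Students
Corresponding Author: Zhu, Chen
Affiliation: ShanghaiTech Univ, Shanghai, Peoples R China
First Author Affiliation: ShanghaiTech University
Corresponding Author Affiliation: ShanghaiTech University
First Signature Affiliation: ShanghaiTech University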
Recommended Citation
GB/T 7714
Zhu, Chen,Zhao, Yanpeng,Huang, Shuaiyi,et al. Structured Attentions for Visual Question Answering[C]. 345 E 47TH ST, NEW YORK, NY 10017 USA:IEEE,2017:1300-1309.
Files in This Item:
File Name/Size: 10.1109@ICCV.2017.14 (1400KB)
DocType: Conference paper
Version: Author's manuscript
Access: Open access
License: Unknown
File name: 10.1109@ICCV.2017.145.pdf
Format: Adobe PDF

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.