Generative Modeling of Audible Shapes for Object Perception

doi:10.1109/ICCV.2017.141

	Zhang, Zhoutong 1; Wu, Jiajun 1; Li, Qiujia 2; Huang, Zhengjia 3; Traer, James 1; McDermott, Josh H. 1; Tenenbaum, Joshua B. 1; Freeman, William T. 1,4
	2017
Proceedings title	2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)
ISSN	2380-7504
Volume	2017-October
Pages	1260-1269
Publication status	Published
DOI	10.1109/ICCV.2017.141
Abstract	Humans infer rich knowledge of objects from both auditory and visual cues. Building a machine of such competency, however, is very challenging, due to the great difficulty in capturing large-scale, clean data of objects with both their appearance and the sound they make. In this paper, we present a novel, open-source pipeline that generates audiovisual data, purely from 3D object shapes and their physical properties. Through comparison with audio recordings and human behavioral studies, we validate the accuracy of the sounds it generates. Using this generative model, we are able to construct a synthetic audio-visual dataset, namely Sound-20K, for object perception tasks. We demonstrate that auditory and visual information play complementary roles in object perception, and further, that the representation learned on synthetic audio-visual data can transfer to real-world scenarios.
Place of publication	345 E 47TH ST, NEW YORK, NY 10017 USA
Conference location	Venice, Italy
Conference date	22-29 Oct. 2017
Indexed in	CPCI ; EI
Language	English
Funding	Center for Brains, Minds and Machines (NSF STC award)[CCF-1231216]
WOS research areas	Computer Science ; Engineering
WOS categories	Computer Science, Artificial Intelligence ; Engineering, Electrical & Electronic
WOS accession number	WOS:000425498401034
Publisher	IEEE
EI accession number	20180704804044
EI subject terms	Behavioral research ; Computer vision
EI classification codes	Computer Applications:723.5 ; Acoustic Waves:751.1 ; Social Sciences:971
WOS keywords	SOUNDS ; MOTION
Original document type	Proceedings Paper
Source database	IEEE
Document type	Conference paper
Item identifier	https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/16300
Collection	School of Information Science and Technology; School of Information Science and Technology_Undergraduate Students
Corresponding author	Zhang, Zhoutong
Author affiliations	1. MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA 2. Univ Cambridge, Cambridge, England 3. ShanghaiTech Univ, Shanghai, Peoples R China 4. Google Res, Mountain View, CA USA
Recommended citation (GB/T 7714)	Zhang, Zhoutong,Wu, Jiajun,Li, Qiujia,et al. Generative Modeling of Audible Shapes for Object Perception[C]. 345 E 47TH ST, NEW YORK, NY 10017 USA:IEEE,2017:1260-1269.