ShanghaiTech University Knowledge Management System
Generative Modeling of Audible Shapes for Object Perception
2017
Proceedings Title | 2017 IEEE International Conference on Computer Vision (ICCV) |
ISSN | 2380-7504 |
Volume | 2017-October |
Pages | 1260-1269 |
Publication Status | Published |
DOI | 10.1109/ICCV.2017.141 |
Abstract | Humans infer rich knowledge of objects from both auditory and visual cues. Building a machine of such competency, however, is very challenging, due to the great difficulty in capturing large-scale, clean data of objects with both their appearance and the sound they make. In this paper, we present a novel, open-source pipeline that generates audiovisual data, purely from 3D object shapes and their physical properties. Through comparison with audio recordings and human behavioral studies, we validate the accuracy of the sounds it generates. Using this generative model, we are able to construct a synthetic audio-visual dataset, namely Sound-20K, for object perception tasks. We demonstrate that auditory and visual information play complementary roles in object perception, and further, that the representation learned on synthetic audio-visual data can transfer to real-world scenarios. |
Place of Publication | 345 E 47TH ST, NEW YORK, NY 10017 USA |
Conference Location | Venice, Italy |
Conference Dates | 22-29 Oct. 2017 |
Indexed In | CPCI ; EI |
Language | English |
Funding | Center for Brains, Minds and Machines (NSF STC award) [CCF-1231216] |
WOS Research Areas | Computer Science ; Engineering |
WOS Categories | Computer Science, Artificial Intelligence ; Engineering, Electrical & Electronic |
WOS Accession Number | WOS:000425498401034 |
Publisher | IEEE |
EI Accession Number | 20180704804044 |
EI Subject Terms | Behavioral research ; Computer vision |
EI Classification Codes | Computer Applications:723.5 ; Acoustic Waves:751.1 ; Social Sciences:971 |
WOS Keywords | SOUNDS ; MOTION |
Original Document Type | Proceedings Paper |
Source Database | IEEE |
Document Type | Conference Paper |
Item Identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/16300 |
Collection | School of Information Science and Technology; School of Information Science and Technology_Undergraduates |
Corresponding Author | Zhang, Zhoutong |
Author Affiliations | 1. MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA; 2. Univ Cambridge, Cambridge, England; 3. ShanghaiTech Univ, Shanghai, Peoples R China; 4. Google Res, Mountain View, CA USA |
Recommended Citation (GB/T 7714) | Zhang, Zhoutong, Wu, Jiajun, Li, Qiujia, et al. Generative Modeling of Audible Shapes for Object Perception[C]. 345 E 47TH ST, NEW YORK, NY 10017 USA: IEEE, 2017: 1260-1269. |
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.