ShanghaiTech University Knowledge Management System
iQuery: Instruments as Queries for Audio-Visual Sound Separation
2022-12-08
Status | Published |
Abstract | Current audio-visual separation methods share a standard architecture design in which an audio encoder-decoder network is fused with visual encoding features at the encoder bottleneck. This design confounds the learning of multi-modal feature encoding with robust sound decoding for audio separation. To generalize to a new instrument, one must fine-tune the entire visual and audio network for all musical instruments. We reformulate the visual-sound separation task and propose Instrument as Query (iQuery) with a flexible query expansion mechanism. Our approach ensures cross-modal consistency and cross-instrument disentanglement. We use "visually named" queries to initiate the learning of audio queries, and use cross-modal attention to remove potential sound source interference in the estimated waveforms. To generalize to a new instrument or event class, drawing inspiration from text-prompt design, we insert an additional query as an audio prompt while freezing the attention mechanism. Experimental results on three benchmarks demonstrate that iQuery improves audio-visual sound source separation performance. |
DOI | arXiv:2212.03814 |
Related URL | View original |
Source | arXiv |
WOS Accession Number | PPRN:25201306 |
WOS Categories | Computer Science, Software Engineering ; Engineering, Electrical & Electronic |
Document Type | Preprint |
Item Identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/348092 |
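Collection | School of Information Science and Technology_Master's Students; School of Information Science and Technology_Undergraduate Students |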
Author Affiliations | 1.UC San Diego, La Jolla, CA 92093, USA 2.Shanghai AI Lab, Shanghai, Peoples R China 3.Chinese Univ Hong Kong, Hong Kong, Peoples R China 4.Natl Univ Singapore, Singapore, Singapore 5.ShanghaiTech Univ, Shanghai, Peoples R China 6.Univ Penn, Philadelphia, PA, USA |
Recommended Citation (GB/T 7714) | Chen, Jiaben,Zhang, Renrui,Lian, Dongze,et al. iQuery: Instruments as Queries for Audio-Visual Sound Separation. 2022. |
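
The query expansion idea described in the abstract (inserting an additional query while the attention mechanism stays frozen) can be illustrated with a toy sketch. This is not the authors' implementation: the function `cross_attention`, the two-dimensional feature vectors, and the instrument names are all hypothetical, and the sketch uses a single parameter-free cross-attention step with no self-attention between queries. Under that simplifying assumption, each query attends to the audio features independently, so adding a new query leaves the outputs for existing queries unchanged.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention; each query attends over all keys independently."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


# Hypothetical audio features acting as both keys and values (3 tokens, dim 2).
audio_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical query bank: one query vector per known instrument.
queries = {"guitar": [2.0, 0.0], "violin": [0.0, 2.0]}
before = cross_attention(list(queries.values()), audio_tokens, audio_tokens)

# Query expansion: append one query for a new class. The attention function
# itself is untouched, which stands in for "freezing the attention mechanism".
queries["flute"] = [1.0, 1.0]
after = cross_attention(list(queries.values()), audio_tokens, audio_tokens)

# Outputs for the original instruments are identical before and after expansion.
print(after[:2] == before)
```

In a trained model the new query vector would presumably be learned while all other weights are held fixed; the toy value used for `"flute"` here is only a placeholder.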
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.