ShanghaiTech University Knowledge Management System
Neural2speech: A Transfer Learning Framework for Neural-Driven Speech Reconstruction | |
2024-04 | |
Proceedings title | ICASSP 2024 - 2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
ISSN | 1520-6149 |
Pages | 2200-2204 |
Publication status | Published |
DOI | 10.1109/ICASSP48485.2024.10446614 |
Abstract | Reconstructing natural speech from neural activity is vital for enabling direct communication via brain-computer interfaces. Previous efforts have explored the conversion of neural recordings into speech using complex deep neural network (DNN) models trained on extensive neural recording data, which is resource-intensive under regular clinical constraints. However, achieving satisfactory performance in reconstructing speech from limited-scale neural recordings has been challenging, mainly due to the complexity of speech representations and the neural data constraints. To overcome these challenges, we propose a novel transfer learning framework for neural-driven speech reconstruction, called Neural2Speech, which consists of two distinct training phases. First, a speech autoencoder is pre-trained on readily available speech corpora to decode speech waveforms from the encoded speech representations. Second, a lightweight adaptor is trained on the small-scale neural recordings to align the neural activity and the speech representation for decoding. Remarkably, our proposed Neural2Speech demonstrates the feasibility of neural-driven speech reconstruction even with only 20 minutes of intracranial data, which significantly outperforms existing baseline methods in terms of speech fidelity and intelligibility. |
Proceedings editor / conference organizer | The Institute of Electrical and Electronics Engineers Signal Processing Society |
Keywords | Brain-computer interface; Electrocorticography; Speech reconstruction; Transfer learning |
Conference name | 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 |
Conference location | Seoul, Republic of Korea |
Conference dates | 14-19 April 2024 |
Indexed by | EI |
Language | English |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
EI accession number | 20251418177699 |
EI subject terms | Speech intelligibility |
EI classification codes | 751.5 Speech ; 752.2 Sound Recording ; 1101.2.1 Deep Learning |
Original document type | Conference article (CA) |
Source database | IEEE |
Document type | Conference paper |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/354940 |
Collections | School of Biomedical Engineering; School of Information Science and Technology_PhD Students; School of Biomedical Engineering_PI Research Groups_李远宁 |
Author affiliations | 1. School of Biomedical Engineering, ShanghaiTech University, Shanghai, China; 2. JD AI Research, Beijing, China; 3. Department of Neurological Surgery, University of California, San Francisco, CA, USA |
First author affiliation | School of Biomedical Engineering |
First author's primary affiliation | School of Biomedical Engineering |
Recommended citation (GB/T 7714) | Jiawei Li, Chunxu Guo, Li Fu, et al. Neural2speech: A Transfer Learning Framework for Neural-Driven Speech Reconstruction[C]//The Institute of Electrical and Electronics Engineers Signal Processing Society: Institute of Electrical and Electronics Engineers Inc., 2024: 2200-2204. |
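The two training phases described in the abstract can be illustrated with a deliberately simplified toy sketch. This is not the authors' model: it swaps the deep speech autoencoder for a linear one fitted by PCA and the lightweight adaptor for ridge regression, and it runs on synthetic data. All function names, dimensions, and the noise level are hypothetical and chosen only for illustration.

```python
import numpy as np

def pretrain_autoencoder(speech, latent_dim):
    """Phase 1 (toy stand-in): linear autoencoder fitted by PCA on speech features."""
    mean = speech.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(speech - mean, full_matrices=False)
    basis = vt[:latent_dim]  # shape: (latent_dim, n_features)

    def encode(x):
        return (x - mean) @ basis.T

    def decode(z):
        return z @ basis + mean

    return encode, decode

def train_adaptor(neural, latents, ridge=1e-3):
    """Phase 2 (toy stand-in): ridge regression from neural features to frozen latents."""
    x = np.hstack([neural, np.ones((len(neural), 1))])  # append a bias column
    gram = x.T @ x + ridge * np.eye(x.shape[1])
    weights = np.linalg.solve(gram, x.T @ latents)

    def adaptor(n):
        return np.hstack([n, np.ones((len(n), 1))]) @ weights

    return adaptor

def demo(seed=0):
    """Run both phases on synthetic data; return (reconstruction MSE, baseline MSE)."""
    rng = np.random.default_rng(seed)
    n_feat, latent_dim, n_elec = 40, 8, 16  # hypothetical sizes

    # Large unpaired "speech corpus": low-rank synthetic features.
    mixing = rng.normal(size=(latent_dim, n_feat))
    corpus = rng.normal(size=(5000, latent_dim)) @ mixing
    encode, decode = pretrain_autoencoder(corpus, latent_dim)

    # Small paired set: "neural" signals are a noisy linear readout of the same sources.
    sources = rng.normal(size=(300, latent_dim))
    speech = sources @ mixing
    readout = rng.normal(size=(latent_dim, n_elec))
    neural = sources @ readout + 0.05 * rng.normal(size=(300, n_elec))

    train, test = slice(0, 200), slice(200, 300)
    # The autoencoder stays frozen; only the adaptor sees the paired data.
    adaptor = train_adaptor(neural[train], encode(speech[train]))
    recon = decode(adaptor(neural[test]))

    err = np.mean((recon - speech[test]) ** 2)
    # Baseline: always predict the mean training speech frame.
    base = np.mean((speech[test] - speech[train].mean(axis=0)) ** 2)
    return err, base
```

Calling `demo()` returns the held-out reconstruction error alongside a mean-prediction baseline, so the two can be compared directly. The point of the design is that the decoder is learned once from plentiful speech-only data, while the scarce paired neural data is spent only on the small mapping into the latent space.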