Neural2speech: A Transfer Learning Framework for Neural-Driven Speech Reconstruction
Publication date: 2024-04
Proceedings title: ICASSP 2024 - 2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
ISSN: 1520-6149
Pages: 2200-2204
Publication status: Published
DOI: 10.1109/ICASSP48485.2024.10446614
Abstract: Reconstructing natural speech from neural activity is vital for enabling direct communication via brain-computer interfaces. Previous efforts have explored converting neural recordings into speech using complex deep neural network (DNN) models trained on extensive neural recording data, which is resource-intensive to collect under typical clinical constraints. However, achieving satisfactory performance when reconstructing speech from limited-scale neural recordings has remained challenging, mainly due to the complexity of speech representations and the constraints on neural data. To overcome these challenges, we propose Neural2Speech, a novel transfer learning framework for neural-driven speech reconstruction that consists of two distinct training phases. First, a speech autoencoder is pre-trained on readily available speech corpora to decode speech waveforms from the encoded speech representations. Second, a lightweight adaptor is trained on the small-scale neural recordings to align neural activity with the speech representation used for decoding. Remarkably, Neural2Speech demonstrates the feasibility of neural-driven speech reconstruction with only 20 minutes of intracranial data and significantly outperforms existing baseline methods in speech fidelity and intelligibility.
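To make the two-phase idea concrete, the following is a toy sketch in Python/NumPy. It is not the authors' implementation. As stated assumptions, the DNN speech autoencoder is replaced by a linear autoencoder fitted with PCA, the lightweight adaptor is replaced by a ridge-regression map from neural features into the frozen latent space, and all data are synthetic. The function names `pretrain_linear_autoencoder`, `fit_adaptor` and `reconstruct` are hypothetical.

```python
import numpy as np

def pretrain_linear_autoencoder(speech, k):
    """Phase 1 (toy stand-in for the speech autoencoder): a k-dim linear
    autoencoder fitted by PCA on a large speech-only corpus."""
    mean = speech.mean(axis=0)
    _, _, vt = np.linalg.svd(speech - mean, full_matrices=False)
    encoder = vt[:k].T  # (d, k): speech frame -> latent code
    decoder = vt[:k]    # (k, d): latent code -> speech frame
    return mean, encoder, decoder

def fit_adaptor(neural, latents, ridge=1e-2):
    """Phase 2 (toy stand-in for the lightweight adaptor): ridge regression
    from neural features to the frozen latent space; the decoder is not updated."""
    n_feat = neural.shape[1]
    gram = neural.T @ neural + ridge * np.eye(n_feat)
    return np.linalg.solve(gram, neural.T @ latents)  # (n_feat, k)

def reconstruct(neural, adaptor, mean, decoder):
    """Neural features -> adaptor -> frozen decoder -> speech frames."""
    return neural @ adaptor @ decoder + mean

rng = np.random.default_rng(0)
d, k, n_feat = 40, 8, 16

# Large synthetic "speech corpus": frames near a k-dim subspace plus noise.
basis = rng.standard_normal((k, d))
corpus = rng.standard_normal((5000, k)) @ basis + 0.01 * rng.standard_normal((5000, d))
mean, enc, dec = pretrain_linear_autoencoder(corpus, k)

# Small synthetic paired set: neural features linearly related to speech codes.
latent_true = rng.standard_normal((200, k))
speech_small = latent_true @ basis
mix = rng.standard_normal((k, n_feat))
neural = latent_true @ mix + 0.01 * rng.standard_normal((200, n_feat))

latents = (speech_small - mean) @ enc  # training targets from the frozen encoder
A = fit_adaptor(neural, latents)
recon = reconstruct(neural, A, mean, dec)
```

In this sketch only the small matrix `A` is estimated from the paired neural data, while the decoder learned from the speech-only corpus stays frozen. This mirrors, in a deliberately simplified linear form, why a pre-trained speech model can reduce how much neural data is needed.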
Proceedings editor/conference organizer: The Institute of Electrical and Electronics Engineers Signal Processing Society
Keywords: Brain-computer interface; Electrocorticography; Speech reconstruction; Transfer learning
Conference name: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Conference location: Seoul, Republic of Korea
Conference dates: 14-19 April 2024
Indexed in: EI
Language: English
Publisher: Institute of Electrical and Electronics Engineers Inc.
EI accession number: 20251418177699
EI subject terms: Speech intelligibility
EI classification codes: 751.5 Speech; 752.2 Sound Recording; 1101.2.1 Deep Learning
Original document type: Conference article (CA)
Source database: IEEE
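Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/354940
Collections:
School of Biomedical Engineering
School of Information Science and Technology_PhD Students
School of Biomedical Engineering_PI Research Group_Yuanning Li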
Author affiliations
1.School of Biomedical Engineering, ShanghaiTech University, Shanghai, China
2.JD AI Research, Beijing, China
3.Department of Neurological Surgery, University of California, San Francisco, CA, USA
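First author's affiliation: School of Biomedical Engineering
First author's primary affiliation: School of Biomedical Engineering
Recommended citation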
GB/T 7714
Jiawei Li,Chunxu Guo,Li Fu,et al. Neural2speech: A Transfer Learning Framework for Neural-Driven Speech Reconstruction[C]//The Institute of Electrical and Electronics Engineers Signal Processing Society:Institute of Electrical and Electronics Engineers Inc.,2024:2200-2204.

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.