Cross-Utterance Conditioned VAE for Speech Generation
2024
Journal: IEEE/ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (IF: 4.1 [JCR-2023], 4.2 [5-Year])
ISSN: 2329-9290
EISSN: 2329-9304
Volume: 32, Pages: 4263-4276
Publication status: Published
DOI: 10.1109/TASLP.2024.3453598
Abstract: Speech synthesis systems powered by neural networks hold promise for multimedia production, but frequently face issues with producing expressive speech and seamless editing. In response, we present the Cross-Utterance Conditioned Variational Autoencoder speech synthesis (CUC-VAE S2) framework to enhance prosody and ensure natural speech generation. This framework leverages the powerful representational capabilities of pre-trained language models and the re-expression abilities of variational autoencoders (VAEs). The core component of the CUC-VAE S2 framework is the cross-utterance CVAE, which extracts acoustic, speaker, and textual features from surrounding sentences to generate context-sensitive prosodic features, more accurately emulating human prosody generation. We further propose two practical algorithms tailored for distinct speech synthesis applications: CUC-VAE TTS for text-to-speech and CUC-VAE SE for speech editing. The CUC-VAE TTS is a direct application of the framework, designed to generate audio with contextual prosody derived from surrounding texts. On the other hand, the CUC-VAE SE algorithm leverages real mel spectrogram sampling conditioned on contextual information, producing audio that closely mirrors real sound and thereby facilitating flexible speech editing based on text such as deletion, insertion, and replacement. Experimental results on the LibriTTS datasets demonstrate that our proposed models significantly enhance speech synthesis and editing, producing more natural and expressive speech. © 2014 IEEE.
Keywords: Context sensitive languages; Neural networks; Signal encoding; Spectrographs; Variational techniques; Auto encoders; Expressive-speech; Language model; Natural speech; Pre-trained language model; Speech editing; Speech generation; Speech synthesis system; TTS; Variational autoencoder
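Indexed by: EI
Language: English
Publisher: Institute of Electrical and Electronics Engineers Inc.
EI accession number: 20244217217870
EI main heading: Speech enhancement
EI classification codes: 101.1 ; 1101 ; 1106.1.1 ; 1201.2 ; 1301.1.3.1 ; 716.1 Information Theory and Signal Processing ; 741.3 Optical Devices and Systems ; 751.5 Speech
Original document type: Journal article (JA)
Source database: IEEE
Document type: Journal article
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/436537
Collections: School of Information Science and Technology_Master's students
School of Creativity and Art_PI Research Groups (P)_Tian Zheng Group
Corresponding author: Sun, Fanglei
Author affiliations: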
1.The University of Manchester, Department of Computer Science, Manchester; M13 9PL, United Kingdom;
2.ShanghaiTech University, School of Creativity and Art, Shanghai; 201210, China;
3.University of Cambridge, Machine Intelligence Lab, Cambridge; CB2 1TN, United Kingdom;
4.Shanghai Jiao Tong University, School of Electronic, Information and Electrical Engineering (SEIEE), Shanghai; 200240, China;
5.Tsinghua University, Department of Electronic Engineering, Beijing; 100190, China;
6.University College London, Department of Speech Hearing and Phonetic Sciences, London; WC1E 6BT, United Kingdom;
7.University College London, Department of Computer Science, London; WC1E 6BT, United Kingdom;
8.The Hong Kong University of Science and Technology (Guangzhou), Thrust of Internet of Things, Guangzhou; 511453, China;
9.University of Shanghai for Science and Technology, Department of Computer Science and Engineering, Shanghai; 200093, China
Recommended citation
GB/T 7714
Li, Yang,Yu, Cheng,Sun, Guangzhi,et al. Cross-Utterance Conditioned VAE for Speech Generation[J]. IEEE/ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING,2024,32:4263-4276.
APA Li, Yang.,Yu, Cheng.,Sun, Guangzhi.,Zu, Weiqin.,Tian, Zheng.,...&Sun, Fanglei.(2024).Cross-Utterance Conditioned VAE for Speech Generation.IEEE/ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING,32,4263-4276.
MLA Li, Yang,et al."Cross-Utterance Conditioned VAE for Speech Generation".IEEE/ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 32(2024):4263-4276.
File name: 10.1109@TASLP.2024.3453598.pdf
Format: Adobe PDF
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.