DCIM-AVSR: Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module
2025-04-11
Proceedings title: ICASSP 2025 - 2025 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
ISSN: 1520-6149
Publication status: Published
DOI: 10.1109/ICASSP49660.2025.10890272
Abstract: Speech recognition is the technology that enables machines to interpret and process human speech, converting spoken language into text or commands. This technology is essential for applications such as virtual assistants, transcription services, and communication tools. The Audio-Visual Speech Recognition (AVSR) model enhances traditional speech recognition, particularly in noisy environments, by incorporating visual modalities like lip movements and facial expressions. While traditional AVSR models trained on large-scale datasets with numerous parameters can achieve remarkable accuracy, often surpassing human performance, they also come with high training costs and deployment challenges. To address these issues, we introduce an efficient AVSR model that reduces the number of parameters through the integration of a Dual Conformer Interaction Module (DCIM). In addition, we propose a pre-training method that optimizes model performance by fine-tuning. Unlike conventional models that require the system to independently learn the hierarchical relationship between audio and visual modalities, our approach incorporates this distinction directly into the model architecture. This design enhances both efficiency and performance, resulting in a more practical and effective solution for AVSR tasks.
Conference location: Hyderabad, India
Conference dates: 6-11 April 2025
Source database: IEEE
Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/496897
Collection: School of Biomedical Engineering, Master's students
Author affiliations
1.School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
2.Shanghai Clinical Research and Trial Center, Shanghai, China
First author's affiliation: ShanghaiTech University
First author's primary affiliation: ShanghaiTech University
Recommended citation
GB/T 7714
Xinyu Wang, Haotian Jiang, Haolin Huang, et al. DCIM-AVSR: Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module[C], 2025.