Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation
2024-07-26
Status: Published
Abstract

Named entity recognition (NER) models often struggle with noisy inputs, such as those with spelling mistakes or errors generated by Optical Character Recognition processes, and learning a robust NER model is challenging. Existing robust NER models utilize both noisy text and its corresponding gold text for training, which is infeasible in many real-world applications in which gold text is not available. In this paper, we consider a more realistic setting in which only noisy text and its NER labels are available. We propose to retrieve relevant text of the noisy text from a knowledge corpus and use it to enhance the representation of the original noisy input. We design three retrieval methods: sparse retrieval based on lexicon similarity, dense retrieval based on semantic similarity, and self-retrieval based on task-specific text. After retrieving relevant text, we concatenate the retrieved text with the original noisy text and encode them with a transformer network, utilizing self-attention to enhance the contextual token representations of the noisy text using the retrieved text. We further employ a multi-view training framework that improves robust NER without retrieving text during inference. Experiments show that our retrieval-augmented model achieves significant improvements in various noisy NER settings.
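The sparse-retrieval and concatenation steps described in the abstract can be illustrated with a small sketch. The abstract does not specify the similarity function or the input format, so everything here is an assumption made for illustration: character-trigram Jaccard overlap stands in for "lexicon similarity", `[SEP]` stands in for whatever separator the encoder expects, and the helper names (`char_ngrams`, `jaccard`, `retrieve`, `build_input`) are hypothetical. This is not the authors' implementation.

```python
from typing import List, Set, Tuple


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of lowercase character n-grams in `text`."""
    s = text.lower()
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Score every corpus entry against the query and return the top k.

    Assumption: character n-gram overlap is used as the lexical score
    because it tolerates spelling noise; the paper may use something else.
    """
    q = char_ngrams(query)
    scored = [(jaccard(q, char_ngrams(doc)), doc) for doc in corpus]
    # list.sort is stable, so ties keep their original corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_input(noisy: str, retrieved: List[str], sep: str = " [SEP] ") -> str:
    """Concatenate the noisy text with the retrieved text.

    Assumption: a single string with a separator token is what the
    encoder consumes; the noisy text comes first so its token positions
    are easy to recover for labeling.
    """
    return sep.join([noisy] + retrieved)


# Toy knowledge corpus and a noisy query (both invented for this sketch).
corpus = [
    "Barack Obama was born in Hawaii.",
    "The Eiffel Tower is in Paris.",
    "Obama served as U.S. president.",
]
noisy = "Barak Obma visted Hawai"

top = retrieve(noisy, corpus, k=1)
model_input = build_input(noisy, [doc for _, doc in top])
print(model_input)
```

In a full pipeline, `model_input` would be tokenized and passed to a transformer encoder, and only the token representations belonging to the noisy text would be fed to the NER tagging layer, so that self-attention can draw on the retrieved context.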

Keywords: named entity recognition robust learning from noisy data retrieval augmentation
DOI: arXiv:2407.18562
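Source: arXiv
WOS accession number: PPRN:91118659
WOS categories: Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications
Document type: Preprint
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/408362
Collections: School of Information Science and Technology
School of Information Science and Technology_PI Research Groups_Tu Kewei Group
School of Information Science and Technology_Master's Students
Corresponding author: Jiang, Yong
Author affiliations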
1.ShanghaiTech Univ, Sch Informat Sci & Technol, 393 Huaxia Middle Rd, Shanghai 201210, Peoples R China
2.Alibaba Grp, 969,Wen Yi West Rd, Hangzhou 311121, Peoples R China
Recommended citation
GB/T 7714
Ai, Chaoyi,Jiang, Yong,Huang, Shen,et al. Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation. 2024.