ShanghaiTech University Knowledge Management System
Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation
2024-07-26
Status | Published |
Abstract | Named entity recognition (NER) models often struggle with noisy inputs, such as those with spelling mistakes or errors generated by Optical Character Recognition processes, and learning a robust NER model is challenging. Existing robust NER models utilize both noisy text and its corresponding gold text for training, which is infeasible in many real-world applications in which gold text is not available. In this paper, we consider a more realistic setting in which only noisy text and its NER labels are available. We propose to retrieve relevant text of the noisy text from a knowledge corpus and use it to enhance the representation of the original noisy input. We design three retrieval methods: sparse retrieval based on lexicon similarity, dense retrieval based on semantic similarity, and self-retrieval based on task-specific text. After retrieving relevant text, we concatenate the retrieved text with the original noisy text and encode them with a transformer network, utilizing self-attention to enhance the contextual token representations of the noisy text using the retrieved text. We further employ a multi-view training framework that improves robust NER without retrieving text during inference. Experiments show that our retrieval-augmented model achieves significant improvements in various noisy NER settings. |
Keywords | named entity recognition robust learning from noisy data retrieval augmentation |
DOI | arXiv:2407.18562 |
Source | arXiv |
WOS Accession Number | PPRN:91118659 |
WOS Categories | Computer Science, Artificial Intelligence ; Computer Science, Interdisciplinary Applications |
Document Type | Preprint |
Item Identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/408362 |
Collection | School of Information Science and Technology; School of Information Science and Technology_PI Research Groups_Kewei Tu Group; School of Information Science and Technology_Master's Students |
Corresponding Author | Jiang, Yong |
Author Affiliations | 1. ShanghaiTech Univ, Sch Informat Sci & Technol, 393 Huaxia Middle Rd, Shanghai 201210, Peoples R China; 2. Alibaba Grp, 969, Wen Yi West Rd, Hangzhou 311121, Peoples R China |
Recommended Citation (GB/T 7714) | Ai, Chaoyi, Jiang, Yong, Huang, Shen, et al. Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation. 2024. |
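The retrieve-then-concatenate step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: character-trigram cosine similarity is used here as a simple stand-in for the sparse (lexical) retriever, the `[SEP]` separator is an assumed placeholder for whatever boundary token the encoder expects, and the function names `char_ngrams`, `cosine`, `retrieve`, and `build_input` are hypothetical. The transformer encoding and the multi-view training are not shown.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return a bag of lowercase character n-grams for `text`."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(noisy_text, corpus, k=1):
    """Return the k corpus passages most lexically similar to `noisy_text`.

    Assumption: trigram overlap stands in for the sparse retriever;
    character n-grams tolerate spelling and OCR noise better than
    whole-word matching.
    """
    query = char_ngrams(noisy_text)
    ranked = sorted(corpus,
                    key=lambda doc: cosine(query, char_ngrams(doc)),
                    reverse=True)
    return ranked[:k]


def build_input(noisy_text, retrieved, sep=" [SEP] "):
    """Concatenate the noisy text with retrieved passages.

    Assumption: `sep` is a placeholder; a real encoder would use its
    own tokenizer's separator token.
    """
    return sep.join([noisy_text, *retrieved])


# Toy knowledge corpus and a noisy input (both invented for illustration).
corpus = [
    "Barack Obama visited Paris in 2009.",
    "The stock market fell sharply on Monday.",
    "Paris is the capital of France.",
]
noisy = "Barak Obma visted Pariss"
augmented = build_input(noisy, retrieve(noisy, corpus, k=1))
print(augmented)
```

In a full system the concatenated string would be passed to a transformer encoder, and only the token representations belonging to the noisy text would be fed to the NER tagging layer.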