AugGPT: Leveraging ChatGPT for Text Data Augmentation
Year: 2025
Journal: IEEE TRANSACTIONS ON BIG DATA (IF: 7.5 [JCR-2023], 5.8 [5-Year])
ISSN: 2372-2096
EISSN: 2332-7790
Volume: PP, Issue: 99
Publication Status: Published
DOI: 10.1109/TBDATA.2025.3536934
Abstract: Text data augmentation is an effective strategy for overcoming the challenge of limited sample sizes in many natural language processing (NLP) tasks. This challenge is especially prominent in the few-shot learning (FSL) scenario, where the data in the target domain is generally much scarcer and of lower quality. A natural and widely used strategy to mitigate such challenges is to perform data augmentation to better capture data invariance and increase the sample size. However, current text data augmentation methods either cannot ensure the correct labeling of the generated data (lacking faithfulness), or cannot ensure sufficient diversity in the generated data (lacking compactness), or both. Inspired by the recent success of large language models (LLMs), especially the development of ChatGPT, we propose a text data augmentation approach based on ChatGPT (named "AugGPT"). AugGPT rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples. The augmented samples can then be used in downstream model training. Experimental results on multiple few-shot text classification tasks show the superior performance of the proposed AugGPT approach over state-of-the-art text data augmentation methods in terms of testing accuracy and the distribution of the augmented samples. The code of AugGPT is available at https://github.com/yhydhx/AugGPT.
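The augmentation step described in the abstract can be illustrated with a minimal sketch, shown below. It assumes the OpenAI Python client (openai>=1.0), the gpt-3.5-turbo model, a single-turn rephrasing prompt, and an illustrative seed sentence; the prompt wording, sampling parameters, and helper name augment_sentence are assumptions made here for illustration, not the authors' implementation, which is available at https://github.com/yhydhx/AugGPT.

# Minimal sketch of ChatGPT-based rephrasing for text data augmentation,
# in the spirit of AugGPT (assumptions noted above).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def augment_sentence(sentence: str, n_aug: int = 6, model: str = "gpt-3.5-turbo") -> list[str]:
    """Rephrase one labeled training sentence into several meaning-preserving,
    lexically diverse variants."""
    prompt = (
        "Rephrase the following sentence in a different way while preserving "
        f"its meaning. Return only the rephrased sentence.\n\nSentence: {sentence}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        n=n_aug,          # request several independent completions per seed sentence
        temperature=0.9,  # higher temperature encourages lexical diversity
    )
    return [choice.message.content.strip() for choice in response.choices]

if __name__ == "__main__":
    seed = "The patient reported mild chest pain after exercise."  # hypothetical example
    for variant in augment_sentence(seed):
        print(variant)

Each generated variant inherits the label of its seed sentence and is added to the few-shot training set before downstream model training; requesting several completions per sentence at a higher temperature is one simple way to trade off faithfulness against diversity.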
Keywords: Contrastive learning; Data assimilation; Spatio-temporal data; Augmentation methods; Data augmentation; Few-shot learning; Language model; Language processing; Large language model; Natural language processing; Natural languages; Sample sizes; Text data
Indexed In: EI
Language: English
Publisher: Institute of Electrical and Electronics Engineers Inc.
EI Accession Number: 20250617837306
EI Main Heading: Zero-shot learning
EI Classification Codes: 1101.2 Machine Learning - 1106.2 Data Handling and Data Processing - 1106.4 Database Systems
Original Document Type: Article in Press
Source Database: IEEE
Document Type: Journal article
Item Identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/497024
Collection: School of Biomedical Engineering_PI Research Groups_Dinggang Shen Group
Corresponding Author: Dai, Haixing
Author Affiliations:
1. University of Georgia, School of Computing, Athens, GA, United States;
2. South China University of Technology, School of Computer Science and Engineering, China;
3. Lehigh University, Department of Computer Science and Engineering, Bethlehem, PA, United States;
4. Carnegie Mellon University, Heinz College of Information Systems and Public Policy, Pittsburgh, PA, United States;
5. Massachusetts General Hospital, Harvard Medical School, Department of Radiology, Boston, MA, United States;
6. Mayo Clinic, Department of Radiation Oncology, Phoenix, AZ, United States;
7. University of Virginia, School of Data Science, Charlottesville, VA, United States;
8. The University of Texas at Arlington, Department of Computer Science and Engineering, Arlington, TX, United States;
9. ShanghaiTech University, School of Biomedical Engineering, Shanghai 201210, China;
10. Shanghai United Imaging Intelligence Co., Ltd., Shanghai 200230, China;
11. Shanghai Clinical Research and Trial Center, Shanghai 201210, China
Recommended Citation:
GB/T 7714: Dai, Haixing, Liu, Zhengliang, Liao, Wenxiong, et al. AugGPT: Leveraging ChatGPT for Text Data Augmentation[J]. IEEE TRANSACTIONS ON BIG DATA, 2025, PP(99).
APA: Dai, Haixing, Liu, Zhengliang, Liao, Wenxiong, Huang, Xiaoke, Cao, Yihan, ... & Li, Xiang. (2025). AugGPT: Leveraging ChatGPT for Text Data Augmentation. IEEE TRANSACTIONS ON BIG DATA, PP(99).
MLA: Dai, Haixing, et al. "AugGPT: Leveraging ChatGPT for Text Data Augmentation". IEEE TRANSACTIONS ON BIG DATA PP.99 (2025).
Files in This Item:
File Name: 10.1109@TBDATA.2025.3536934.pdf
Format: Adobe PDF