Fine-tuning ChatGPT Achieves State-of-the-Art Performance for Chemical Text Mining
2023-11-16
Status: Published
Abstract: Extracting knowledge from complex and diverse chemical texts is a pivotal task for both experimental and computational chemists. The task remains extremely challenging because of the complexity of chemical language and the scientific literature. This study fine-tuned ChatGPT for five intricate chemical text mining tasks: compound entity recognition, reaction role labelling, metal-organic framework (MOF) synthesis information extraction, nuclear magnetic resonance spectroscopy (NMR) data extraction, and the conversion of reaction paragraphs to action sequences. The fine-tuned ChatGPT demonstrated impressive performance, significantly reducing the need for repetitive and extensive prompt engineering experiments. It achieved exact accuracy levels ranging from 69% to 95% on these tasks with minimal annotated data. For comparison, we fine-tuned open-source pre-trained large language models (LLMs) such as Llama2, T5, and BART. The results showed that the fine-tuned ChatGPT excelled in all tasks. It even outperformed task-adaptive pre-training and fine-tuning models that were based on a significantly larger amount of in-domain data. Given their versatility, robustness, and low-code capability, fine-tuned LLMs used as toolkits for automated data acquisition could revolutionize chemical knowledge extraction.
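The abstract reports "exact accuracy" but this record does not define the metric. One common reading, assumed for the sketch below, is the fraction of model outputs that match the annotated reference exactly. The function names, the whitespace normalization step, and the example strings are illustrative assumptions and are not taken from the paper.

```python
def normalize(text: str) -> str:
    """Collapse runs of whitespace so spacing differences alone do not count as errors.

    This normalization is an assumption of the sketch, not a documented step.
    """
    return " ".join(text.split())


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Return the fraction of predictions identical to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Hypothetical extraction outputs, invented purely for illustration.
references = [
    "solvent: DMF; temperature: 120 C",
    "solvent: water; temperature: 80 C",
    "solvent: ethanol; temperature: 25 C",
    "solvent: DMF; temperature: 100 C",
]
predictions = [
    "solvent: DMF; temperature: 120 C",
    "solvent: water;  temperature: 80 C",   # extra space only, still counted as a match
    "solvent: methanol; temperature: 25 C",  # wrong solvent, counted as a miss
    "solvent: DMF; temperature: 100 C",
]

score = exact_match_accuracy(predictions, references)
print(f"exact-match accuracy: {score:.2f}")
```

Under this reading, a partially correct output scores zero for that example, which makes the metric stricter than token-level precision or recall.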
Keywords: Chemical Text Mining Large Language Models ChatGPT Fine-tune Few-data Knowledge Extraction Cheminformatics synthesis chemical synthesis llama language model MOF NMR reaction role chemical procedure paragraph LLMs structured data
Language: English
DOI: 10.26434/chemrxiv-2023-k7ct5
Source: chemRxiv
Indexed in: PPRN.PPRN
WOS accession number: PPRN:88250314
WOS category: Chemistry, Multidisciplinary
Document type: Preprint
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/372967
Collection: School of Physical Science and Technology
School of Physical Science and Technology_PhD Students
Corresponding authors: Fu, Zunyun; Zheng, Mingyue
Author affiliations
1.Chinese Acad Sci, Shanghai Inst Mat Med, State Key Lab Drug Res, Drug Discovery & Design Center, 555 Zuchongzhi Rd, Shanghai 201203, Peoples R China
2.Univ Chinese Acad Sci, 19A Yuquan Rd, Beijing 100049, Peoples R China
3.Nanjing Univ Chinese Med, 138 Xianlin Rd, Nanjing 210023, Peoples R China
4.Zhejiang Univ, Innovat Inst Artificial Intelligence Med Zhejiang Univ, Coll Pharmaceut Sci, Hangzhou 310058, Zhejiang, Peoples R China
5.ShanghaiTech Univ, Sch Phys Sci & Technol, Shanghai 201210, Peoples R China
6.ProtonUnfold Technol Co Ltd, Suzhou, Peoples R China
7.Lingang Lab, Shanghai 200031, Peoples R China
8.Univ South Florida, Taneja Coll Pharm, Dept Pharmaceut Sci, Tampa, FL 33612, USA
9.Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
10.Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
11.Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
12.Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
13.ProtonUnfold Technology Co., Ltd, Suzhou, China
14.Lingang Laboratory, Shanghai 200031, China
15.Department of Pharmaceutical Sciences, Taneja College of Pharmacy, University of South Florida, Tampa, Florida 33612, United States
16.Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
17.Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China; Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China
Recommended citation
GB/T 7714
Zhang, Wei,Wang, Qinggong,Kong, Xiangtai,et al. Fine-tuning ChatGPT Achieves State-of-the-Art Performance for Chemical Text Mining. 2023.
File name: 10.26434@chemrxiv-2023-k7ct5.pdf
Format: Adobe PDF