ShanghaiTech University Knowledge Management System
Fine-tuning ChatGPT Achieves State-of-the-Art Performance for Chemical Text Mining | |
2023-11-16 | |
Status | Published |
Abstract | Extracting knowledge from complex and diverse chemical texts is a pivotal task for both experimental and computational chemists. The task is still considered to be extremely challenging due to the complexity of the chemical language and scientific literature. This study fine-tuned ChatGPT for five intricate chemical text mining tasks: compound entity recognition, reaction role labelling, metal-organic framework (MOF) synthesis information extraction, nuclear magnetic resonance spectroscopy (NMR) data extraction, and the conversion of reaction paragraphs to action sequences. The fine-tuned ChatGPT demonstrated impressive performance, significantly reducing the need for repetitive and extensive prompt engineering experiments. It achieved exact accuracy levels ranging from 69% to 95% on these tasks with minimal annotated data. For comparison, we fine-tuned open-source pre-trained large language models (LLMs) such as Llama2, T5, and BART. The results showed that the fine-tuned ChatGPT excelled in all tasks. It even outperformed task-adaptive pre-training and fine-tuning models that were based on a significantly larger amount of in-domain data. Given its versatility, robustness, and low-code capability, leveraging fine-tuned LLMs as toolkits for automated data acquisition could revolutionize chemical knowledge extraction. |
Keywords | Chemical Text Mining Large Language Models ChatGPT Fine-tune Few-data Knowledge Extraction Cheminformatics synthesis chemical synthesis llama language model MOF NMR reaction role chemical procedure paragraph LLMs structured data |
Language | English |
DOI | 10.26434/chemrxiv-2023-k7ct5 |
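Related URL | View original |
Source | chemRxiv |
Indexed by | PPRN.PPRN |
WOS accession number | PPRN:88250314 |
WOS category | Chemistry, Multidisciplinary |
Document type | Preprint |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/372967 |
Collection | School of Physical Science and Technology; School of Physical Science and Technology_PhD Students |
Corresponding author | Fu, Zunyun; Zheng, Mingyue |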
Author affiliation | 1.Chinese Acad Sci, Shanghai Inst Mat Med, State Key Lab Drug Res, Drug Discovery & Design Center, 555 Zuchongzhi Rd, Shanghai 201203, Peoples R China 2.Univ Chinese Acad Sci, 19A Yuquan Rd, Beijing 100049, Peoples R China 3.Nanjing Univ Chinese Med, 138 Xianlin Rd, Nanjing 210023, Peoples R China 4.Zhejiang Univ, Innovat Inst Artificial Intelligence Med Zhejiang Univ, Coll Pharmaceut Sci, Hangzhou 310058, Zhejiang, Peoples R China 5.ShanghaiTech Univ, Sch Phys Sci & Technol, Shanghai 201210, Peoples R China 6.ProtonUnfold Technol Co Ltd, Suzhou, Peoples R China 7.Lingang Lab, Shanghai 200031, Peoples R China 8.Univ South Florida, Taneja Coll Pharm, Dept Pharmaceut Sci, Tampa, FL 33612, USA 9.Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China,University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China 10.Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China,University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China 11.Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China,University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China 12.Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China,University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China 13.ProtonUnfold Technology Co., Ltd, Suzhou, China 14.Lingang Laboratory, Shanghai 200031, China 15.Department of Pharmaceutical Sciences, Taneja College of Pharmacy, University of South Florida, Tampa, Florida 33612, United States 16.Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China 17.Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China,University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China,Nanjing University of Chinese Medicine, 138 Xianlin Road, Nanjing 210023, China |
Recommended citation (GB/T 7714) | Zhang, Wei,Wang, Qinggong,Kong, Xiangtai,et al. Fine-tuning ChatGPT Achieves State-of-the-Art Performance for Chemical Text Mining. 2023. |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.