Applications and Concerns of ChatGPT and Other Conversational Large Language Models in Health Care: Systematic Review

doi:10.2196/22769

	Wang, Leyao 1; Wan, Zhiyu 2,3; Ni, Congning 1; Song, Qingyuan 1; Li, Yang 1; Clayton, Ellen 2,4,5; Malin, Bradley 1,2,6; Yin, Zhijun 1,2
	2024-11-07
Journal	JOURNAL OF MEDICAL INTERNET RESEARCH (IF: 5.8 [JCR-2023], 6.7 [5-Year])
ISSN	1438-8871
Volume	26 Pages: e22769
Publication status	Published
DOI	10.2196/22769
Abstract	Background: The launch of ChatGPT (OpenAI) in November 2022 drew public attention and academic interest to large language models (LLMs), facilitating the emergence of many other innovative LLMs. These LLMs have been applied in various fields, including health care. Numerous studies have since been conducted on how to use state-of-the-art LLMs in health-related scenarios.
Objective: This review aims to summarize applications of and concerns regarding conversational LLMs in health care and provide an agenda for future research in this field.
Methods: We used PubMed, ACM, and the IEEE digital libraries as primary sources for this review. We followed the guidance of PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) to screen and select peer-reviewed research articles that (1) were related to health care applications and conversational LLMs and (2) were published before September 1, 2023, the date when we started paper collection. We investigated these papers and classified them according to their applications and concerns.
Results: Our search initially identified 820 papers according to targeted keywords, out of which 65 (7.9%) papers met our criteria and were included in the review. The most popular conversational LLM was ChatGPT (60/65, 92% of papers), followed by Bard (Google LLC; 1/65, 2% of papers), LLaMA (Meta; 1/65, 2% of papers), and other LLMs (6/65, 9% of papers). These papers were classified into four categories of applications: (1) summarization, (2) medical knowledge inquiry, (3) prediction (eg, diagnosis, treatment recommendation, and drug synergy), and (4) administration (eg, documentation and information collection), and four categories of concerns: (1) reliability (eg, training data quality, accuracy, interpretability, and consistency in responses), (2) bias, (3) privacy, and (4) public acceptability. A total of 49 (75%) papers used LLMs for summarization, medical knowledge inquiry, or both, and 58 (89%) papers expressed concerns about reliability, bias, or both. We found that conversational LLMs exhibited promising results in summarization and in providing general medical knowledge to patients with relatively high accuracy. However, conversational LLMs such as ChatGPT are not always able to provide reliable answers to complex health-related tasks (eg, diagnosis) that require specialized domain expertise. While bias or privacy issues are often noted as concerns, no experiments in our reviewed papers thoughtfully examined how conversational LLMs lead to these issues in health care research.
Conclusions: Future studies should focus on improving the reliability of LLM applications in complex health-related tasks, as well as investigating the mechanisms by which LLM applications introduce bias and privacy issues. Considering the vast accessibility of LLMs, legal, social, and technical efforts are all needed to address concerns about LLMs to promote, improve, and regularize the application of LLMs in health care.
Keywords	large language model ; ChatGPT ; artificial intelligence ; natural language processing ; health care ; summarization ; medical knowledge inquiry ; reliability ; bias ; privacy
Discipline	Engineering ; Medicine
Indexed by	SCI ; SCIE ; EI ; Other ; SCOPUS
Language	English
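Document type	Journal article
Item identifier	https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/457910
Collection	School of Biomedical Engineering_PI Research Groups_Zhiyu Wan Group
Co-first authors	Wang, Leyao; Clayton, Ellen
Corresponding author	Yin, Zhijun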
Affiliations	1. Vanderbilt University, Department of Computer Science, Nashville, TN, United States; 2. Vanderbilt University Medical Center, Department of Biomedical Informatics, Nashville, TN, United States; 3. ShanghaiTech University, School of Biomedical Engineering, Shanghai, China; 4. Vanderbilt University Medical Center, Department of Pediatrics, Nashville, TN, United States; 5. Vanderbilt University Medical Center, School of Law, Nashville, TN, United States; 6. Vanderbilt University Medical Center, Department of Biostatistics, Nashville, TN, United States
Recommended citation GB/T 7714	Wang, Leyao, Wan, Zhiyu, Ni, Congning, et al. Applications and Concerns of ChatGPT and Other Conversational Large Language Models in Health Care: Systematic Review[J]. JOURNAL OF MEDICAL INTERNET RESEARCH, 2024, 26: e22769.
APA	Wang, Leyao., Wan, Zhiyu., Ni, Congning., Song, Qingyuan., Li, Yang., ... & Yin, Zhijun. (2024). Applications and Concerns of ChatGPT and Other Conversational Large Language Models in Health Care: Systematic Review. JOURNAL OF MEDICAL INTERNET RESEARCH, 26, e22769.
MLA	Wang, Leyao, et al. "Applications and Concerns of ChatGPT and Other Conversational Large Language Models in Health Care: Systematic Review". JOURNAL OF MEDICAL INTERNET RESEARCH 26 (2024): e22769.