No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT
2024
Journal: IEEE TRANSACTIONS ON SOFTWARE ENGINEERING (IF: 6.5 [JCR-2023], 7.0 [5-Year])
ISSN: 0098-5589
EISSN: 1939-3520
Volume: PP; Issue: 99; Pages: 1-35
Publication status: Published
DOI: 10.1109/TSE.2024.3392499
Abstract

Large language models (LLMs) have demonstrated impressive capabilities across various natural language processing (NLP) tasks, such as machine translation, question answering, and summarization. LLMs are also highly valuable in supporting software engineering tasks, particularly code generation. Automatic code generation is the process of automatically generating source code or executable code from given specifications or requirements, improving developer productivity. In this study, we perform a systematic empirical assessment of the quality of code generation using ChatGPT, a recent state-of-the-art product LLM. We leverage 728 algorithm problems in five languages (i.e., C, C++, Java, Python, and JavaScript) and 18 CWEs with 54 code scenarios for the code generation task. Our evaluation encompasses a comprehensive analysis of code snippets generated by ChatGPT, focusing on three critical aspects: correctness, complexity, and security. We also specifically investigate ChatGPT’s ability to engage in a multi-round fixing process (i.e., ChatGPT’s dialog ability, chatting between users and ChatGPT to fix generated buggy code) to facilitate code generation. By delving into the generated code and examining the experimental results, this work provides valuable insights into the performance of ChatGPT in tackling code generation tasks over the three critical aspects. The experimental results demonstrate that (1) ChatGPT is better at generating functionally correct code for problems before 2021 in different languages than for problems after 2021, with a 48.14% advantage in Accepted rate on the judgment platform, but ChatGPT’s ability to directly fix erroneous code with the multi-round fixing process to achieve correct functionality is relatively weak; (2) the distribution of cyclomatic and cognitive complexity levels for code snippets varies across languages, and the multi-round fixing process with ChatGPT generally preserves or increases the complexity levels of code snippets; (3) in algorithm scenarios in C, C++, and Java, and in CWE scenarios in C and Python3, the code generated by ChatGPT has relevant vulnerabilities, although the multi-round fixing process for vulnerable code snippets demonstrates promising results, with more than 89% of vulnerabilities successfully addressed; and (4) code generation may be affected by ChatGPT’s non-determinism factor, resulting in variations of code snippets in functional correctness, complexity, and security. Overall, our findings uncover potential issues and limitations that arise in ChatGPT-based code generation and lay the groundwork for improving AI and LLM-based code generation techniques.

Keywords: Automatic programming; C (programming language); Codes (symbols); Computational linguistics; Electronic mail; Job analysis; Natural language processing systems; Quality control; Software engineering; Large Language Model; ChatGPT; Code Generation
Indexed by: EI
Language: English
WOS category: Computer Science, Software Engineering
WOS accession number: PPRN:74901681
Publisher: Institute of Electrical and Electronics Engineers Inc.
EI accession number: 20241815995080
EI main heading: Python
EI classification codes: 721.1 Computer Theory, Includes Formal Logic, Automata Theory, Switching Theory, Programming Theory; 723.1 Computer Programming; 723.1.1 Computer Programming Languages; 723.2 Data Processing and Image Processing; 913.3 Quality Assurance and Control
Original document type: Article in Press
Source database: IEEE
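Document type: Journal article
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/370118
Collections: School of Information Science and Technology_Master's Students
School of Information Science and Technology_PI Research Groups_Liang Feng Zhang Group
Corresponding author: Tang, Yutian
Author affiliations: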
1.ShanghaiTech University, Shanghai, China
2.University of Glasgow, United Kingdom
3.Department of Computing, Hong Kong Polytechnic University, Hong Kong SAR, China
4.Nanjing University, China
First author affiliation: ShanghaiTech University
First affiliation of the first author: ShanghaiTech University
Recommended citation:
GB/T 7714: Liu, Zhijie, Tang, Yutian, Luo, Xiapu, et al. No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT[J]. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2024, PP(99): 1-35.
APA: Liu, Zhijie, Tang, Yutian, Luo, Xiapu, Zhou, Yuming, & Zhang, Liang Feng. (2024). No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, PP(99), 1-35.
MLA: Liu, Zhijie, et al. "No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT". IEEE TRANSACTIONS ON SOFTWARE ENGINEERING PP.99 (2024): 1-35.