ShanghaiTech University Knowledge Management System
An LLM-based Readability Measurement for Unit Tests' Context-aware Inputs
Date | 2024-07-31
Status | Published
Abstract | Automated test techniques usually generate unit tests with higher code coverage than manual tests. However, the readability of automated tests is crucial for code comprehension and maintenance. Test readability involves many aspects; in this paper, we focus on test inputs. The central limitation of existing studies on input readability is that they examine the test code alone without considering the tested source code, so they either ignore the differing readability requirements of different source code or require manual effort to write readable inputs. However, we observe that the source code specifies the contexts that test inputs must satisfy. Based on this observation, we introduce the Context Consistency Criterion (a.k.a. C3), a readability measurement tool that leverages Large Language Models to extract primitive-type (including string-type) parameters' readability contexts from the source code and checks whether test inputs are consistent with those contexts. We also propose EvoSuiteC3, which leverages C3's extracted contexts to help EvoSuite generate readable test inputs. We have evaluated C3's performance on 409 Java classes and compared the readability of manual and automated tests under C3's measurement. The results are two-fold. First, the precision, recall, and F1-score of C3's mined readability contexts are 84.4%, 83%, and 83.7%, respectively. Second, under C3's measurement, the string-type input readability scores of EvoSuiteC3, ChatUniTest (an LLM-based test generation tool), manual tests, and two traditional tools (EvoSuite and Randoop) are 90%, 83%, 68%, 8%, and 8%, respectively, showing the traditional tools' inability to generate readable string-type inputs. We have also conducted a survey based on questionnaires collected from 30 programmers with varied backgrounds. The results reveal that when C3 identifies readability differences between tests, programmers tend to hold opinions of the tests' readability similar to C3's.
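Illustration (added for clarity; not taken from the paper — the class, method, and inputs below are invented): the abstract's core idea is that the tested source code constrains what a readable primitive- or string-type input looks like, and an input is judged readable when it is consistent with that mined context. A minimal Java sketch of the idea:

```java
import org.junit.jupiter.api.Test;

// Hypothetical class under test: the method body implies that the
// "email" parameter must look like an email address. A C3-style
// analysis would mine this readability context from the source code.
class AccountService {
    boolean register(String email) {
        return email != null
                && email.matches("^[\\w.+-]+@[\\w-]+\\.[\\w.-]+$");
    }
}

class AccountServiceTest {
    @Test
    void contextConsistentInput() {
        // Consistent with the mined "email" context: a C3-style
        // check would count this input as readable.
        new AccountService().register("alice@example.com");
    }

    @Test
    void contextInconsistentInput() {
        // The kind of arbitrary string that traditional generators
        // such as EvoSuite or Randoop tend to produce; it violates
        // the mined context and would count as unreadable.
        new AccountService().register("x$#Qz!!0");
    }
}
```

For reference, the reported scores are internally consistent: F1 = 2PR/(P + R) = 2 · 0.844 · 0.830 / (0.844 + 0.830) ≈ 0.837, matching the stated 83.7%.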
Keywords | readability; test generation; large language models
arXiv ID | arXiv:2407.21369
Related URL | View original
Source | arXiv
WOS Record Number | PPRN:91174928
WOS Category | Computer Science, Software Engineering
Document Type | Preprint
Item Identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/408355
Collections | School of Information Science and Technology; School of Information Science and Technology_Master's Students; School of Information Science and Technology_Doctoral Students; School of Information Science and Technology_PI Research Group_He Jingzhu's Group
Corresponding Author | He, Jingzhu
Affiliations | 1. ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China; 2. Shanghai Jiao Tong Univ, Shanghai, Peoples R China; 3. Univ Glasgow, Glasgow, Scotland
Recommended Citation (GB/T 7714) | Zhou, Zhichao, Tang, Yutian, Lin, Yun, et al. An LLM-based Readability Measurement for Unit Tests' Context-aware Inputs. 2024.