Cost-Effective Label-free Node Classification with LLMs

doi:arXiv:2412.11983

Title	Cost-Effective Label-free Node Classification with LLMs
Alternative title	Leveraging Large Language Models for Effective Label-free Node Classification in Text-Attributed Graphs
Authors	Zhang, Taiyan 1,2,3; Yang, Renchi 3; Yan, Mingyu 4,5; Ye, Xiaochun 4,5; Fan, Dongrui 4,5; Lai, Yurui 3
Date	2025-04-04
Proceedings title	SIGIR 2025
Publication status	Published
DOI	arXiv:2412.11983
Abstract	Graph neural networks (GNNs) have emerged as go-to models for node classification in graph data due to their powerful abilities in fusing graph structures and attributes. However, such models rely strongly on adequate high-quality labeled data for training, which is expensive to acquire in practice. With the advent of large language models (LLMs), a promising approach is to leverage their superb zero-shot capabilities and massive knowledge for node labeling. Despite the promising results reported, this methodology either demands a considerable number of queries to LLMs or suffers from compromised performance caused by the noisy labels that LLMs produce. To remedy these issues, this work presents Cella, an active self-training framework that integrates LLMs into GNNs in a cost-effective manner. The design recipe of Cella is to iteratively identify small sets of “critical” samples using GNNs and extract informative pseudo-labels for them with both LLMs and GNNs as additional supervision signals to enhance model training. In particular, Cella includes three major components: (i) an effective active node selection strategy for initial annotations; (ii) a judicious sample selection scheme to sift out the “critical” nodes based on label disharmonicity and entropy; and (iii) a label refinement module combining LLMs and GNNs with rewired topology. Our extensive experiments over five benchmark text-attributed graph datasets demonstrate that Cella significantly outperforms the state of the art in label-free node classification under the same LLM query budget. In particular, on the DBLP dataset with 14.3k nodes, Cella achieves a conspicuous 8.08% improvement in accuracy over the state of the art at a cost of less than one cent.
Conference country	Italy
Keywords	Graph Neural Network; Label-free Node Classification; Large Language Models
Indexed in	SCI
Language	English
WOS record number	PPRN:119967190
Document type	Conference paper
Item identifier	https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/500303
Collection	School of Information Science and Technology_Master's Students
Co-first author	Lai, Yurui
Corresponding author	Yan, Mingyu
Author affiliations	1.ShanghaiTech Univ, Shanghai, Peoples R China 2.Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China 3.Hong Kong Baptist Univ, Hong Kong, Peoples R China 4.Chinese Acad Sci, Inst Comp Technol, State Key Lab Processors, Beijing, Peoples R China 5.Univ Chinese Acad Sci, Beijing, Peoples R China
First author affiliation	ShanghaiTech University
First author's primary affiliation	ShanghaiTech University
Recommended citation (GB/T 7714)	Zhang, Taiyan,Yang, Renchi,Yan, Mingyu,et al. Cost-Effective Label-free Node Classification with LLMs[C],2025.
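
Illustrative note: the abstract mentions sifting out "critical" nodes based on label disharmonicity and entropy. The sketch below shows only the generic entropy half of that idea, ranking nodes by the Shannon entropy of their predicted class distribution. It is not Cella's actual selection scheme: the function names, the example probabilities, and the top-k rule are all assumptions made for illustration, and label disharmonicity is not modeled because this record does not define it.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_uncertain_nodes(predictions, k):
    """Return the ids of the k nodes with the highest predictive entropy.

    `predictions` maps a node id to its class-probability list
    (hypothetical GNN softmax output, not data from the paper).
    """
    ranked = sorted(predictions, key=lambda node: entropy(predictions[node]), reverse=True)
    return ranked[:k]


# Made-up softmax outputs for four nodes over three classes.
predictions = {
    "a": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "b": [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
    "c": [0.70, 0.20, 0.10],
    "d": [0.50, 0.49, 0.01],
}

critical = select_uncertain_nodes(predictions, k=2)
print(critical)
```

In a budget-constrained setting like the one the abstract describes, a ranking of this kind would let only the most uncertain nodes be sent to the LLM, which is one plausible way to limit the number of queries.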