ShanghaiTech University Knowledge Management System
Cost-Effective Label-free Node Classification with LLMs | |
Alternative title | Leveraging Large Language Models for Effective Label-free Node Classification in Text-Attributed Graphs |
2025-04-04 | |
Proceedings title | SIGIR 2025 |
Publication status | Published |
DOI | arXiv:2412.11983 |
Abstract | Graph neural networks (GNNs) have emerged as go-to models for node classification in graph data due to their powerful abilities in fusing graph structures and attributes. However, such models strongly rely on adequate high-quality labeled data for training, which are expensive to acquire in practice. With the advent of large language models (LLMs), a promising way is to leverage their superb zero-shot capabilities and massive knowledge for node labeling. Despite the promising results reported, this methodology either demands considerable queries to LLMs, or suffers from compromised performance caused by noisy labels produced by LLMs. To remedy these issues, this work presents Cella, an active self-training framework that integrates LLMs into GNNs in a cost-effective manner. The design recipe of Cella is to iteratively identify small sets of “critical” samples using GNNs and extract informative pseudo-labels for them with both LLMs and GNNs as additional supervision signals to enhance model training. Particularly, Cella includes three major components: (i) an effective active node selection strategy for initial annotations; (ii) a judicious sample selection scheme to sift out the “critical” nodes based on label disharmonicity and entropy; and (iii) a label refinement module combining LLMs and GNNs with rewired topology. Our extensive experiments over five benchmark text-attributed graph datasets demonstrate that Cella significantly outperforms the state of the art under the same query budget to LLMs in terms of label-free node classification. In particular, on the DBLP dataset with 14.3k nodes, Cella is able to achieve a conspicuous 8.08% improvement in accuracy over the state of the art at a cost of less than one cent. |
Conference country | Italy |
Keywords | Graph Neural Network Label-free Node Large Language Models Classification |
Indexed in | SCI |
Language | English |
WOS accession number | PPRN:119967190 |
Document type | Conference paper |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/500303 |
Collection | School of Information Science and Technology_Master's Students |
Co-first author | Lai, Yurui |
Corresponding author | Yan, Mingyu |
Author affiliations | 1.ShanghaiTech Univ, Shanghai, Peoples R China 2.Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China 3.Hong Kong Baptist Univ, Hong Kong, Peoples R China 4.Chinese Acad Sci, Inst Comp Technol, State Key Lab Processors, Beijing, Peoples R China 5.Univ Chinese Acad Sci, Beijing, Peoples R China |
First author affiliation | ShanghaiTech University |
First author's primary affiliation | ShanghaiTech University |
Recommended citation (GB/T 7714) | Zhang, Taiyan,Yang, Renchi,Yan, Mingyu,et al. Cost-Effective Label-free Node Classification with LLMs[C],2025. |