Cost-Effective Label-free Node Classification with LLMs
Alternative title: Leveraging Large Language Models for Effective Label-free Node Classification in Text-Attributed Graphs
2025-04-04
Proceedings title: SIGIR 2025
Publication status: Published
DOI: arXiv:2412.11983
Abstract

Graph neural networks (GNNs) have emerged as go-to models for node classification in graph data due to their powerful abilities in fusing graph structures and attributes. However, such models rely strongly on adequate high-quality labeled data for training, which is expensive to acquire in practice. With the advent of large language models (LLMs), a promising approach is to leverage their strong zero-shot capabilities and massive knowledge for node labeling. Despite the promising results reported, this methodology either demands a large number of queries to LLMs or suffers from compromised performance caused by the noisy labels that LLMs produce. To remedy these issues, this work presents Cella, an active self-training framework that integrates LLMs into GNNs in a cost-effective manner. The design recipe of Cella is to iteratively identify small sets of “critical” samples using GNNs and to extract informative pseudo-labels for them with both LLMs and GNNs as additional supervision signals to enhance model training. Specifically, Cella includes three major components: (i) an effective active node selection strategy for initial annotations; (ii) a judicious sample selection scheme to sift out the “critical” nodes based on label disharmonicity and entropy; and (iii) a label refinement module combining LLMs and GNNs with rewired topology. Our extensive experiments over five benchmark text-attributed graph datasets demonstrate that Cella significantly outperforms the state of the art in label-free node classification under the same LLM query budget. In particular, on the DBLP dataset with 14.3k nodes, Cella achieves an 8.08% improvement in accuracy over the state of the art at a cost of less than one cent.
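
As a rough illustration of component (ii), the sketch below ranks nodes by a mix of prediction entropy and disagreement with neighboring predictions. It is not Cella's actual scheme: the abstract does not define label disharmonicity, so the neighbor-disagreement ratio used here is an assumed stand-in, and the names `select_critical`, `disagreement`, and the weight `alpha` are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def disagreement(node, neighbors, predicted):
    """Fraction of a node's neighbors whose predicted label differs from its own.

    Assumption: a simple stand-in for label disharmonicity, which the
    abstract names but does not define.
    """
    nbrs = neighbors[node]
    if not nbrs:
        return 0.0
    return sum(predicted[n] != predicted[node] for n in nbrs) / len(nbrs)

def select_critical(probs, neighbors, k, alpha=0.5):
    """Return the k node ids with the highest combined uncertainty score.

    probs: node id -> list of class probabilities (e.g. GNN softmax output).
    neighbors: node id -> list of adjacent node ids.
    alpha: hypothetical weight between entropy and neighbor disagreement.
    """
    num_classes = len(next(iter(probs.values())))
    # Normalize entropy to [0, 1]; a single-class problem has no uncertainty.
    max_entropy = math.log(num_classes) if num_classes > 1 else 1.0
    predicted = {v: max(range(num_classes), key=lambda c: p[c])
                 for v, p in probs.items()}
    scores = {
        v: alpha * entropy(p) / max_entropy
           + (1 - alpha) * disagreement(v, neighbors, predicted)
        for v, p in probs.items()
    }
    # Sort by descending score, breaking ties by node id for determinism.
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]

# Toy graph with four nodes and two classes.
probs = {0: [0.9, 0.1], 1: [0.5, 0.5], 2: [0.2, 0.8], 3: [0.95, 0.05]}
neighbors = {0: [1, 3], 1: [0, 2], 2: [1], 3: [0]}
print(select_critical(probs, neighbors, k=2))
```

In this toy example, nodes 2 and 1 are selected: node 1 has maximal entropy, and node 2 disagrees with its only neighbor. In a pipeline like the one the abstract describes, such nodes would be the candidates for additional pseudo-labeling.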

Conference country: Italy
Keywords: Graph Neural Network; Label-free Node Classification; Large Language Models
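Indexed in: SCI
Language: English
WOS accession number: PPRN:119967190
Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/500303
Collection: School of Information Science and Technology, Master's students
Co-first author: Lai, Yurui
Corresponding author: Yan, Mingyu
Author affiliations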
1.ShanghaiTech Univ, Shanghai, Peoples R China
2.Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
3.Hong Kong Baptist Univ, Hong Kong, Peoples R China
4.Chinese Acad Sci, Inst Comp Technol, State Key Lab Processors, Beijing, Peoples R China
5.Univ Chinese Acad Sci, Beijing, Peoples R China
First author affiliation: ShanghaiTech University
First author's primary affiliation: ShanghaiTech University
Recommended citation
GB/T 7714
Zhang, Taiyan,Yang, Renchi,Yan, Mingyu,et al. Cost-Effective Label-free Node Classification with LLMs[C],2025.