Knowledge Discovery from Natural Languages: a Linguistic Dataset of 10K Kinship Relations
2023-07-07
Proceedings title: 2023 IEEE 6TH INTERNATIONAL CONFERENCE ON BIG DATA AND ARTIFICIAL INTELLIGENCE (BDAI)
Pages: 79-88
Publication status: Published
DOI: 10.1109/BDAI59165.2023.10257043
Abstract: In the expansive realm of knowledge discovery, this study propels forward the subdomain of rule mining with the inception of a singular synthetic dataset - the Kinship 10K Dataset. This dataset, purpose-built for natural language rule mining, derives from the intricate relationship networks across 20 simulated families. These networks include 1,500 unique characters. The development leverages generative techniques, producing a rich array of kinship rules. Each rule is grounded in one of eight foundational Meta kinship relations. The final ensemble, a comprehensive dataset, comprises 10,526 relationship instances, 234 distinct kinship relations, and 104 learnable rules. In addition, we introduce two evaluation metrics - Rule Coverage (RC) and Directed Rule Mining Capability (DRMC) for examining rule mining algorithms in closed domains. RC quantifies the inclusiveness of rule mining datasets, while DRMC delivers nuanced analysis of algorithmic performance in discerning and extracting precise rules, taking accuracy and precision into account. Additionally, we set a benchmark by utilizing the GPT-3.5 and GPT-4 models as baselines. It is noteworthy that the GPT-4 model attained scores of 0.78 and 0.35 on the RC and DRMC metrics respectively. These scores underscore the inherent challenges of the task and signify the merit in pursuing further research to advance this domain. Collectively, this investigation presents a substantial contribution to knowledge discovery. By introducing an innovative dataset, formulating novel evaluation metrics, and instituting a robust baseline model, it not only highlights the prospects for deeper insights and increased automation in the wider field of knowledge discovery but also sets the stage for upcoming advancements in rule mining research. © 2023 IEEE.
Proceedings editor/Conference organizer: Yangtze Delta Region Institute of Tsinghua University Zhejiang
Keywords: Knowledge Discovery Rule Mining Natural language Kinship Dataset
Conference name: 6th International Conference on Big Data and Artificial Intelligence, BDAI 2023
Conference location: Jiaxing, China
Conference date: 7-9 July 2023
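Indexed in: EI
Language: English
Publisher: Institute of Electrical and Electronics Engineers Inc.
EI accession number: 20234515019957
EI main heading: Data mining
EI classification code: 723.2 Data Processing and Image Processing
Original document type: Conference article (CA)
Source database: IEEE
Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/333541
Collection: School of Information Science and Technology_Master's Students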
Author affiliations:
1. University of Science and Technology of China, Hefei, China
2. Shanghai Innovation Center for Processor Technologies, ShanghaiTech University, Shanghai, China
Recommended citation:
GB/T 7714
Yue Yangming,Li Chunxiao,Chen YeZeng,et al. Knowledge Discovery from Natural Languages: a Linguistic Dataset of 10K Kinship Relations[C]//Yangtze Delta Region Institute of Tsinghua University Zhejiang:Institute of Electrical and Electronics Engineers Inc.,2023:79-88.