ShanghaiTech University Knowledge Management System
Knowledge Discovery from Natural Languages: a Linguistic Dataset of 10K Kinship Relations
Date | 2023-07-07 |
Proceedings Title | 2023 IEEE 6TH INTERNATIONAL CONFERENCE ON BIG DATA AND ARTIFICIAL INTELLIGENCE (BDAI)
Pages | 79-88 |
Publication Status | Published |
DOI | 10.1109/BDAI59165.2023.10257043 |
Abstract | In the broad field of knowledge discovery, this study advances the subdomain of rule mining by introducing a novel synthetic dataset, the Kinship 10K Dataset. Built specifically for natural language rule mining, the dataset is derived from the relationship networks of 20 simulated families comprising 1,500 unique characters. Generative techniques were used to produce a rich set of kinship rules, each grounded in one of eight foundational meta kinship relations. The final dataset comprises 10,526 relationship instances, 234 distinct kinship relations, and 104 learnable rules. We also introduce two evaluation metrics for examining rule mining algorithms in closed domains: Rule Coverage (RC) and Directed Rule Mining Capability (DRMC). RC quantifies the inclusiveness of rule mining datasets, while DRMC provides a fine-grained analysis of how well an algorithm discerns and extracts precise rules, taking both accuracy and precision into account. Finally, we establish a benchmark using the GPT-3.5 and GPT-4 models as baselines. GPT-4 scored 0.78 on RC and 0.35 on DRMC, which underscores the difficulty of the task and the value of further research in this area. Taken together, the new dataset, the novel evaluation metrics, and the baselines contribute to knowledge discovery: they point toward deeper insights and greater automation in the wider field and lay the groundwork for future advances in rule mining research. © 2023 IEEE. |
Proceedings Editor / Conference Organizer | Yangtze Delta Region Institute of Tsinghua University Zhejiang |
Keywords | Knowledge Discovery; Rule Mining; Natural Language; Kinship Dataset |
Conference Name | 6th International Conference on Big Data and Artificial Intelligence, BDAI 2023 |
Conference Location | Jiaxing, China |
Conference Dates | 7-9 July 2023 |
Indexed In | EI |
Language | English |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
EI Accession Number | 20234515019957 |
EI Main Heading | Data mining |
EI Classification Code | 723.2 Data Processing and Image Processing |
Original Document Type | Conference article (CA) |
Source Database | IEEE |
Document Type | Conference paper |
Item Identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/333541 |
Collection | School of Information Science and Technology, Master's Students |
Author Affiliations | 1. University of Science and Technology of China, Hefei, China; 2. Shanghai Innovation Center for Processor Technologies, ShanghaiTech University, Shanghai, China |
Recommended Citation (GB/T 7714) | Yue Yangming, Li Chunxiao, Chen YeZeng, et al. Knowledge Discovery from Natural Languages: a Linguistic Dataset of 10K Kinship Relations[C]//Yangtze Delta Region Institute of Tsinghua University Zhejiang: Institute of Electrical and Electronics Engineers Inc., 2023: 79-88. |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.