ShanghaiTech University Knowledge Management System
Learning to Shape Rewards Using a Game of Two Partners
2023-06-27
Proceedings title | Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023 |
ISSN | 2159-5399 |
Volume | 37 |
Pages | 11604-11612 |
Publication status | Published |
Abstract | Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone. It also requires domain knowledge which runs contrary to the goal of autonomous learning. We introduce Reinforcement Learning Optimising Shaping Algorithm (ROSA), an automated reward shaping framework in which the shaping-reward function is constructed in a Markov game between two agents. A reward-shaping agent (Shaper) uses switching controls to determine which states to add shaping rewards for more efficient learning while the other agent (Controller) learns the optimal policy for the task using these shaped rewards. We prove that ROSA, which adopts existing RL algorithms, learns to construct a shaping-reward function that is beneficial to the task thus ensuring efficient convergence to high-performance policies. We demonstrate ROSA’s properties in three didactic experiments and show its superior performance against state-of-the-art RS algorithms in challenging sparse reward environments. Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. |
Proceedings editor / Conference organizer | Association for the Advancement of Artificial Intelligence |
Keywords | Domain Knowledge ; Learning algorithms ; Learning systems ; Autonomous learning ; Domain knowledge ; Error prones ; Learn+ ; Markov games ; Performance ; Reinforcement learnings ; Reward function ; Shaping algorithm ; Two agents |
Conference name | 37th AAAI Conference on Artificial Intelligence, AAAI 2023 |
Place of publication | 2275 E BAYSHORE RD, STE 160, PALO ALTO, CA 94303 USA |
Conference location | Washington, DC, United States |
Conference date | February 7, 2023 - February 14, 2023 |
Indexed by | EI ; CPCI-S |
Language | English |
Funding project | UKRI Turing AI World-Leading Researcher Fellowship[EP/W002973/1] |
WOS research area | Computer Science ; History & Philosophy of Science |
WOS category | Computer Science, Artificial Intelligence ; Computer Science, Theory & Methods ; History & Philosophy Of Science |
WOS accession number | WOS:001243749200011 |
Publisher | AAAI Press |
EI accession number | 20233414603285 |
EI main heading | Reinforcement learning |
EISSN | 2374-3468 |
EI classification code | 723.4 Artificial Intelligence ; 723.4.2 Machine Learning |
Original document type | Conference article (CA) |
Document type | Conference paper |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/348715 |
Collection | School of Information Science and Technology_PhD Students |
Corresponding author | Mguni, David; Yang, Yaodong |
Author affiliations | 1. Huawei R&D; 2. University of Manchester, United Kingdom; 3. Imperial College London, United Kingdom; 4. University of Alberta, Edmonton, Canada; 5. Alberta Machine Intelligence Institute, Edmonton, Canada; 6. ShanghaiTech University, China; 7. University College London, United Kingdom; 8. Peking University, Beijing, China |
Recommended citation (GB/T 7714) | Mguni, David,Jafferjee, Taher,Wang, Jianhong,et al. Learning to Shape Rewards Using a Game of Two Partners[C]//Association for the Advancement of Artificial Intelligence. 2275 E BAYSHORE RD, STE 160, PALO ALTO, CA 94303 USA:AAAI Press,2023:11604-11612. |