Learning to Shape Rewards Using a Game of Two Partners
2023-06-27
Proceedings title: PROCEEDINGS OF THE 37TH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, AAAI 2023
ISSN: 2159-5399
Volume: 37
Pages: 11604-11612
Publication status: Published
Abstract

Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone. It also requires domain knowledge, which runs contrary to the goal of autonomous learning. We introduce Reinforcement Learning Optimising Shaping Algorithm (ROSA), an automated reward shaping framework in which the shaping-reward function is constructed in a Markov game between two agents. A reward-shaping agent (Shaper) uses switching controls to determine at which states to add shaping rewards for more efficient learning, while the other agent (Controller) learns the optimal policy for the task using these shaped rewards. We prove that ROSA, which adopts existing RL algorithms, learns to construct a shaping-reward function that is beneficial to the task, thus ensuring efficient convergence to high-performance policies. We demonstrate ROSA’s properties in three didactic experiments and show its superior performance against state-of-the-art RS algorithms in challenging sparse reward environments. Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
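
The idea of adding a shaping reward only at selected states can be illustrated with a toy sketch. This is not the ROSA algorithm and is not taken from the paper: the class `ToyShaper`, the function `shaped_reward`, and the fixed per-state switch and bonus tables are all hypothetical names and simplifications introduced here. In the paper the Shaper learns both the switching decision and the shaping values; in this sketch they are simply given.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable


@dataclass
class ToyShaper:
    """Hypothetical stand-in for a Shaper: a fixed on/off switch and bonus per state."""
    switch_on: Dict[Hashable, bool] = field(default_factory=dict)
    bonus: Dict[Hashable, float] = field(default_factory=dict)

    def shaping_reward(self, state: Hashable) -> float:
        # The bonus is only added at states where the switch is on;
        # everywhere else the shaping term is zero.
        if self.switch_on.get(state, False):
            return self.bonus.get(state, 0.0)
        return 0.0


def shaped_reward(env_reward: float, state: Hashable, shaper: ToyShaper) -> float:
    """Reward a Controller-style learner would train on: task reward plus shaping term."""
    return env_reward + shaper.shaping_reward(state)


# Assumed toy setup: the switch is on only at state "s1".
shaper = ToyShaper(switch_on={"s1": True}, bonus={"s1": 0.5, "s2": 2.0})
r_s1 = shaped_reward(0.0, "s1", shaper)  # switch on: bonus is added
r_s2 = shaped_reward(1.0, "s2", shaper)  # switch off: bonus is ignored
r_s3 = shaped_reward(0.0, "s3", shaper)  # unknown state: no shaping
```

Here the state "s2" has a stored bonus but receives none, because the switch is off there. That separation between where to shape and how much to shape is what the sketch is meant to show.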

Proceedings editor / conference organizer: Association for the Advancement of Artificial Intelligence
Keywords: Domain knowledge; Learning algorithms; Learning systems; Autonomous learning; Error prones; Learn+; Markov games; Performance; Reinforcement learnings; Reward function; Shaping algorithm; Two agents
Conference name: 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Place of publication: 2275 E BAYSHORE RD, STE 160, PALO ALTO, CA 94303 USA
Conference location: Washington, DC, United States
Conference dates: February 7, 2023 - February 14, 2023
Indexed by: EI ; CPCI-S
Language: English
Funding: UKRI Turing AI World-Leading Researcher Fellowship [EP/W002973/1]
WOS research areas: Computer Science ; History & Philosophy of Science
WOS categories: Computer Science, Artificial Intelligence ; Computer Science, Theory & Methods ; History & Philosophy Of Science
WOS accession number: WOS:001243749200011
Publisher: AAAI Press
EI accession number: 20233414603285
EI main heading: Reinforcement learning
EISSN: 2374-3468
EI classification codes: 723.4 Artificial Intelligence ; 723.4.2 Machine Learning
Original document type: Conference article (CA)
Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/348715
Collection: School of Information Science and Technology, PhD students
Corresponding authors: Mguni, David; Yang, Yaodong
Author affiliations
1. Huawei R&D
2. University of Manchester, United Kingdom
3. Imperial College London, United Kingdom
4. University of Alberta, Edmonton, Canada
5. Alberta Machine Intelligence Institute, Edmonton, Canada
6. ShanghaiTech University, China
7. University College London, United Kingdom
8. Peking University, Beijing, China
Recommended citation
GB/T 7714
Mguni, David,Jafferjee, Taher,Wang, Jianhong,et al. Learning to Shape Rewards Using a Game of Two Partners[C]//Association for the Advancement of Artificial Intelligence. 2275 E BAYSHORE RD, STE 160, PALO ALTO, CA 94303 USA:AAAI Press,2023:11604-11612.