Learning to Shape Rewards Using a Game of Two Partners

	Mguni, David 1; Jafferjee, Taher 1; Wang, Jianhong 2; Perez-Nieves, Nicolas 3; Song, Wenbin 6; Tong, Feifei 1; Taylor, Matthew E. 4,5; Yang, Tianpei 4,5; Dai, Zipeng 1; Chen, Hui 7; Zhu, Jiangcheng 1; Shao, Kun 1; Wang, Jun 7; Yang, Yaodong 8
	2023-06-27
Proceedings title	PROCEEDINGS OF THE 37TH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, AAAI 2023
ISSN	2159-5399
Volume	37
Pages	11604-11612
Publication status	Published
Abstract	Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone. It also requires domain knowledge, which runs contrary to the goal of autonomous learning. We introduce Reinforcement Learning Optimising Shaping Algorithm (ROSA), an automated reward shaping framework in which the shaping-reward function is constructed in a Markov game between two agents. A reward-shaping agent (Shaper) uses switching controls to determine at which states to add shaping rewards for more efficient learning, while the other agent (Controller) learns the optimal policy for the task using these shaped rewards. We prove that ROSA, which adopts existing RL algorithms, learns to construct a shaping-reward function that is beneficial to the task, thus ensuring efficient convergence to high-performance policies. We demonstrate ROSA's properties in three didactic experiments and show its superior performance against state-of-the-art RS algorithms in challenging sparse reward environments. Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
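The abstract describes a Shaper that switches shaping rewards on only at selected states, while the Controller trains on the resulting reward. As a rough illustration only, the sketch below assumes the shaping term takes the standard potential-based form γΦ(s′) − Φ(s) and that the Shaper's switching decision is a boolean function of the current state. The names `shaped_reward`, `switch_on`, and `potential` are hypothetical; this is a sketch under those assumptions, not the paper's implementation.

```python
from typing import Callable, Hashable

State = Hashable


def shaped_reward(
    r: float,
    s: State,
    s_next: State,
    switch_on: Callable[[State], bool],  # hypothetical Shaper switching control
    potential: Callable[[State], float],  # hypothetical potential function Phi
    gamma: float = 0.99,
) -> float:
    """Return the task reward, plus a potential-based shaping term
    gamma * Phi(s_next) - Phi(s) only at states where the switch is on."""
    if not switch_on(s):
        # Switch off: the Controller receives the unmodified task reward.
        return r
    # Switch on: add the (assumed) potential-based shaping term.
    return r + gamma * potential(s_next) - potential(s)
```

For example, with `gamma=0.5` and `potential = lambda s: 2.0 * s`, a transition from state 1 to state 4 with task reward 1.0 yields 1.0 + 0.5·8 − 2 = 3.0 when the switch is on, and the unchanged 1.0 when it is off.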
Proceedings editor / Conference sponsor	Association for the Advancement of Artificial Intelligence
Keywords	Domain Knowledge; Learning algorithms; Learning systems; Autonomous learning; Domain knowledge; Error prones; Learn+; Markov games; Performance; Reinforcement learnings; Reward function; Shaping algorithm; Two agents
Conference name	37th AAAI Conference on Artificial Intelligence, AAAI 2023
Place of publication	2275 E BAYSHORE RD, STE 160, PALO ALTO, CA 94303 USA
Conference location	Washington, DC, United States
Conference dates	February 7, 2023 - February 14, 2023
Indexed in	EI; CPCI-S
Language	English
Funding	UKRI Turing AI World-Leading Researcher Fellowship [EP/W002973/1]
WOS research areas	Computer Science; History & Philosophy of Science
WOS categories	Computer Science, Artificial Intelligence; Computer Science, Theory & Methods; History & Philosophy Of Science
WOS accession number	WOS:001243749200011
Publisher	AAAI Press
EI accession number	20233414603285
EI main heading	Reinforcement learning
EISSN	2374-3468
EI classification codes	723.4 Artificial Intelligence; 723.4.2 Machine Learning
Original document type	Conference article (CA)
Document type	Conference paper
Item identifier	https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/348715
Collection	School of Information Science and Technology, PhD students
Corresponding authors	Mguni, David; Yang, Yaodong
Author affiliations	1. Huawei R&D; 2. University of Manchester, United Kingdom; 3. Imperial College London, United Kingdom; 4. University of Alberta, Edmonton, Canada; 5. Alberta Machine Intelligence Institute, Edmonton, Canada; 6. ShanghaiTech University, China; 7. University College London, United Kingdom; 8. Peking University, Beijing, China
Recommended citation (GB/T 7714)	Mguni, David,Jafferjee, Taher,Wang, Jianhong,et al. Learning to Shape Rewards Using a Game of Two Partners[C]//Association for the Advancement of Artificial Intelligence. 2275 E BAYSHORE RD, STE 160, PALO ALTO, CA 94303 USA:AAAI Press,2023:11604-11612.