Tactical Reward Shaping: Bypassing Reinforcement Learning with Strategy-Based Goals

doi:10.1109/ROBIO49542.2019.8961549

	Zhang, Yizheng; Rosendo, Andre
	2019-12
Proceedings title	2019 IEEE International Conference on Robotics and Biomimetics (ROBIO)
Pages	1418-1423
Publication status	Published
Abstract	Deep Reinforcement Learning (DRL) has shown a promising capability to learn optimal policies directly through trial and error. However, learning can be hindered if the goal of learning, defined by the reward function, is "not optimal". We demonstrate that by setting the goal of the competition in a counter-intuitive but intelligent way, instead of heuristically trying solutions over many hours, the DRL simulation can quickly converge to a winning strategy. The ICRA-DJI RoboMaster AI Challenge is a game of cooperation and competition between robots in a partially observable environment, quite similar to the game Counter-Strike. Unlike the traditional approach to games, where the reward is given for winning the match or hitting the enemy, our DRL algorithm rewards our robots when they hold a geometric-strategic advantage, which implicitly increases their chances of winning. Furthermore, we use Deep Q-Learning (DQL) to generate multi-agent movement paths, which improves cooperation between the two robots by avoiding collisions. Finally, we implement a variant of the A* algorithm with the same implicit geometric goal as DQL and compare the results. We conclude that a well-set goal can call into question the need for learning algorithms, with geometric-based searches outperforming DQL by many orders of magnitude.
Conference location	Dali, China
Conference dates	6-8 Dec. 2019
Indexed in	EI
Funding	[0830000081]; National Natural Science Foundation of China [61850410527]
Publisher	Institute of Electrical and Electronics Engineers Inc.
EI accession number	20200608146690
EI controlled terms	Biomimetics; Deep learning; Geometry; Learning algorithms; Machine learning; Motion planning; Multi agent systems; Optimization; Robotics; Robots
EI classification codes	Biotechnology: 461.8; Artificial Intelligence: 723.4; Robotics: 731.5; Mathematics: 921; Optimization Techniques: 921.5
Original document type	Conferences
Source database	IEEE
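Document type	Conference paper
Item identifier	https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/102273
Collections	School of Information Science and Technology_Master's Students; School of Information Science and Technology_PI Research Groups_ANDRE LUIS MACEDO ROSENDO SILVA Group
Author affiliation	ShanghaiTech University, Living Machine Lab, 393 Mid Huaxia Road, Pudong District, Shanghai, China
First author's affiliation	ShanghaiTech University
First author's primary affiliation	ShanghaiTech University
Recommended citation (GB/T 7714)	Zhang, Yizheng, Rosendo, Andre. Tactical Reward Shaping: Bypassing Reinforcement Learning with Strategy-Based Goals[C]: Institute of Electrical and Electronics Engineers Inc., 2019: 1418-1423.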