Tactical Reward Shaping: Bypassing Reinforcement Learning with Strategy-Based Goals
2019-12
Proceedings title: 2019 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (ROBIO)
Pages: 1418-1423
Publication status: Published
DOI: 10.1109/ROBIO49542.2019.8961549
Abstract

Deep Reinforcement Learning (DRL) has shown promising capabilities for learning optimal policies directly from trial and error. However, learning can be hindered if the goal of the learning, as defined by the reward function, is not optimal. We demonstrate that by setting the goal of the competition in a counter-intuitive but intelligent way, instead of heuristically trying solutions over many hours, the DRL simulation can quickly converge to a winning strategy. The ICRA-DJI RoboMaster AI Challenge is a game of cooperation and competition between robots in a partially observable environment, quite similar to the game Counter-Strike. Unlike the traditional approach to games, where the reward is given for winning the match or hitting the enemy, our DRL algorithm rewards our robots when they hold a geometric-strategic advantage, which implicitly increases their chances of winning. Furthermore, we use Deep Q-Learning (DQL) to generate multi-agent movement paths, which improves the cooperation between two robots by avoiding collisions. Finally, we implement a variant of the A* algorithm with the same implicit geometric goal as DQL and compare the results. We conclude that a well-set goal can call into question the need for learning algorithms, with geometric-based searches outperforming DQL by many orders of magnitude.
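This record does not give the paper's actual reward definition, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a hypothetical occupancy grid (`GRID`) and a hypothetical notion of "geometric-strategic advantage": a cell from which our robot sees a chosen target enemy while staying hidden from a second, threatening enemy. The same predicate is used both as a shaped reward for a learner and as the goal test of a plain breadth-first grid search, which stands in here for the paper's variant A* (it is equivalent to A* with a zero heuristic on unit-cost moves). All function names are illustrative and do not come from the paper.

```python
from collections import deque

# Hypothetical occupancy grid (1 = wall, 0 = free); not the real arena layout.
GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

def line_of_sight(grid, a, b):
    """Bresenham walk from a to b; False as soon as a wall cell is crossed."""
    (r0, c0), (r1, c1) = a, b
    dr, dc = abs(r1 - r0), abs(c1 - c0)
    sr = 1 if r1 >= r0 else -1
    sc = 1 if c1 >= c0 else -1
    err = dr - dc
    r, c = r0, c0
    while (r, c) != (r1, c1):
        if grid[r][c] == 1:
            return False
        e2 = 2 * err
        if e2 > -dc:
            err -= dc
            r += sr
        if e2 < dr:
            err += dr
            c += sc
    return grid[r1][c1] == 0

def has_advantage(grid, cell, target, threat):
    """Assumed criterion: we see the target but the threat cannot see us."""
    return line_of_sight(grid, cell, target) and not line_of_sight(grid, cell, threat)

def shaped_reward(grid, cell, target, threat):
    """Shaped reward for a learner: 1.0 in an advantageous cell, else 0.0."""
    return 1.0 if has_advantage(grid, cell, target, threat) else 0.0

def search_to_advantage(grid, start, target, threat):
    """Breadth-first search to the nearest advantageous cell.

    Returns the list of cells from start to that cell, or None if no
    advantageous cell is reachable.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if has_advantage(grid, cell, target, threat):
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    return None

if __name__ == "__main__":
    # Hypothetical positions: our robot, the target enemy, the threatening enemy.
    print(search_to_advantage(GRID, (2, 0), (0, 4), (4, 0)))
```

Because the goal is expressed as a geometric predicate rather than a match outcome, the search needs no training at all, which is the kind of comparison the abstract draws between geometric searches and DQL.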

Conference location: Dali, China
Conference date: 6-8 Dec. 2019
Indexed by: EI
Funding: [0830000081]; National Natural Science Foundation of China [61850410527]
Publisher: Institute of Electrical and Electronics Engineers Inc.
EI accession number: 20200608146690
EI subject terms: Biomimetics; Deep learning; Geometry; Learning algorithms; Machine learning; Motion planning; Multi agent systems; Optimization; Robotics; Robots
EI classification codes: Biotechnology: 461.8; Artificial Intelligence: 723.4; Robotics: 731.5; Mathematics: 921; Optimization Techniques: 921.5
Original document type: Conferences
Source database: IEEE
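Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/102273
Collection: School of Information Science and Technology_Master's Students
School of Information Science and Technology_PI Research Groups_ANDRE LUIS MACEDO ROSENDO SILVA Group
Author affiliation
ShanghaiTech University, Living Machine Lab, 393 Mid Huaxia Road, Pudong District, Shanghai, China
First author affiliation: ShanghaiTech University
First author's primary affiliation: ShanghaiTech University
Recommended citation
GB/T 7714
Zhang, Yizheng,Rosendo, Andre. Tactical Reward Shaping: Bypassing Reinforcement Learning with Strategy-Based Goals[C]:Institute of Electrical and Electronics Engineers Inc.,2019:1418-1423.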
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.