ShanghaiTech University Knowledge Management System
Tactical Reward Shaping: Bypassing Reinforcement Learning with Strategy-Based Goals | |
2019-12 | |
Proceedings title | 2019 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (ROBIO)
Pages | 1418-1423 |
Publication status | Published |
DOI | 10.1109/ROBIO49542.2019.8961549 |
Abstract | Deep Reinforcement Learning (DRL) has shown its promising capabilities to learn optimal policies directly from trial and error. However, learning can be hindered if the goal of the learning, defined by the reward function, is "not optimal". We demonstrate that by setting the goal/target of competition in a counter-intuitive but intelligent way, instead of heuristically trying solutions through many hours, the DRL simulation can quickly converge into a winning strategy. The ICRA-DJI RoboMaster AI Challenge is a game of cooperation and competition between robots in a partially observable environment, quite similar to the Counter-Strike game. Unlike the traditional approach to games, where the reward is given at winning the match or hitting the enemy, our DRL algorithm rewards our robots when in a geometric-strategic advantage, which implicitly increases the winning chances. Furthermore, we use Deep Q Learning (DQL) to generate multi-agent paths for moving, which improves the cooperation between two robots by avoiding collision. Finally, we implement a variant A* algorithm with the same implicit geometric goal as DQL and compare results. We conclude that a well-set goal can put in question the need for learning algorithms, with geometric-based searches outperforming DQL by many orders of magnitude. |
Conference location | Dali, China |
Conference date | 6-8 Dec. 2019 |
Indexed by | EI |
Funding | [0830000081] ; National Natural Science Foundation of China[61850410527] |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
EI accession number | 20200608146690 |
EI subject terms | Biomimetics ; Deep learning ; Geometry ; Learning algorithms ; Machine learning ; Motion planning ; Multi agent systems ; Optimization ; Robotics ; Robots |
EI classification codes | Biotechnology:461.8 ; Artificial Intelligence:723.4 ; Robotics:731.5 ; Mathematics:921 ; Optimization Techniques:921.5 |
Original document type | Conferences |
Source database | IEEE |
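Document type | Conference paper |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/102273 |
Collection | School of Information Science and Technology_Master's Students ; School of Information Science and Technology_PI Research Groups_ANDRE LUIS MACEDO ROSENDO SILVA Group |
Author affiliation | ShanghaiTech University, Living Machine Lab, 393 Middle Huaxia Road, Pudong District, Shanghai, China |
First author affiliation | ShanghaiTech University |
First author's primary affiliation | ShanghaiTech University |
Recommended citation (GB/T 7714) | Zhang, Yizheng,Rosendo, Andre. Tactical Reward Shaping: Bypassing Reinforcement Learning with Strategy-Based Goals[C]:Institute of Electrical and Electronics Engineers Inc.,2019:1418-1423. |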
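The abstract describes rewarding a geometric-strategic advantage instead of a win or a hit, and then searching directly for that geometric goal. The record does not define the advantage or the A* variant, so the following is only a minimal sketch under stated assumptions: the grid map, the line-of-sight test, the `max_range` threshold, and all function names are hypothetical, and the search is plain uniform-cost search (A* with a zero heuristic), not the authors' variant.

```python
import heapq

# Hypothetical 2D map: '#' is an obstacle, '.' is free space.
GRID = [
    "........",
    "...##...",
    "...##...",
    "........",
]

def is_free(cell):
    """True if the cell is inside the map and not an obstacle."""
    r, c = cell
    return 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] != "#"

def line_of_sight(a, b):
    """True if no obstacle lies on the sampled straight segment from a to b."""
    steps = max(abs(a[0] - b[0]), abs(a[1] - b[1]))
    for i in range(steps + 1):
        t = i / steps if steps else 0.0
        cell = (round(a[0] + t * (b[0] - a[0])), round(a[1] + t * (b[1] - a[1])))
        if not is_free(cell):
            return False
    return True

def geometric_reward(robot, enemy, max_range=4):
    """Assumed shaping signal: 1.0 when the enemy is visible and within range."""
    dist = abs(robot[0] - enemy[0]) + abs(robot[1] - enemy[1])
    return 1.0 if dist <= max_range and line_of_sight(robot, enemy) else 0.0

def search_to_advantage(start, enemy):
    """Uniform-cost search to the nearest cell with a positive geometric reward."""
    frontier = [(0, start, [start])]
    seen = {start}
    while frontier:
        cost, cell, path = heapq.heappop(frontier)
        if geometric_reward(cell, enemy) > 0:
            return path  # goal is defined by geometry, not by winning the match
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if is_free(nxt) and nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (cost + 1, nxt, path + [nxt]))
    return None  # no advantageous cell is reachable

path = search_to_advantage((0, 0), (3, 7))
```

Because the goal test is the same predicate a learning agent would be rewarded for, the search reaches an advantageous cell in one pass over the map, which illustrates why a well-chosen geometric goal can make a search method competitive with learning.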
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.