ORDER MATTERS: AGENT-BY-AGENT POLICY OPTIMIZATION
2023
Proceedings title: 11TH INTERNATIONAL CONFERENCE ON LEARNING REPRESENTATIONS, ICLR 2023
Abstract: While multi-agent trust region algorithms have achieved great empirical success in solving coordination tasks, most of them suffer from a non-stationarity problem because agents update their policies simultaneously. In contrast, a sequential scheme that updates policies agent by agent provides another perspective and shows strong performance. However, sample inefficiency and the lack of monotonic improvement guarantees for each agent remain two significant challenges for the sequential scheme. In this paper, we propose the Agent-by-agent Policy Optimization (A2PO) algorithm to improve sample efficiency while retaining the guarantee of monotonic improvement for each agent during training. We justify the tightness of the monotonic improvement bound compared with other trust region algorithms. From the perspective of sequentially updating agents, we further consider the effect of the agent updating order and extend the theory of non-stationarity to the sequential update scheme. To evaluate A2PO, we conduct a comprehensive empirical study on four benchmarks: StarCraftII, Multiagent MuJoCo, Multi-agent Particle Environment, and Google Research Football full game scenarios. A2PO consistently outperforms strong baselines. © 2023 11th International Conference on Learning Representations, ICLR 2023. All rights reserved.
Proceedings editor / conference sponsor: Baidu; DeepMind; et al.; Google Research; Huawei; Meta AI
Keywords: Software agents; Sports; Coordination tasks; Monotonics; Multi agent; Non-stationarities; Performance; Policy agents; Policy optimization; Sequential update; Trust region algorithms; Update schemes
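Conference name: 11th International Conference on Learning Representations, ICLR 2023
Conference location: Kigali, Rwanda
Conference date: May 1, 2023 - May 5, 2023
Indexed by: EI
Language: English
Publisher: International Conference on Learning Representations, ICLR
EI accession number: 20243116791232
EI main heading: Multi agent systems
Original document type: Conference article (CA)
Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/407254
Collection: School of Creativity and Art_PI Research Group (P)_Tian Zheng Group
Corresponding author: Tian, Zheng; Zhang, Weinan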
Author affiliations:
1.Shanghai Jiao Tong University, China;
2.Digital Brain Lab;
3.ShanghaiTech University, China;
4.University College London, United Kingdom
Corresponding author affiliation: ShanghaiTech University
Recommended citation:
GB/T 7714
Wang, Xihuai,Tian, Zheng,Wan, Ziyu,et al. ORDER MATTERS: AGENT-BY-AGENT POLICY OPTIMIZATION[C]//Baidu, DeepMind, et al., Google Research, Huawei, Meta AI:International Conference on Learning Representations, ICLR,2023.