Reduced Policy Optimization for Continuous Control with Hard Constraints
2023
Proceedings Title: ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023)
ISSN: 1049-5258
Publication Status: Published
Abstract: Recent advances in constrained reinforcement learning (RL) have endowed RL with certain safety guarantees. However, deploying existing constrained RL algorithms in continuous control tasks with general hard constraints remains challenging, particularly when the hard constraints are non-convex. Inspired by the generalized reduced gradient (GRG) algorithm, a classical constrained optimization technique, we propose a reduced policy optimization (RPO) algorithm that combines RL with GRG to handle general hard constraints. Following the GRG method, RPO partitions actions into basic and nonbasic actions and outputs the basic actions via a policy network. It then computes the nonbasic actions from the obtained basic actions by solving the equations defined by the equality constraints. The policy network is updated by implicitly differentiating the nonbasic actions with respect to the basic actions. In addition, we introduce an action projection procedure based on the reduced gradient and apply a modified Lagrangian relaxation technique to ensure that the inequality constraints are satisfied. To the best of our knowledge, RPO is the first attempt to introduce GRG into RL as a way of efficiently handling both equality and inequality hard constraints. Notably, there is currently a lack of RL environments with complex hard constraints, which motivated us to develop three new benchmarks: two robotic manipulation tasks and a smart grid operation control task. On these benchmarks, RPO achieves better performance than previous constrained RL algorithms in terms of both cumulative reward and constraint violation. We believe RPO, along with the new benchmarks, will open up new opportunities for applying RL to real-world problems with complex constraints.
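To make the GRG-style action split in the abstract concrete, below is a minimal PyTorch sketch, not the authors' implementation: it assumes a toy equality constraint h(a_basic, a_nonbasic) = 0, solves it for the nonbasic actions, and backpropagates to the basic actions via the implicit function theorem, d a_nb / d a_b = -(∂h/∂a_nb)^{-1} (∂h/∂a_b). All names here (h, solve_nonbasic, ReducedAction) are hypothetical.

```python
# Minimal sketch of a GRG-style action split (hypothetical, not the paper's code).
# Assumed setup: equality constraint h(a_basic, a_nonbasic) = 0 with one equation
# per nonbasic action, so the nonbasic block of the Jacobian is invertible.
import torch

def h(a_basic, a_nonbasic):
    # Toy equality constraint for illustration: componentwise a_b + a_nb = 1
    # (think of a power-balance equation in grid control).
    return a_basic + a_nonbasic - 1.0

def solve_nonbasic(a_basic, n_steps=50, step=0.5):
    """Solve h(a_basic, a_nb) = 0 for a_nb by damped fixed-point iteration,
    treating the policy's basic actions as fixed."""
    a_nb = torch.zeros_like(a_basic)
    for _ in range(n_steps):
        a_nb = a_nb - step * h(a_basic, a_nb)  # residual contracts by 0.5 per step
    return a_nb

class ReducedAction(torch.autograd.Function):
    """Implicit differentiation: d a_nb / d a_b = -(dh/da_nb)^{-1} (dh/da_b).
    For the toy h above both Jacobians are the identity, so the
    vector-Jacobian product reduces to a sign flip of the incoming gradient."""
    @staticmethod
    def forward(ctx, a_basic):
        with torch.no_grad():
            return solve_nonbasic(a_basic)

    @staticmethod
    def backward(ctx, grad_out):
        return -grad_out  # -(I)^{-1} @ I @ grad_out for this particular h

a_b = torch.tensor([0.3, 0.7], requires_grad=True)  # basic actions (policy output)
a_nb = ReducedAction.apply(a_b)                     # nonbasic actions from h = 0
a_nb.sum().backward()                               # gradient reaches the policy
print(a_nb)      # tensor([0.7000, 0.3000])
print(a_b.grad)  # tensor([-1., -1.])
```

For a general constraint, the backward pass would instead solve a linear system with the Jacobian ∂h/∂a_nb (e.g., via torch.linalg.solve), and the reduced-gradient action projection plus modified Lagrangian relaxation described in the abstract would handle the inequality constraints.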
Conference: 37th Conference on Neural Information Processing Systems (NeurIPS)
Place of Publication: 10010 NORTH TORREY PINES RD, LA JOLLA, CALIFORNIA 92037 USA
Conference Location: New Orleans, LA
Conference Dates: DEC 10-16, 2023
URL: View original
Indexed By: CPCI-S
Language: English
Funding: NSFC [62303319]; Shanghai Sailing Program [22YF1428800, 21YF1429400]; Shanghai Local College Capacity Building Program [23010503100]
WOS Research Area: Computer Science
WOS Categories: Computer Science, Artificial Intelligence; Computer Science, Information Systems
WOS Accession Number: WOS:001228825102028
Publisher: NEURAL INFORMATION PROCESSING SYSTEMS (NIPS)
Document Type: Conference Paper
Identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/348105
Collections: School of Information Science and Technology_Master's Students
School of Information Science and Technology_PhD Students
School of Information Science and Technology_PI Research Group_Wang Jingya Group
School of Information Science and Technology_PI Research Group_Shi Ye Group
Corresponding Author: Shi, Ye
Author Affiliations:
1.ShanghaiTech Univ, Shanghai, Peoples R China
2.Kings Coll London, London, England
First Author Affiliation: ShanghaiTech University
Corresponding Author Affiliation: ShanghaiTech University
First Author's First Affiliation: ShanghaiTech University
Recommended Citation:
GB/T 7714
Ding, Shutong, Wang, Jingya, Du, Yali, et al. Reduced Policy Optimization for Continuous Control with Hard Constraints[C]. 10010 NORTH TORREY PINES RD, LA JOLLA, CALIFORNIA 92037 USA: NEURAL INFORMATION PROCESSING SYSTEMS (NIPS), 2023.

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.