Deep Deterministic Policy Gradient Algorithm with Reduced Variance
Author: ZHAO Guoqing
Affiliation:

Fund Project: 2020 Naval Military Theory Research Project


    Abstract:

    To address the problem that high variance destabilizes the training process and degrades algorithm performance, a reduction variance deep deterministic policy gradient (RV-DDPG) algorithm is proposed. By delaying updates of the target policy, the algorithm reduces how often errors are introduced and limits their accumulation; by smoothing the target policy, it reduces the single-step error and stabilizes the variance. RV-DDPG, the traditional deep deterministic policy gradient (DDPG) algorithm, and the widely used asynchronous advantage actor-critic (A3C) algorithm are applied to the Pendulum, Mountain Car Continuous, and Half Cheetah problems. The experimental results show that RV-DDPG achieves better convergence and stability, demonstrating that the algorithm is effective at reducing variance.
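    The two mechanisms named in the abstract, delayed target-policy updates and target-policy smoothing, can be illustrated with a short sketch of a single update step. This is only a minimal illustration of the general technique, not the paper's implementation: the network objects (actor, critic, and their target copies), the optimizers, and the hyper-parameter names (policy_delay, smooth_sigma, noise_clip, tau, gamma) are assumptions introduced for the example.

    import torch
    import torch.nn.functional as F

    def rv_ddpg_update(step, batch, actor, actor_target, critic, critic_target,
                       actor_opt, critic_opt, gamma=0.99, tau=0.005,
                       policy_delay=2, smooth_sigma=0.2, noise_clip=0.5,
                       max_action=1.0):
        # Illustrative sketch (not the paper's code) of one update step with
        # target-policy smoothing and delayed policy/target updates.
        state, action, reward, next_state, done = batch

        # Target-policy smoothing: add clipped noise to the target action so the
        # bootstrapped critic target is averaged over nearby actions, which
        # shrinks the single-step error of any one estimate.
        with torch.no_grad():
            noise = (torch.randn_like(action) * smooth_sigma).clamp(-noise_clip, noise_clip)
            next_action = (actor_target(next_state) + noise).clamp(-max_action, max_action)
            target_q = reward + gamma * (1 - done) * critic_target(next_state, next_action)

        # Critic regression toward the smoothed target
        critic_loss = F.mse_loss(critic(state, action), target_q)
        critic_opt.zero_grad()
        critic_loss.backward()
        critic_opt.step()

        # Delayed updates: the actor and both target networks are refreshed only
        # every `policy_delay` critic updates, so critic errors are propagated
        # into the policy (and allowed to accumulate) less often.
        if step % policy_delay == 0:
            actor_loss = -critic(state, actor(state)).mean()
            actor_opt.zero_grad()
            actor_loss.backward()
            actor_opt.step()

            # Soft (Polyak) update of the target networks
            for p, p_t in zip(critic.parameters(), critic_target.parameters()):
                p_t.data.mul_(1 - tau).add_(tau * p.data)
            for p, p_t in zip(actor.parameters(), actor_target.parameters()):
                p_t.data.mul_(1 - tau).add_(tau * p.data)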

Cite this article

ZHAO Guoqing. Deep Deterministic Policy Gradient Algorithm with Reduced Variance[J]. 2022, 41(6).

History
  • Received: 2022-02-15
  • Revised: 2022-03-28
  • Published online: 2022-06-06