A Nonparametric Approximate Generalized Policy Iteration Reinforcement Learning Algorithm Based on State Clustering

Published: 2019-04-18 08:59

[Abstract]: To address the heavy computational cost of current approximate policy iteration reinforcement learning algorithms and their inability to construct basis functions fully automatically, a nonparametric approximate generalized policy iteration reinforcement learning algorithm based on state clustering (NPAGPI-SC) is proposed. The algorithm collects samples through a two-stage random sampling process, computes the initial parameters of the approximator using a trial-and-error process and an estimation method that targets complete coverage of the samples, adaptively adjusts the approximator during learning using the delta rule and the nearest-neighbor idea, and selects the action to execute with a greedy policy. Simulation results on the balance control of a single inverted pendulum verify the effectiveness and robustness of the proposed algorithm.
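[Affiliation]: Key Laboratory of Robot and Welding Automation of Jiangxi Province, Nanchang University
[Funding]: National 863 Program project (SS2013AA041003)
[Classification number]: TP181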
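The abstract names three ingredients used during learning: a nearest-neighbor lookup over state clusters, a delta-rule adjustment of the approximator, and greedy action selection. The following is only a minimal sketch of how those three pieces can fit together, not the paper's NPAGPI-SC algorithm. The class name `ClusterQ`, the parameters `centers` and `alpha`, the Euclidean distance, and the per-cluster table of action values are all illustrative assumptions; the two-stage sampling and the coverage-based initialization described in the abstract are not reproduced here.

```python
class ClusterQ:
    """Hypothetical nearest-neighbor action-value approximator.

    Each cluster center stores one value per action. This layout is an
    assumption made for illustration, not taken from the paper.
    """

    def __init__(self, centers, n_actions, alpha=0.5):
        self.centers = [tuple(c) for c in centers]
        self.q = [[0.0] * n_actions for _ in self.centers]
        self.alpha = alpha  # delta-rule step size (assumed name)

    def nearest(self, state):
        # Index of the cluster center closest to `state`
        # (squared Euclidean distance; the metric is an assumption).
        def sq_dist(i):
            return sum((c - s) ** 2 for c, s in zip(self.centers[i], state))
        return min(range(len(self.centers)), key=sq_dist)

    def greedy_action(self, state):
        # Greedy policy: the action with the largest value in the nearest
        # cluster; ties resolve to the lowest action index.
        row = self.q[self.nearest(state)]
        return max(range(len(row)), key=row.__getitem__)

    def delta_update(self, state, action, target):
        # Delta rule: move the stored value a fraction `alpha` of the way
        # toward the target.
        i = self.nearest(state)
        self.q[i][action] += self.alpha * (target - self.q[i][action])


approx = ClusterQ(centers=[(0.0, 0.0), (1.0, 1.0)], n_actions=2, alpha=0.5)
approx.delta_update((0.9, 1.2), action=1, target=1.0)
best = approx.greedy_action((0.8, 0.9))
```

In this sketch a state only ever touches the values of its nearest cluster, so each update costs one pass over the cluster centers. How the target value is formed (for example, from a reward plus a discounted next-state value) is left to the caller.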

Article ID: 2459917


Article link: http://www.sikaile.net/kejilunwen/zidonghuakongzhilunwen/2459917.html
