Recursive Least-Squares Temporal-Difference Reinforcement Learning Algorithm Based on Improved ELM and Its Application

Published: 2018-01-16 03:04

Keywords: recursive least-squares temporal-difference reinforcement learning algorithm based on improved ELM and its application. Source: 《化工学报》 (CIESC Journal), 2017, Issue 03. Paper type: journal article.

[Abstract]: To meet the accuracy and computation-time requirements of value function approximation, this paper proposes a recursive least-squares temporal-difference reinforcement learning algorithm based on an improved extreme learning machine (ELM). First, a recursive method is introduced into the least-squares temporal-difference (LSTD) reinforcement learning algorithm to eliminate the matrix inversion in the least-squares solution, yielding a recursive least-squares temporal-difference algorithm with lower complexity and computational load. Second, because LSTD(0) converges slowly, eligibility traces are added to increase sample utilization and speed up convergence, yielding the LSTD(λ) algorithm, so that the estimate can converge to the true value after the same number of trajectories. In addition, the value functions of most reinforcement learning problems are monotonic, whereas the traditional ELM method usually uses the Sigmoid activation function, which suppresses on both sides and increases computational cost. The Softplus activation function, which suppresses on one side only, is therefore adopted in place of the traditional Sigmoid function to reduce computation and increase speed, so that the algorithm gains computational speed while also improving accuracy. The proposed algorithm is compared on the generalized Hop-world problem with the traditional least-squares reinforcement learning algorithm based on radial basis functions and with the least-squares TD algorithm based on the extreme learning machine. The results show that the proposed algorithm effectively improves computational speed while meeting the accuracy requirement, and under some conditions is even more accurate than the other two algorithms.
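The three ingredients named in the abstract (a recursive update that avoids matrix inversion, eligibility traces, and a Softplus hidden layer) can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `ELMFeatures` and `RecursiveLSTD`, the regularization constant `delta`, the absence of a forgetting factor, and the random-walk test bed `run_random_walk` are all assumptions made for illustration, and the test bed is not the generalized Hop-world problem used in the paper. The recursion applies the Sherman-Morrison identity to keep an estimate of the inverse of A = delta·I + Σ z (φ − γφ')ᵀ, so no explicit inverse is ever computed.

```python
import numpy as np


def softplus(x):
    """Softplus log(1 + exp(x)), computed stably via logaddexp."""
    return np.logaddexp(0.0, x)


class ELMFeatures:
    """ELM-style hidden layer: random input weights that are never trained."""

    def __init__(self, n_inputs, n_hidden, rng):
        self.W = rng.normal(size=(n_hidden, n_inputs))
        self.b = rng.normal(size=n_hidden)

    def __call__(self, state):
        x = np.atleast_1d(np.asarray(state, dtype=float))
        return softplus(self.W @ x + self.b)


class RecursiveLSTD:
    """Recursive LSTD(lambda) for a linear value function V(s) = theta . phi(s)."""

    def __init__(self, n_features, gamma=0.95, lam=0.5, delta=1.0):
        self.gamma, self.lam = gamma, lam
        self.P = np.eye(n_features) / delta  # inverse of A, starting from A = delta * I
        self.theta = np.zeros(n_features)    # output weights
        self.z = np.zeros(n_features)        # eligibility trace

    def start_episode(self):
        self.z = np.zeros_like(self.z)

    def update(self, phi, reward, phi_next):
        self.z = self.gamma * self.lam * self.z + phi  # accumulate the trace
        d = phi - self.gamma * phi_next
        Pz = self.P @ self.z
        dP = d @ self.P
        denom = 1.0 + d @ Pz
        self.theta = self.theta + Pz * ((reward - d @ self.theta) / denom)
        self.P = self.P - np.outer(Pz, dP) / denom     # Sherman-Morrison rank-one update


def run_random_walk(agent, features, n_states=7, episodes=50, seed=0):
    """Hypothetical test bed: symmetric random walk, reward 1 at the right end."""
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        agent.start_episode()
        s = n_states // 2
        while 0 < s < n_states - 1:
            s_next = s + rng.choice([-1, 1])
            terminal = s_next in (0, n_states - 1)
            reward = 1.0 if s_next == n_states - 1 else 0.0
            phi = features(s / (n_states - 1))
            # A terminal state has value zero, so its feature vector is zeroed.
            phi_next = np.zeros_like(phi) if terminal else features(s_next / (n_states - 1))
            agent.update(phi, reward, phi_next)
            s = s_next
    return agent.theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = ELMFeatures(n_inputs=1, n_hidden=8, rng=rng)
    agent = RecursiveLSTD(n_features=8)
    theta = run_random_walk(agent, features)
    for s in range(1, 6):
        print(s, float(theta @ features(s / 6)))
```

Each call to `update` costs O(n²) in the number of hidden units, compared with O(n³) for solving the batch least-squares system from scratch, which is the kind of saving the recursive formulation is meant to provide. Setting `lam=0.0` reduces the sketch to a recursive form of LSTD(0).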

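[Author affiliation]: College of Information Science and Technology, Beijing University of Chemical Technology
[Funding]: National Natural Science Foundation of China (61573051, 61472021); Open Project of the State Key Laboratory of Software Development Environment (SKLSDE-2015KF-01); Fundamental Research Funds for the Central Universities (PT1613-05)
[CLC number]: TP181
[Text snapshot]: Introduction. Reinforcement learning, proposed by Watkins et al. [1-3], is a machine learning approach grounded in psychology. Its main idea is to optimize a policy through the agent's interaction with the environment and trial and error, using the environment's feedback signal as input. Policy optimization requires sound policy evaluation and policy iteration techniques, and how to estimate the value function correctly is a central problem in policy evaluation. Reinforcement learning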

