Research and Application of Game Intelligence Systems Based on Machine Learning
Published: 2018-10-24 11:07
[Abstract]: In machine learning, it remains a challenging problem to train a decision-making system with a good control policy directly from high-dimensional perceptual data such as vision and speech signals. Before the Deep Q-Network (DQN) was proposed, successful reinforcement learning applications relied mainly on hand-crafted features or policy representations, and the suitability of those features strongly affected the final results. With the development of deep reinforcement learning, the DQN algorithm can stably learn a good control policy directly from high-dimensional inputs and environmental feedback, achieving strong performance on most games in the Atari environment. By combining a convolutional neural network's ability to extract features directly from high-dimensional data with Q-Learning for training the action-value network, DQN offers a new approach to game intelligence. A number of challenges remain, however. First, DQN requires fully observed state information: when more than four frames are needed to represent the current state, as in 3D environments, it cannot learn a good control policy. Sparse, delayed, and noisy reward signals are another problem: reinforcement learning must learn a control policy from such signals, but the strong correlation between samples, combined with these reward-signal issues, often prevents good results. This thesis exploits the ability of an LSTM (Long Short-Term Memory) network to retain state information over long time spans, together with an improved asynchronous training algorithm, to design a game intelligence system based on deep neural networks tailored to the experimental setting, and validates the decision-making ability the system learns in a 3D environment.
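The abstract refers to the DQN training objective without reproducing it. For orientation only, the textbook formulation (not a formula taken from this thesis) minimizes the temporal-difference loss

L(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}} \Big[ \big( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \big)^{2} \Big]

where \mathcal{D} is the experience-replay buffer (whose random sampling breaks the sample correlation noted above), \gamma is the discount factor, and \theta^{-} are the periodically copied target-network parameters. In the partially observed 3D setting the thesis targets, a recurrent variant replaces the four-frame stacked state s with an LSTM hidden state carried across time steps.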
[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP311.52; TP18
Article ID: 2291214
Article link: http://www.sikaile.net/kejilunwen/zidonghuakongzhilunwen/2291214.html