Research and Design of a Soccer Robot Decision-Making System Based on the SARSA Algorithm
Published: 2018-12-08 21:20
[Abstract]: The RoboCup 2D simulation soccer platform is a testbed for research on multi-agent robot systems, on which researchers can evaluate different machine learning algorithms. Reinforcement learning, one of the major families of machine learning algorithms, lets an agent maximize its cumulative reward through continual interaction with its environment, and under certain conditions it guarantees that the agent's learning converges to an optimal policy. Reinforcement learning has been applied successfully to games such as Go, Gomoku, Tetris, and Unreal Tournament, but it has not been studied thoroughly in the RoboCup 2D simulation league. This thesis introduces the SARSA algorithm into RoboCup 2D simulation and improves it. The state space of the player agent is mapped from the positions of the defending players and of the ball; from this mapping, a precondition function is derived and used as the basis for SARSA's action selection, and the algorithm is designed and implemented within the Helios framework. Drawing on soccer domain knowledge, the thesis proposes two reward correction functions, one based on team dispersion and one based on ball transfer distance, to improve the team's play. In a multi-agent system, the Q table that a single agent learns independently is typically sparse and cannot represent the global situation of the whole system; to address this, the thesis studies Q-table sharing among agents and proposes a multi-Q-table fusion algorithm that raises the team's win rate. Because the design of a reinforcement learning algorithm must ensure convergence of the Q table, the thesis first compares the convergence of an adaptive ε-greedy action selection strategy with that of a fixed ε-greedy strategy and adopts the adaptive strategy, which converges. For the design of the reward function, it compares the effect of different reward values on goals scored to determine suitable values, and compares the team's win rate with and without the two reward corrections; experiments show that introducing the corrections improves the win rate. Finally, multiple matches were played against teams that have participated in RoboCup 2D, and statistical analysis of the results verifies the effectiveness of the proposed algorithms.
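The abstract's core loop, SARSA updates driven by ε-greedy action selection over a discretized state space, can be illustrated with a minimal sketch. The thesis implements this in C++ inside the Helios framework; the Python below is only schematic, and the action set, state encoding, and hyperparameter values are assumptions, not the thesis's settings.

```python
# Minimal SARSA sketch with an adaptive epsilon-greedy policy, assuming a
# discrete state/action encoding like the thesis's mapping of defender and
# ball positions. All names here (ACTIONS, hyperparameters) are illustrative.
import random
from collections import defaultdict

ACTIONS = ["pass", "dribble", "shoot"]   # hypothetical discrete action set

class SarsaAgent:
    def __init__(self, alpha=0.1, gamma=0.9,
                 eps0=0.3, eps_decay=0.999, eps_min=0.01):
        self.q = defaultdict(float)      # sparse Q table: (state, action) -> value
        self.alpha, self.gamma = alpha, gamma
        self.eps = eps0                  # adaptive epsilon: decays toward eps_min
        self.eps_decay, self.eps_min = eps_decay, eps_min

    def select_action(self, state):
        # epsilon-greedy: explore with probability eps, otherwise exploit
        if random.random() < self.eps:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next, a_next):
        # on-policy SARSA update: the target uses the action actually
        # chosen in the next state, not the greedy maximum
        td_target = r + self.gamma * self.q[(s_next, a_next)]
        self.q[(s, a)] += self.alpha * (td_target - self.q[(s, a)])
        # adaptive schedule: shrink exploration as learning progresses
        self.eps = max(self.eps_min, self.eps * self.eps_decay)
```

A training loop would choose a, observe (r, s'), choose a' with the same policy, then call update(s, a, r, s', a'); using the same policy for both choices is what makes SARSA on-policy, in contrast to Q-learning.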
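The two reward corrections can likewise be sketched. The abstract does not give their exact formulas, so the helpers below, the weights w_disp and w_dist, and the additive combination are all assumptions chosen to match the stated intent: reward spread-out formations and forward ball movement.

```python
# Hedged sketch of the two domain-knowledge reward corrections named in the
# abstract: one from team dispersion, one from ball transfer distance.
import math

def dispersion_bonus(teammates, w_disp=0.05):
    """Reward spread-out formations: mean pairwise distance between teammates."""
    pairs = [(p, q) for i, p in enumerate(teammates) for q in teammates[i + 1:]]
    if not pairs:
        return 0.0
    mean_dist = sum(math.dist(p, q) for p, q in pairs) / len(pairs)
    return w_disp * mean_dist

def transfer_bonus(ball_before, ball_after, goal_pos, w_dist=0.1):
    """Reward actions that move the ball closer to the opponent goal."""
    gained = math.dist(ball_before, goal_pos) - math.dist(ball_after, goal_pos)
    return w_dist * gained

def shaped_reward(base_reward, teammates, ball_before, ball_after, goal_pos):
    # base_reward is the sparse goal/score signal; corrections are additive
    return (base_reward
            + dispersion_bonus(teammates)
            + transfer_bonus(ball_before, ball_after, goal_pos))
```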
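Finally, one plausible reading of multi-Q-table fusion: each player's independently learned Q table is sparse, so the tables are merged entry-wise, averaging only over the agents that actually visited each (state, action) pair. The abstract names the fusion algorithm but does not specify it, so this averaging rule is an assumption.

```python
# Sketch of a simple multi-Q-table fusion by visited-entry averaging.
from collections import defaultdict

def fuse_q_tables(tables):
    """Merge sparse per-agent Q tables into one shared table."""
    fused = defaultdict(float)
    counts = defaultdict(int)
    for table in tables:
        for key, value in table.items():
            fused[key] += value
            counts[key] += 1
    # average only over agents that visited each (state, action) pair,
    # so unvisited (zero) entries do not drag the shared estimate down
    return {key: fused[key] / counts[key] for key in fused}
```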
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year of award]: 2017
[CLC number]: TP242
Article ID: 2369012
Article link: http://www.sikaile.net/kejilunwen/zidonghuakongzhilunwen/2369012.html