A Reinforcement Learning Study of Pigeon Visual-Behavioral Decision-Making
Published: 2019-04-08 15:28
[Abstract]: Behavioral decision-making (cognitive execution) is a skill that humans and animals, as agents, need in order to survive under natural selection: the agent judges external information and uses that judgment to guide its choice of behavior. Vision is the agent's main source of external information, accounting for more than 80% of all perceptual information. In nature, most of the visual-behavioral decisions that agents depend on for survival are acquired through learning after birth (reinforcement learning). Pigeons, with their powerful visual perception and a decision-making ability not inferior to that of mammals, have become a typical model animal in visual cognition research. Studying reinforcement learning in pigeon visual-behavioral decision-making is therefore important for revealing the cognitive mechanisms of agents during decision-making, and it helps in understanding the brain mechanisms of intelligent decision behavior and how the brain carries out cognitive decisions.

Research on pigeon visual-behavioral decision-making has made some progress, but most of it addresses reinforcement learning under static rules. The experimental paradigms are oversimplified, and the models mostly use a fixed learning rate or a single reward matrix, so they cannot truly simulate an agent's decision mechanism under dynamic environmental rules. In addition, the role of neurons in the nidopallium caudolaterale (NCL) during reinforcement learning remains unclear. This thesis therefore takes pigeons as the experimental subject, designs visual-behavioral decision paradigms with dynamic reinforcement rules, carries out behavioral training, and simultaneously records electrical signals from NCL neurons. The pigeons' decision characteristics and the response characteristics of NCL neurons during dynamic reinforcement learning are analyzed from both the behavioral and the neuronal-response perspectives. The main work is as follows:

(1) Two visual-behavioral decision training paradigms under dynamic rules were designed: random reinforcement and reversal reinforcement. Hardware and software platforms for behavioral training were built according to the planned experimental procedure, enabling automated training of pigeons based on specific reward and punishment information. Electrical signals from NCL neurons were recorded simultaneously during reinforcement learning training, and the signals were preprocessed.

(2) A new dynamic reinforcement learning model was proposed by modifying the learning rate and the reward matrix of the classical Q-Learning model. The model was used to analyze the pigeons' behavioral feedback data from the two training procedures and was compared with the classical Q-Learning model. The error in predicting behavior with the dynamic model was reduced by 46.98% and 30.55%, respectively. The model's learning rate was also found to reflect the pigeons' internal learning state at different training stages.

(3) Response features of NCL neurons at different training stages were extracted and analyzed statistically. Valid trials were screened, a suitable response time window was selected, and the firing rate within that window was computed as the neuronal response feature. The Mann-Whitney test was used to assess the significance of differences in NCL response features during reinforcement learning. The response features of some neurons (10/60) reflected the reward and punishment information in training, and those of others (21/60) carried information about the pigeons' learning state.

These results indicate that NCL neurons play different roles in the reinforcement learning process.
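The abstract contrasts a fixed learning rate with a learning rate that changes during training, but it does not give the thesis's equations. As a rough illustration only, the sketch below replays a choice/reward sequence through the classical Q-learning update reduced to a single state (so the discounted successor term is dropped) and lets the learning rate be supplied as a function. The names `replay_q_learning`, `fixed_alpha`, and `surprise_alpha`, the softmax choice rule, and the surprise-scaled rate are all assumptions made for this example; they are not the model proposed in the thesis.

```python
import math


def softmax_probs(q_values, beta):
    """Choice probabilities from Q-values; beta is the inverse temperature."""
    top = max(q_values)
    exps = [math.exp(beta * (q - top)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def replay_q_learning(choices, rewards, alpha_fn, beta=3.0, n_actions=2):
    """Replay observed trials through a single-state Q-learning update.

    alpha_fn(trial_index, prediction_error) returns the learning rate for
    that trial. Returns the probability the model assigned to each observed
    choice (computed before the update) and the final Q-values.
    """
    q = [0.0] * n_actions
    predicted = []
    for t, (action, reward) in enumerate(zip(choices, rewards)):
        predicted.append(softmax_probs(q, beta)[action])
        delta = reward - q[action]  # reward prediction error
        q[action] += alpha_fn(t, delta) * delta
    return predicted, q


def fixed_alpha(rate):
    """Classical case: the same learning rate on every trial."""
    return lambda t, delta: rate


def surprise_alpha(base, gain):
    """Hypothetical dynamic rate: grows with the size of the prediction error."""
    return lambda t, delta: min(1.0, base + gain * abs(delta))


if __name__ == "__main__":
    # Toy reversal: action 0 is rewarded for 5 trials, then action 1 is.
    choices = [0] * 5 + [0, 1, 1, 1, 1]
    rewards = [1] * 5 + [0, 1, 1, 1, 1]
    for name, fn in [("fixed", fixed_alpha(0.3)),
                     ("dynamic", surprise_alpha(0.2, 0.6))]:
        probs, q = replay_q_learning(choices, rewards, fn)
        print(name, [round(p, 3) for p in probs], [round(v, 3) for v in q])
```

Passing the learning rate as a function keeps the update rule identical in both cases, so any difference in the predicted choice probabilities comes from the learning-rate schedule alone.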
[Degree-granting institution]: Zhengzhou University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: Q42
Article ID: 2454693
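[Similar literature]
Related master's theses (top 5)
1 Tao Mengyan; A Reinforcement Learning Study of Pigeon Visual-Behavioral Decision-Making [D]; Zhengzhou University; 2017
2 Chen Xuemei; Identification of Place Cells in the Pigeon Hippocampus and Analysis of Place-Field Distribution Characteristics [D]; Zhengzhou University; 2017
3 Li Shan; Construction of Spike Functional Networks and Decoding of Pigeon Turning Behavior [D]; Zhengzhou University; 2017
4 Yang Songling; Design and Implementation of a Pigeon Maze Training System [D]; Zhengzhou University; 2017
5 Chen Yan; Construction of Gamma Sub-Band Functional Networks Based on Synchronization Likelihood and Decoding of Pigeon Turning Behavior [D]; Zhengzhou University; 2017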
Article link: http://www.sikaile.net/shoufeilunwen/benkebiyelunwen/2454693.html