

Research on Training Optimization Strategies for the Recurrent Attention Model

Published: 2018-11-09 13:15
[Abstract]: In recent years, deep learning has achieved great success in computer vision, machine translation, speech recognition, and other fields, attaining state-of-the-art results across many applications. However, the high accuracy of these models comes largely at the cost of heavy computation during both training and inference. A major bottleneck of traditional deep learning is the need to process the entire image, whereas human vision focuses only on the region currently of interest, a trait that greatly reduces the "bandwidth" the human visual system requires. Although researchers in computer vision have proposed techniques such as reducing sliding windows to improve efficiency, the computational cost of deep models still grows in proportion to the input image size. To address this problem, this thesis introduces an attention mechanism modeled on the human visual system. Current attention mechanisms fall into two main classes: soft attention and hard attention. Soft attention is a differentiable model based on saliency maps, while hard attention samples discrete attention locations to produce glimpse regions from which attention features are extracted. Starting from hard attention, this thesis proposes two optimization strategies for the recurrent attention model (RAM), OV-RAM and an EM algorithm, and evaluates them on the weakly labeled Translated MNIST and Cluttered MNIST datasets. The recurrent attention model is based on an RNN: at each step it glimpses a different perceptual region, updates its hidden state, and makes decisions from the accumulated information. Because only a small region of interest is processed at each step, it is computationally more efficient than traditional deep networks. However, because the model uses discrete, non-differentiable attention locations and relies on reinforcement learning to learn the location-selection policy, it trains slowly.
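As a concrete illustration, the glimpse-and-update loop described above can be sketched in a few lines of NumPy. This is a toy sketch with random, untrained weights and assumed sizes (8x8 glimpses, a 16-unit hidden state), not the thesis's implementation:

```python
import numpy as np

def extract_glimpse(image, loc, size=8):
    """Crop a size x size patch centred at loc=(row, col); zero-pad at the borders."""
    half = size // 2
    padded = np.pad(image, half, mode="constant")
    r, c = loc[0] + half, loc[1] + half
    return padded[r - half:r + half, c - half:c + half]

class RAMCoreStep:
    """One glimpse -> hidden-state update of a RAM-style RNN core (untrained weights)."""
    def __init__(self, glimpse_size=8, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        d = glimpse_size * glimpse_size
        self.Wg = rng.normal(0.0, 0.1, (hidden, d))       # glimpse encoder
        self.Wh = rng.normal(0.0, 0.1, (hidden, hidden))  # recurrent weights
        self.glimpse_size = glimpse_size

    def __call__(self, h, image, loc):
        g = extract_glimpse(image, loc, self.glimpse_size).ravel()
        # h_t = tanh(W_g g_t + W_h h_{t-1}): evidence accumulates across glimpses
        return np.tanh(self.Wg @ g + self.Wh @ h)

# Three fixations over a 28x28 image update the same hidden state.
step = RAMCoreStep()
img = np.zeros((28, 28))
img[10:18, 10:18] = 1.0                    # a bright square for the glimpses to find
h = np.zeros(16)
for loc in [(6, 6), (14, 14), (22, 22)]:   # hypothetical fixation sequence
    h = step(h, img, loc)
print(h.shape)  # (16,)
```

Note that only `size * size` pixels are touched per step, which is the source of the efficiency gain the abstract describes; in the full model the next fixation `loc` would be sampled from a learned policy rather than fixed.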
Building on previous models, this thesis combines soft attention with hard attention by adding an Overview layer to the recurrent attention model to provide context information, yielding the OV-RAM model. It also analyzes structural problems of the recurrent attention model, rederives the objective function from a supervised-learning perspective, decouples its two coupled components, and introduces an EM algorithm for training. Finally, some failure cases are analyzed and remedies are suggested. Experiments on the Translated MNIST and Cluttered MNIST datasets confirm that the proposed OV-RAM and EM algorithms effectively accelerate training of the recurrent attention model: the same convergence accuracy is reached in fewer iterations, demonstrating the effectiveness of the two proposed optimization strategies.
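The abstract does not give the derivation itself, but treating glimpse locations as latent variables admits a generic EM formulation consistent with "splitting the two coupled parts" (this is a standard EM view, an assumption rather than the thesis's exact objective):

```latex
% Marginal likelihood over latent glimpse locations \ell
\log p(y \mid x; \theta) \;=\; \log \sum_{\ell} p(\ell \mid x; \theta)\, p(y \mid \ell, x; \theta)

% E-step: posterior over locations under the current parameters
q(\ell) \;=\; p(\ell \mid x, y; \theta^{\mathrm{old}})
        \;\propto\; p(\ell \mid x; \theta^{\mathrm{old}})\, p(y \mid \ell, x; \theta^{\mathrm{old}})

% M-step: the location policy and the classifier decouple into separate terms
\theta^{\mathrm{new}} \;=\; \arg\max_{\theta} \sum_{\ell} q(\ell)
    \left[ \log p(\ell \mid x; \theta) + \log p(y \mid \ell, x; \theta) \right]
```

In this view the M-step separates the location-selection policy $p(\ell \mid x)$ from the classifier $p(y \mid \ell, x)$, so each can be optimized without backpropagating through the discrete, non-differentiable location choice.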
【Degree-granting institution】: Harbin Institute of Technology
【Degree level】: Master's
【Year conferred】: 2017
【CLC number】: TP391.41

【Similar Literature】

Related master's theses (1)

1 Chen Shaopeng; Research on Training Optimization Strategies for the Recurrent Attention Model [D]; Harbin Institute of Technology; 2017



Article ID: 2320514


Link to this article: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/2320514.html


