Research on Image Semantic Annotation and Description Based on Deep Learning
Posted: 2018-04-11 07:37
Topic: image annotation + convolutional neural network. Source: 廣西師范大學 (Guangxi Normal University), master's thesis, 2017
【Abstract】: With the rapid development of information science and technology, and aided by the spread of digital devices and advances in storage technology, media data of many kinds are growing at an explosive rate. Faced with large amounts of unlabeled data such as text, audio, images, and video, how to manage and use these unannotated data has become a pressing problem. Current image semantic annotation techniques can label images effectively; this not only helps people manage large collections of unlabeled images but also lets machines understand images more intelligently, which makes image semantic annotation a highly worthwhile line of research. Image understanding builds on image processing and analysis, combines theories from computer vision and natural language processing, analyzes and understands image content, and feeds the result back to humans as textual semantic information. Completing image understanding therefore requires not only image annotation but also image description. Image annotation takes the image as its object and semantic information as its carrier, and studies what objects an image contains and how those objects relate to one another. Image description uses natural language processing to analyze and produce annotation words, and then combines the generated words into natural-language descriptive sentences. In recent years image description has attracted great interest from the research community and, like image annotation, it has broad application prospects. This thesis takes image semantic annotation as its main research line, images in multimedia data as its research object, and image description as an application extension. Following the research path of feature extraction and representation, semantic mapping model construction, and semantic analysis and understanding, it focuses on object recognition and semantic analysis in image annotation, covering feature learning, multi-label classification, semantic correlation analysis, and word and sentence sequence generation. Based on this research, the main contributions are as follows.

To narrow the semantic gap between data of different modalities, a hybrid multi-label image annotation architecture, CNN-ECC, is proposed, which combines a deep convolutional neural network (Deep Convolutional Neural Network, CNN) with ensembles of classifier chains (Ensembles of Classifier Chains, ECC). The framework consists of two stages: generative feature learning and discriminative semantic learning. The first stage uses an improved convolutional neural network to learn high-level visual features that fuse multiple instances of an image. The second stage trains the ensembles of classifier chains on the extracted visual features and the images' semantic label sets; the ensemble not only learns the semantic information carried by the visual features but also fully exploits the correlations among semantic labels, so that the generated labels are more strongly related to one another and redundant labels are avoided. The trained model is then used to annotate unseen images automatically.

Image annotation lays the foundation for image description. To assemble the annotation words generated for an image into natural-language sentences, an image description model, CNN-DLSTM, is proposed, which combines a convolutional neural network (Convolutional Neural Network, CNN) with bidirectional long short-term memory units (Double Long-short Term Memory, DLSTM). This framework comprises a visual model and a language model. The visual model learns concepts of the image's visual content and generates the image's key semantic words. The language model learns vocabulary and grammar from human-written description sequences and combines the visual concept words with the corresponding grammar to generate the matching linguistic description, completing the captioning task. To make the generated sentences more human-like, CNN-DLSTM additionally introduces a confidence evaluation model for the quality of generated descriptions and selectively outputs the higher-scoring caption sentences. Image content is complex and abstract, and its semantic concepts are often vague and ambiguous; this thesis therefore improves the key steps of feature learning and semantic learning in image annotation, realizes automatic image annotation, and improves both annotation and description performance.
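To make the second (ECC) stage concrete, the following is a minimal Python sketch, assuming the visual features have already been extracted by the CNN stage. It uses scikit-learn's ClassifierChain with a logistic-regression base learner; the feature dimensionality, number of chains, and decision threshold are illustrative assumptions, not the thesis's actual configuration.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

def train_ecc(features, labels, n_chains=10, seed=0):
    # Each chain sees the labels in a different random order, so the ensemble
    # captures label correlations without committing to a single ordering.
    chains = [
        ClassifierChain(LogisticRegression(max_iter=1000),
                        order="random", random_state=seed + i)
        for i in range(n_chains)
    ]
    for chain in chains:
        chain.fit(features, labels)   # labels: multi-hot matrix (n_images, n_tags)
    return chains

def annotate(chains, features, threshold=0.5):
    # Average the per-chain probabilities and keep the tags above the threshold.
    proba = np.mean([chain.predict_proba(features) for chain in chains], axis=0)
    return (proba >= threshold).astype(int)

# Hypothetical usage: X_train holds CNN feature vectors (e.g. 4096-d per image),
# Y_train the corresponding multi-hot tag matrix.
# chains = train_ecc(X_train, Y_train)
# predicted_tags = annotate(chains, X_test)

Averaging several randomly ordered chains is what distinguishes ECC from a single classifier chain: it reduces sensitivity to any one label ordering while still modeling dependencies among labels.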
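The CNN-DLSTM framework can likewise be sketched as a CNN encoder feeding an LSTM-based language model. The PyTorch sketch below is a simplified stand-in, assuming a ResNet-18 backbone and an ordinary two-layer LSTM decoder trained with teacher forcing; the thesis's specific double/bidirectional LSTM wiring and its confidence evaluation model for re-ranking captions are not reproduced here, and all names and sizes are illustrative.

import torch
import torch.nn as nn
import torchvision.models as models

class CaptionModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # Visual model: a CNN whose classifier head is replaced by a projection
        # into the word-embedding space.
        cnn = models.resnet18(weights=None)
        cnn.fc = nn.Linear(cnn.fc.in_features, embed_dim)
        self.cnn = cnn
        # Language model: word embeddings plus a two-layer LSTM decoder.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # The image feature acts as the first token of the sequence; the decoder
        # then predicts each caption word from the preceding ones (teacher forcing).
        feat = self.cnn(images).unsqueeze(1)        # (B, 1, embed_dim)
        words = self.embed(captions[:, :-1])        # shift right for teacher forcing
        hidden, _ = self.lstm(torch.cat([feat, words], dim=1))
        return self.out(hidden)                     # (B, T, vocab_size) word logits

# Hypothetical usage with a 10,000-word vocabulary:
# model = CaptionModel(vocab_size=10000)
# logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 10000, (2, 15)))

At inference time, greedy or beam-search decoding over the word logits yields candidate sentences, which a separate scorer such as the thesis's confidence evaluation model could then re-rank before output.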
【Degree-granting institution】: 廣西師范大學 (Guangxi Normal University)
【Degree level】: Master's
【Year conferred】: 2017
【CLC number】: TP391.41
Document No.: 1735035
Link: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/1735035.html