Research on Key Problems in Multi-Label Learning

Published: 2018-06-27 12:27

Topics: multi-label learning + multi-label classification; Source: Xidian University, doctoral dissertation, 2016

[Abstract]: A growing number of applications involve multi-label problems, for example text classification, image annotation, and gene function analysis. Unlike traditional single-label problems (binary or multi-class classification), a multi-label problem allows one instance to be associated with several labels at once, and the labels are connected by richer relationships, which makes the problem harder to analyze. Multi-label learning studies how to assign every appropriate class label to an unseen instance. Driven by application needs, more and more researchers have taken up the topic, and it has become one of the active research areas in machine learning and pattern recognition.

Despite substantial progress, multi-label learning still faces several key challenges: the classification performance of existing multi-label classifiers leaves room for improvement; a high-dimensional label space leads to high training and testing time costs; and a high-dimensional feature space makes the trained model prone to overfitting. Accordingly, multi-label classification, label space dimension reduction, and multi-label dimensionality reduction are three major research directions. Multi-label classification research aims to improve classification performance. Label space dimension reduction exploits label relationships by reducing the dimension of the label space, with the goal of improving classification performance while cutting training and testing time. Multi-label dimensionality reduction addresses the "curse of dimensionality" by reducing the dimension of the feature space to obtain a better instance representation. This dissertation studies multi-label learning along these three directions. The main contributions are as follows.

1. Because labels often exhibit clustered relationships, a multi-label classification algorithm based on clustered intrinsic label relationships is proposed. The weight vector of each label consists of a common component and a label-specific component. The common component is shared by all labels and corresponds to background information in the instance; the label-specific component belongs to a single label and corresponds to the information in the instance that is specific to that label. The intrinsic relationships among labels are reflected in the relationships among the label-specific components, and these relationships tend to be clustered. The proposed method extends the support vector machine with this weight vector structure and imposes a cluster-relationship regularizer on the label-specific components of all labels, so that clustered label relationships are used to improve classification performance. By relaxing the orthogonality constraint, the non-convex problem is turned into a jointly convex semidefinite programming problem, which is solved with a block coordinate descent method based on alternating iterative updates. Experimental results show that the proposed algorithm clearly outperforms related multi-label classification algorithms.

2. Existing multi-label classification algorithms train every label on the same instance representation, which cannot reflect the characteristics of each individual label well. To address this, a multi-label classification algorithm is proposed that uses the distribution of the instances to construct a new, more discriminative instance representation for each label. The algorithm follows the one-vs-all strategy and transforms the multi-label classification problem into multiple binary classification subproblems, one per label. Within each subproblem, the relationships between the local structures of the positive and negative instances play an important role in building an effective classifier. To mine these relationships, a new spectral clustering method, spectral instance alignment, is proposed. The algorithm uses the clustering results of spectral instance alignment to build, for each label, an instance representation that better matches the characteristics of that label, and then trains a binary classifier on the new representation. Experimental results verify the effectiveness of the algorithm.

3. To make full use of instance information during label space dimension reduction, a label space dimension reduction algorithm based on dependence maximization is proposed. Its objective function has two parts: an encoding loss and a dependence loss. The encoding loss measures the information lost when the label matrix is compressed with principal component analysis. After the label vectors have been reduced to codeword vectors, a regression model from the feature space to the codeword space still has to be learned, so the relationship between instances and codeword vectors matters; the dependence loss measures how much of that dependence is lost. To exploit instance information, the algorithm is the first to measure the dependence loss with the Hilbert-Schmidt independence criterion, which lets it mine and use the dependence between instances and codeword vectors more fully. The effect of two different instance kernel matrices is also examined: one is based on global structure information and the other on local latent structure information. Experimental results show that the algorithm greatly shortens training and testing time and also improves classification performance. The variant using the local kernel matrix achieves better classification performance, while its training and testing time is comparable to that of the variant using the global kernel matrix.

4. To deal with outliers among the instances and the label vectors, a robust label space dimension reduction algorithm based on the l2,1 norm is proposed. Because of problems with data acquisition equipment, the instances of a data set often contain outliers; a label vector outlier is a label vector that clearly disagrees with the main label relationships exploited by the label space dimension reduction algorithm. The objective function again consists of an encoding loss and a dependence loss. The encoding loss measures the information lost when the label matrix is compressed with principal component analysis, and the dependence loss measures the loss of the linear regression relationship between instances and codeword vectors. To handle outliers, both losses are measured with the l2,1 norm. The resulting problem is non-smooth; it is solved with a modified alternating iterative update method, for which a convergence analysis is given. Experimental results show that the robust algorithm both shortens training and testing time and improves classification performance. In addition, experiments on data sets with contaminated labels show that it is more robust than other label space dimension reduction algorithms.

5. Existing multi-label dimensionality reduction methods do not exploit local latent structure, although research on traditional dimensionality reduction has shown such structure to be useful. A new multi-label dimensionality reduction method, multi-label local discriminant embedding, is therefore proposed. The method uses an asymmetric label relationship matrix that is closer to reality; this gives larger weights to instances that carry more information and also overcomes the over-counting problem in multi-label learning. Two sets of adjacency graphs are constructed to analyze the local latent structure, so that the intrinsic geometric structure of the data is better mined and used and the reduced representation has better within-class compactness and between-class separability. Imposing an orthogonality constraint on the resulting optimization problem yields a set of orthogonal projection vectors. Experimental results show that, compared with related multi-label dimensionality reduction methods, the proposed method gives more reasonable results and produces more discriminative features, and thus achieves better classification accuracy.
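The building blocks named in contributions 3 and 4 can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the dissertation's algorithms: it shows PCA compression of a label matrix with its encoding loss, the standard biased empirical estimate of the Hilbert-Schmidt independence criterion, tr(KHLH)/(n-1)^2, and the l2,1 norm. The function names, the toy data, and the choice of linear kernels are assumptions made for illustration.

```python
import numpy as np


def pca_encode(Y, k):
    """Compress an n x q label matrix into n x k codewords with PCA."""
    mean = Y.mean(axis=0)
    Yc = Y - mean
    # Rows of Vt are the principal directions of the centered label matrix.
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    V = Vt[:k].T                      # q x k projection, orthonormal columns
    Z = Yc @ V                        # n x k codeword vectors
    # Encoding loss: squared reconstruction error of the centered labels.
    encoding_loss = np.linalg.norm(Yc - Z @ V.T, "fro") ** 2
    return Z, V, mean, encoding_loss


def hsic(K, L):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def l21_norm(M):
    """l2,1 norm: sum of the Euclidean norms of the rows of M."""
    return np.linalg.norm(M, axis=1).sum()


# Toy data (assumption): 50 instances, 8 features, 6 labels that depend on X.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
W = rng.normal(size=(8, 6))
Y = (X @ W > 0).astype(float)

Z, V, mean, loss = pca_encode(Y, k=3)
K = X @ X.T                           # linear instance kernel (assumption)
dependence = hsic(K, Z @ Z.T)         # dependence between instances and codewords
```

Because the l2,1 norm sums unsquared row norms, a single outlying row contributes linearly rather than quadratically, which is the usual motivation for using it in place of the squared Frobenius norm when outliers are expected.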
[Degree-granting institution]: Xidian University
[Degree level]: Doctoral
[Year degree conferred]: 2016
[Classification number]: TP181


Article ID: 2073918


Article link: http://www.sikaile.net/shoufeilunwen/xxkjbs/2073918.html
