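Research on New Subspace Clustering Algorithms and Their Applications
Published: 2018-01-05 21:18
Source: Jiangnan University, doctoral dissertation, 2017. Document type: degree thesis
Keywords: subspace clustering; sparse representation; low-rank representation; semi-supervised learning; transfer learning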
[Abstract]: High-dimensional data is ubiquitous across many fields, especially in the era of big data, and it poses a serious challenge to traditional clustering algorithms. Subspace clustering, an effective approach to clustering high-dimensional data, has therefore attracted wide attention from researchers. Recently, subspace clustering algorithms based on sparse representation (SR) and low-rank representation (LRR) have become a new research focus thanks to their strong performance. This thesis concentrates on SR-based and LRR-based subspace clustering algorithms, analyzes them in depth, and proposes improvements that raise their performance on specific problems. The main contributions are as follows:

1. A robust structure-constrained low-rank representation algorithm (RSLRR) is proposed. LRR has been applied successfully to uncovering the subspace structure of data. However, LRR-based algorithms usually consist of two separate steps: first, an affinity graph is constructed by solving a rank minimization problem; second, a spectral clustering algorithm partitions the affinity graph to produce the final segmentation. Affinity graph construction and spectral clustering are interdependent, so traditional LRR-based algorithms, which treat the two steps separately, cannot guarantee that the final result is globally optimal. RSLRR combines affinity graph construction and spectral clustering in a unified optimization framework; joint optimization yields the clustering result and the low-rank representation structure of the data set at the same time. Experiments on several data sets demonstrate the effectiveness of the algorithm.

2. A manifold locality constrained low-rank representation algorithm (MLCLRR) is proposed. LRR can effectively uncover the low-dimensional subspace structure of a data set. However, most LRR-based algorithms ignore the nonlinear geometric structure of the data, so the local structure and similarity information of the data set, which also matter for data analysis, are lost during processing. To improve LRR in this respect, the thesis introduces the local manifold structure of the data into the algorithm framework. The resulting algorithm preserves the global low-dimensional subspace structure of the data while also capturing its local nonlinear geometric structure. Experiments on different computer vision tasks demonstrate the effectiveness of the algorithm.

3. A latent space structure-constrained low-rank representation algorithm (Lat RSLRR) is proposed. Most existing SR-based and LRR-based subspace clustering algorithms operate on the data in its original space; when the original data set is high-dimensional, the time cost of the algorithm increases greatly. The proposed algorithm solves for the low-rank representation coefficients in a low-dimensional latent space, which greatly improves computational efficiency. In addition, most LRR algorithms use the data set itself as the dictionary, which severely degrades final performance when the data contains a large amount of noise and many outliers. The proposed algorithm instead uses a discriminative dictionary, obtained through matrix recovery, as the dictionary for the low-rank representation. Experiments on subspace clustering problems demonstrate the effectiveness of the algorithm.

4. Semi-supervised learning and LRR are combined. By unifying graph embedding learning and sparse regression in a single optimization framework, the thesis proposes an LRR-based semi-supervised learning algorithm. At present, most graph-based semi-supervised learning algorithms consider the local neighborhood information of the data but ignore the global structure of the sample data. The proposed method learns a low-rank weight matrix by projecting the data into a low-dimensional subspace, and it makes full use of the labeled samples while constructing the affinity graph. During dimensionality reduction the algorithm effectively preserves the global structure of the data set, and the learned low-rank weight matrix effectively reduces the influence of noisy data on the final result. Experiments on several data sets show that the algorithm achieves high classification accuracy.

5. An entropy-weighted transfer soft subspace clustering algorithm is proposed. To achieve high clustering accuracy, traditional clustering algorithms usually need a large amount of historical sample data. As a consequence, when information is missing in the current data collection environment, or when the partition relationships among the data are unclear, the clustering algorithm fails. Transfer learning works well for problems with insufficient data samples; by exploiting the historical information of the data set, the thesis proposes an entropy-weighted soft subspace clustering algorithm. Experiments on several UCI benchmark data sets and high-dimensional gene expression data sets show that the algorithm makes full use of the historical information of the data set to compensate for the shortage of current samples and improves clustering accuracy.
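For orientation, the two-step pipeline described under the first contribution is usually written as follows in the LRR literature. This is the standard LRR formulation, not the RSLRR objective, which the abstract does not state. The symbols are notation assumed here: X is the data matrix with one sample per column, Z the representation coefficients, E the error term, λ a trade-off weight, and W the affinity matrix handed to spectral clustering.

```latex
% Step 1: low-rank representation (standard LRR, assumed notation)
\min_{Z,\,E} \; \|Z\|_{*} + \lambda \|E\|_{2,1}
\quad \text{s.t.} \quad X = XZ + E

% Step 2: symmetric affinity graph passed to spectral clustering
W = \tfrac{1}{2}\left( |Z| + |Z^{\top}| \right)
```

Because Z is fixed before W is partitioned, the clustering step cannot feed back into the representation, which is the separation the joint framework is meant to remove.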
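The entropy weighting mentioned under the fifth contribution can be illustrated with the feature-weight update commonly used in entropy-weighted soft subspace clustering. This is a minimal sketch of that generic update only: the transfer term of the thesis's algorithm is not described in the abstract and is not modeled here. The function name `entropy_feature_weights` and the parameter `gamma` are hypothetical.

```python
import math


def entropy_feature_weights(points, center, gamma=1.0):
    """Return one weight per feature for a single cluster.

    Generic entropy-weighted soft subspace update (an assumption, not the
    thesis's transfer variant): w_j is proportional to exp(-D_j / gamma),
    where D_j is the summed squared deviation of the cluster's points from
    the center along feature j.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    n_features = len(center)
    # Within-cluster dispersion along each feature.
    dispersion = [
        sum((p[j] - center[j]) ** 2 for p in points) for j in range(n_features)
    ]
    # Subtract the minimum before exponentiating for numerical stability;
    # the shift cancels in the normalization below.
    d_min = min(dispersion)
    unnormalized = [math.exp(-(d - d_min) / gamma) for d in dispersion]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Toy cluster: tight along feature 0, spread out along feature 1.
cluster = [(1.0, 0.0), (1.1, 4.0), (0.9, -4.0)]
centroid = (1.0, 0.0)
weights = entropy_feature_weights(cluster, centroid, gamma=10.0)
```

In this toy cluster the compact feature receives the larger weight. A smaller `gamma` concentrates the weight on the most compact features, while a larger one pushes the weights toward uniform.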
[Degree-granting institution]: Jiangnan University
[Degree level]: Doctorate
[Year degree granted]: 2017
[Classification number]: TP311.13