Research on a Latent Subspace Clustering Algorithm Based on Improved Dictionary Learning
Published: 2019-06-12 14:02
[Abstract]: Cluster analysis, a data analysis tool, is the process of grouping abstract data objects into multiple clusters. It is widely used in pattern recognition, machine learning, document retrieval, data mining, and other fields. In recent years, the spread of the Internet and advances in computer imaging have added large volumes of image and video data. As demand for higher image and video resolution keeps rising, high-dimensional datasets reaching hundreds of terabytes have emerged. Most traditional clustering algorithms are designed for low-dimensional data and therefore cannot handle high-dimensional data efficiently. Subspace clustering extends traditional clustering and is an effective approach to clustering high-dimensional data. This thesis improves the sparse-representation-based latent subspace clustering algorithm to raise its clustering performance. The specific contributions are as follows:
1. The thesis introduces the basic principles of the sparse representation model and the dictionary learning model in detail. It explains the steps, advantages, and disadvantages of several classical algorithms in each field, including MP, OMP, MOD, and K-SVD. It then presents background on subspace clustering and spectral clustering and derives the spectral clustering procedure in detail, laying the foundation for the later improvements.
2. The thesis describes a subspace clustering algorithm that combines spectral clustering, sparse representation, and dictionary learning, namely latent subspace clustering (LSC). It explains the algorithm's main idea and the related derivation.
3. The dictionary trained by LSC lacks stability and discriminative power. To address this, the thesis proposes an improved LSC algorithm based on discriminative dictionary learning (ILSC). In the dictionary learning stage, ILSC uses the label information of a small subset of training samples to modify the dictionary learning model. In addition to the original reconstruction error term, it adds a sparse coding error term, producing a discriminative adaptive dictionary. This makes the sparse representation of each signal more accurate and thereby improves clustering accuracy.
4. ILSC adds two error terms to strengthen dictionary discrimination, so the time spent in the dictionary learning stage increases several-fold. To address this, the thesis proposes I2LSC, an improved ILSC algorithm based on incremental dictionary training. I2LSC reads a small batch of training data at a time and incrementally updates the dictionary and the corresponding error terms. This greatly reduces dictionary learning time while preserving the dictionary's discriminative power.
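The abstract lists OMP among the classical sparse coding algorithms that the thesis reviews. The sketch below is a minimal, dependency-free Python version of Orthogonal Matching Pursuit for reference. It is not the thesis's code, and the helper names (`omp`, `_solve`, `_dot`) are hypothetical. It assumes unit-norm dictionary atoms and a small dense problem.

```python
import math


def _dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def _solve(M, b):
    """Solve the square system M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [list(row) + [b[i]] for i, row in enumerate(M)]  # augmented matrix [M | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x


def omp(atoms, y, sparsity, tol=1e-10):
    """Orthogonal Matching Pursuit over unit-norm atoms (each a list of floats).

    Returns (coefs, support): one coefficient per atom, and the selected
    atom indices in the order they were chosen.
    """
    residual, support, coef_s = list(y), [], []
    for _ in range(min(sparsity, len(atoms))):
        # 1. Select the unused atom most correlated with the current residual.
        scores = [-1.0 if j in support else abs(_dot(d, residual))
                  for j, d in enumerate(atoms)]
        j_best = max(range(len(atoms)), key=lambda j: scores[j])
        if scores[j_best] <= tol:
            break  # residual is orthogonal to every remaining atom
        support.append(j_best)
        # 2. Jointly re-fit all selected coefficients (the "orthogonal" step)
        #    via the normal equations (D_S^T D_S) x = D_S^T y.
        gram = [[_dot(atoms[a], atoms[b]) for b in support] for a in support]
        coef_s = _solve(gram, [_dot(atoms[a], y) for a in support])
        # 3. Recompute the residual r = y - D_S x.
        residual = [y[i] - sum(c * atoms[a][i] for c, a in zip(coef_s, support))
                    for i in range(len(y))]
        if math.sqrt(_dot(residual, residual)) < tol:
            break
    coefs = [0.0] * len(atoms)
    for c, a in zip(coef_s, support):
        coefs[a] = c
    return coefs, support
```

For example, `omp([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [3.0, 0.0, -2.0], sparsity=2)` should select atoms 0 and 2 with coefficients 3 and -2. Re-solving all selected coefficients at every step distinguishes OMP from plain MP, which only adjusts the newest coefficient. A practical implementation would use a linear algebra library instead of the hand-written solver.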
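Item 4 says I2LSC reads a small batch of training data at a time and updates the dictionary incrementally. The exact I2LSC update rule is not given here, so the sketch below substitutes a different, well-known technique: the dictionary-update step of online dictionary learning (Mairal et al., 2009). It keeps running statistics A = Σ αα^T and B = Σ xα^T. It then updates each atom by block-coordinate descent and projects the atom onto the unit ball. It covers only the reconstruction term, not ILSC's discriminative error terms. The class `OnlineDictionary` and its methods are hypothetical, and the sparse codes are assumed to be precomputed by any sparse coder.

```python
import math


class OnlineDictionary:
    """Mini-batch dictionary update in the style of online dictionary learning
    (Mairal et al., 2009). Hypothetical illustration, not the thesis's I2LSC:
    it keeps only the reconstruction term, with no discriminative terms."""

    def __init__(self, atoms):
        self.D = [list(map(float, d)) for d in atoms]  # K atoms, each of length n
        k, n = len(self.D), len(self.D[0])
        self.A = [[0.0] * k for _ in range(k)]  # running sum of alpha alpha^T (K x K)
        self.B = [[0.0] * k for _ in range(n)]  # running sum of x alpha^T (n x K)

    def partial_fit(self, batch, codes):
        """Fold one mini-batch (signals x with precomputed sparse codes alpha)
        into the statistics, then update every atom once."""
        k, n = len(self.D), len(self.D[0])
        # 1. Accumulate sufficient statistics; old samples are never revisited.
        for x, a in zip(batch, codes):
            for p in range(k):
                for q in range(k):
                    self.A[p][q] += a[p] * a[q]
            for i in range(n):
                for q in range(k):
                    self.B[i][q] += x[i] * a[q]
        # 2. Block-coordinate descent: exactly minimise the surrogate over each
        #    atom in turn, then project it onto the unit ball.
        for j in range(k):
            if self.A[j][j] <= 1e-12:
                continue  # atom not used by any code yet; leave it unchanged
            u = [self.D[j][i]
                 + (self.B[i][j] - sum(self.D[m][i] * self.A[m][j] for m in range(k)))
                 / self.A[j][j]
                 for i in range(n)]
            scale = max(math.sqrt(sum(v * v for v in u)), 1.0)
            self.D[j] = [v / scale for v in u]

    def reconstruction_error(self, batch, codes):
        """Sum of squared residuals ||x - D alpha||^2 over the given signals."""
        total = 0.0
        for x, a in zip(batch, codes):
            for i in range(len(x)):
                r = x[i] - sum(a[q] * self.D[q][i] for q in range(len(self.D)))
                total += r * r
        return total
```

Each call to `partial_fit` costs time proportional to the batch size plus a fixed per-atom update, instead of a pass over the whole training set. That is the property the abstract attributes to incremental training. In an assumed full loop, each incoming batch would first be sparse-coded with the current dictionary (for example with OMP) before being passed to `partial_fit`.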
[Degree-granting institution]: Jiangnan University
[Degree level]: Master's
[Year of degree]: 2017
[Classification number]: TP311.13
[Article ID]: 2498080
Article link: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/2498080.html