Research on Evolutionary Clustering Algorithms and Their Applications

Published: 2018-05-17 05:30

Topic: clustering + evolutionary data; Source: master's thesis, Yangzhou University, 2017

[Abstract]: Clustering is an effective data analysis method in data mining that has been studied extensively and is widely applied in pattern recognition, image processing, data compression, and other fields. Clustering groups data objects into multiple classes, or clusters, so that objects with high mutual similarity are placed in the same cluster and dissimilar objects are placed in different clusters. Traditional clustering algorithms work well only on static data; evolutionary data, which has drawn considerable attention in machine learning and data mining in recent years, still requires further study. In evolutionary data the distribution changes over time, new data appear, and old data disappear. This raises three requirements: the clustering at each time step should be as good as possible and should reflect the data distribution at that time step reasonably accurately; the clustering should reveal how the data evolve, for example through the appearance, change, splitting, and disappearance of clusters; and the results should be as smooth as possible over time, so that the clustering at the current time step resembles that of the previous time step. Only a small number of researchers have studied this problem. This thesis focuses on clustering evolutionary data. It studies two unsupervised evolutionary clustering algorithms and a semi-supervised (constrained) evolutionary clustering algorithm, and presents a simple application. The specific work and results are as follows:

(1) The thesis proposes an evolutionary clustering framework based on temporal smoothness, obtained by modifying and refining the online framework proposed by Chakrabarti et al. It also gives a formula for the similarity matrix between data points, in which similarity is the sum of two parts: the similarity between data points at the current time step and the similarity along the time series. Finally, the framework is applied to standard spectral clustering, yielding two new evolutionary spectral clustering algorithms, which are verified experimentally.

(2) The thesis proposes an evolutionary two-layer random walk semi-supervised clustering algorithm for evolutionary clustering with constraint information. The original static two-layer random walk semi-supervised clustering algorithm consumes a great deal of time and memory when processing data that keep changing and growing, and it does not produce good results. Building on that algorithm, the thesis exploits information from earlier time steps: when the pairwise similarities between components are computed during the high-level random walk, information about the old data from the previous time step is added directly. This greatly reduces computation time, handles evolutionary semi-supervised data better, and yields better clustering results.

(3) The thesis designs an evolutionary face clustering system in which face clustering and matching are handled by the evolutionary clustering algorithms proposed in the thesis. The system has three main functions: face processing (evolutionary clustering), display of recognition results, and file management.
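The abstract does not give the thesis's actual similarity formula, so the following is only a minimal sketch of the general idea of a temporally smoothed similarity matrix fed into standard normalised spectral clustering. The Gaussian kernel, the weight `alpha`, the convex combination of the current and previous similarity matrices, and all function names are assumptions made for illustration; they are not taken from the thesis.

```python
import numpy as np


def gaussian_similarity(X, sigma=1.0):
    """Pairwise Gaussian (RBF) similarity between the rows of X (assumed kernel)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def smoothed_similarity(W_now, W_prev, alpha=0.7):
    """Weighted sum of current and previous similarity; alpha is a hypothetical knob."""
    return alpha * W_now + (1.0 - alpha) * W_prev


def kmeans(Z, k, n_iter=50):
    """Small Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.sum((Z[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


def spectral_clustering(W, k):
    """Normalised spectral clustering on a similarity matrix W."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(M)                        # eigenvalues ascending
    Z = vecs[:, -k:]                                   # top-k eigenvectors
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)   # row-normalise
    return kmeans(Z, k)


# Toy data: the same ten objects observed at two consecutive time steps.
rng = np.random.default_rng(0)
X_prev = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(3.0, 0.1, (5, 2))])
X_now = X_prev + rng.normal(0.0, 0.05, X_prev.shape)

W = smoothed_similarity(gaussian_similarity(X_now), gaussian_similarity(X_prev))
labels = spectral_clustering(W, k=2)
```

In this sketch a larger `alpha` favours the current snapshot, while a smaller one keeps the result closer to the previous time step. The example assumes the same objects are present at both time steps; handling objects that appear or disappear would need extra bookkeeping that is not shown here.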
[Degree-granting institution]: Yangzhou University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13

