Research and Application of the K-means Clustering Algorithm

Published: 2018-06-18 19:06

Topic: data mining + cluster analysis; source: Changsha University of Science and Technology, 2016 master's thesis

[Abstract]: Data mining is an interdisciplinary field that draws on databases, machine learning, artificial intelligence and other areas; it extracts the information we need from large, unordered and noisy data sets. Cluster analysis is one of the most important techniques in data mining and has produced a rich body of theoretical and methodological results. Clustering has been studied extensively for many years, mostly in the form of distance-based algorithms, of which K-means is the classic example. K-means is regarded as the most important unsupervised machine learning method in clustering. It is a partitioning algorithm: it divides the data into k subclasses that differ strongly from one another and iterates until the distance from each data object to the center of its own subclass is minimized. Because it is simple to implement and efficient, K-means is widely used in data compression, image segmentation, marketing, anomaly analysis, statistics and other fields.

However, K-means still has limitations. For example, it is highly sensitive to the initial cluster centers: if they are poorly chosen, the algorithm easily converges to a local optimum rather than the global one. This thesis studies and analyzes the classical K-means algorithm in depth and summarizes its strengths and weaknesses. Because K-means is simple and fast, the thesis applies it to video object tracking. To address its dependence on the initial centers, we propose a new method for selecting them, and we verify through extensive experiments how random selection of the initial values affects the clustering result.
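The partition-and-iterate procedure and the sensitivity to initial centers described above can be illustrated with a minimal NumPy sketch. It is a generic Lloyd-style K-means, not code from the thesis; the function name `kmeans`, the helper `make_blobs`, the synthetic four-cluster data and the choice of 20 seeds are all assumptions made for illustration.

```python
import numpy as np


def kmeans(X, k, seed, max_iter=100):
    """Lloyd-style K-means with k initial centers drawn at random from X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Assignment step: nearest center for every point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its points;
        # an empty cluster keeps its previous center.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment and objective (sum of squared distances to centers).
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    sse = float(((X - centers[labels]) ** 2).sum())
    return centers, labels, sse


def make_blobs(seed=0):
    """Hypothetical test data: four Gaussian blobs of 50 points each."""
    rng = np.random.default_rng(seed)
    means = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0], [6.0, 6.0]])
    return np.vstack([m + rng.normal(scale=0.5, size=(50, 2)) for m in means])


X = make_blobs()
# Same data, different random initial centers: compare the final objective.
sses = [kmeans(X, k=4, seed=s)[2] for s in range(20)]
print("best SSE:", min(sses), "worst SSE:", max(sses))
```

Running the script prints the smallest and largest objective value over the 20 seeds. A gap between the two would indicate that some random starts ended in a local optimum, which is the behavior the abstract warns about.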
The specific research content and results are as follows.
(1) K-means is applied to video object tracking. A sample model is first built for the background pixels of the video image, and the samples are then partitioned by clustering so as to model the motion characteristics of objects. The pixels of each video frame are checked against the background sample model to identify the background points in the frame. The sample model is then updated according to the class each pixel falls into, which improves the effectiveness of background detection.
(2) The initial centers are optimized by combining the fast local convergence of mean shift with a global partition of the data into regions. To some extent this reduces the total number of iterations, lowers the complexity of the algorithm, and makes it more global and more stable. Experimental results show that the improved algorithm yields more stable results and more accurate clustering.
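The abstract does not say how the regions are formed or which mean-shift kernel is used, so the following is only a rough sketch of the general idea of seeding K-means with mean-shift modes, not the thesis's method. The region rule (sort along the widest-variance axis and cut into k equal chunks), the Gaussian kernel, the bandwidth value, the function names `mean_shift_mode` and `region_mean_shift_init`, and the synthetic data are all assumptions.

```python
import numpy as np


def mean_shift_mode(X, start, bandwidth, n_iter=100, tol=1e-6):
    """Move `start` uphill to a nearby density mode (Gaussian kernel)."""
    point = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        sq_dist = ((X - point) ** 2).sum(axis=1)
        weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
        # Kernel-weighted mean of the data around the current point.
        shifted = (weights[:, None] * X).sum(axis=0) / weights.sum()
        if np.linalg.norm(shifted - point) < tol:
            return shifted
        point = shifted
    return point


def region_mean_shift_init(X, k, bandwidth):
    """One initial center per region: split globally, refine locally."""
    # Hypothetical region rule: sort along the widest-variance axis and
    # cut the sorted points into k equal-sized chunks.
    axis = int(X.var(axis=0).argmax())
    order = np.argsort(X[:, axis])
    regions = np.array_split(order, k)
    starts = [X[idx].mean(axis=0) for idx in regions]
    # Each region mean is pulled toward a local density mode.
    return np.array([mean_shift_mode(X, s, bandwidth) for s in starts])


# Hypothetical test data: three Gaussian blobs of 40 points each.
rng = np.random.default_rng(0)
true_means = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
X = np.vstack([m + rng.normal(scale=0.5, size=(40, 2)) for m in true_means])

init_centers = region_mean_shift_init(X, k=3, bandwidth=1.0)
print(init_centers)
```

The resulting `init_centers` could replace the random draw in any K-means implementation that accepts explicit starting centers. On data this well separated each seed already sits close to a cluster, which is the intuition behind needing fewer iterations; whether that holds on real data depends on the region rule and the bandwidth.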
[Degree-granting institution]: Changsha University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP311.13



Article ID: 2036554


Article link: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/2036554.html
