A Density-Based Distributed Clustering Method

Published: 2018-12-12 08:55
[Abstract]: Clustering is an important data analysis method in data mining. It partitions unlabeled data into clusters according to the similarity between data points. CSDP is a density-based clustering algorithm whose efficiency is relatively low when the data volume is large or the data dimensionality is high. To improve the efficiency of the clustering algorithm, a density-based distributed clustering method, MRCSDP, is proposed, which clusters the experimental data using the MapReduce framework. The method defines the concepts of the independent computing unit and the independent computing block. First, the data is split into several data blocks, independent computing units and independent computing blocks are constructed, and the tasks for the independent computing blocks are assigned across the cluster. Then distributed computation yields the local density of each data block; the local densities are merged into the global density, the center value is computed from the global density, and the candidate cluster centers in each data block are obtained from the global density and the center value. Finally, the final cluster centers are elected from the candidate cluster centers. MRCSDP achieves good clustering results while substantially reducing time complexity. The experimental results show that, in a distributed environment, MRCSDP processes large-scale data faster and more effectively than CSDP and balances the load across nodes.
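The steps in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: it assumes that CSDP follows the density-peaks scheme, in which the local density of a point is the number of other points within a cutoff distance and the "center value" is the distance to the nearest point of higher density. The names `split_blocks`, `map_pair`, `global_density`, `center_values`, `elect_centers` and the parameters `d_c` and `k` are hypothetical, and a plain loop over block pairs stands in for the MapReduce jobs.

```python
from itertools import combinations_with_replacement
from math import dist  # Python 3.8+


def split_blocks(points, n_blocks):
    """Split the indexed points into n_blocks data blocks (round-robin)."""
    indexed = list(enumerate(points))
    return [indexed[i::n_blocks] for i in range(n_blocks)]


def map_pair(block_a, block_b, d_c):
    """Map step (assumed): partial neighbour counts for one pair of blocks."""
    partial = {}
    same = block_a is block_b
    for i, p in block_a:
        for j, q in block_b:
            if same and j <= i:
                continue  # inside one block, visit each unordered pair once
            if dist(p, q) < d_c:
                partial[i] = partial.get(i, 0) + 1
                partial[j] = partial.get(j, 0) + 1
    return partial


def global_density(blocks, n_points, d_c):
    """Reduce step (assumed): merge the partial counts into global densities."""
    rho = [0] * n_points
    for a, b in combinations_with_replacement(blocks, 2):
        for idx, count in map_pair(a, b, d_c).items():
            rho[idx] += count
    return rho


def center_values(points, rho):
    """Assumed center value: distance to the nearest higher-density point."""
    order = sorted(range(len(points)), key=lambda i: -rho[i])
    delta = [0.0] * len(points)
    for rank, i in enumerate(order[1:], start=1):
        delta[i] = min(dist(points[i], points[j]) for j in order[:rank])
    # The densest point has no denser neighbour; use its farthest distance.
    top = order[0]
    delta[top] = max(dist(points[top], p) for p in points)
    return delta


def elect_centers(blocks, rho, delta, k):
    """Each block nominates k candidates; the final k are elected among them."""
    def score(i):
        return rho[i] * delta[i]

    candidates = []
    for block in blocks:
        ids = [i for i, _ in block]
        candidates += sorted(ids, key=score, reverse=True)[:k]
    return sorted(candidates, key=score, reverse=True)[:k]


# Toy data: two well-separated groups of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
blocks = split_blocks(points, 3)
rho = global_density(blocks, len(points), d_c=1.1)
delta = center_values(points, rho)
centers = elect_centers(blocks, rho, delta, k=2)
print(centers)
```

Processing every pair of blocks exactly once means that each partial count depends only on the two blocks involved, which is one plausible reading of the "independent computing block" idea; the abstract does not say how the paper itself constructs these blocks or balances them across nodes.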
[Author affiliations]: College of Computer Science and Technology, Jilin University; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University
[Classification number]: TP311.13



Article link: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/2374295.html

