天堂国产午夜亚洲专区-少妇人妻综合久久蜜臀-国产成人户外露出视频在线-国产91传媒一区二区三区

當(dāng)前位置:主頁(yè) > 科技論文 > 軟件論文 >

圖挖掘算法的研究與應(yīng)用

發(fā)布時(shí)間:2018-04-14 00:16

  本文選題:圖挖掘 + 極大團(tuán) ; 參考:《北京郵電大學(xué)》2016年碩士論文


【摘要】:近年來(lái),隨著社交網(wǎng)絡(luò)的興起,圖挖掘越來(lái)越受到研究人員的重視,有關(guān)圖挖掘研究的論文在SigKDD、ICDM、SiamDM等會(huì)議中有逐年遞增的趨勢(shì)。極大團(tuán)挖掘是圖挖掘中的一個(gè)基本問(wèn)題,有著豐富的研究意義與極高的實(shí)踐價(jià)值。如進(jìn)行社交網(wǎng)絡(luò)分析、利用郵件網(wǎng)絡(luò)推斷社會(huì)等級(jí)、研究行為與認(rèn)知網(wǎng)絡(luò)的結(jié)構(gòu)、對(duì)金融網(wǎng)絡(luò)進(jìn)行統(tǒng)計(jì)分析、對(duì)動(dòng)態(tài)網(wǎng)絡(luò)進(jìn)行聚類(lèi)、偵測(cè)各種恐怖活動(dòng)和緊急事件等。本論文以極大團(tuán)算法的研究為切入點(diǎn),對(duì)這一基本的圖挖掘問(wèn)題進(jìn)行了理論上的研究并給出了其實(shí)踐中的應(yīng)用。論文的研究主要分為三大部分:1.研究并實(shí)現(xiàn)了 Base BK算法、Improved BK系列算法和Kose系列算法,對(duì)部分算法進(jìn)行對(duì)比實(shí)驗(yàn),分析算法的適用場(chǎng)景與時(shí)空優(yōu)劣。首先分析Base BK算法,這是最早的一個(gè)使用遞歸方式挖掘極大團(tuán)的經(jīng)典算法。進(jìn)而研究Improved BK系列算法,這些算法致力于解決Base BK算法遞歸空間樹(shù)規(guī)模過(guò)于龐大的問(wèn)題,提出了各種啟發(fā)式選擇標(biāo)志節(jié)點(diǎn)的方法對(duì)Base BK算法的遞歸空間樹(shù)進(jìn)行剪枝。其中著名的是最小計(jì)數(shù)器版本的Improved BK算法和隨機(jī)CANDIDATESNOT版本的Improved BK算法,論文都有理論分析與實(shí)現(xiàn)。也研究了 Kose系列算法,Kose算法使用迭代方式挖掘圖中的極大團(tuán),雖然避免了生成不必要的子團(tuán),卻導(dǎo)致內(nèi)存消耗急劇增大。由此衍生了 Kose RAM算法和Kose Disk算法,它們分別將中間數(shù)據(jù)存入內(nèi)存和硬盤(pán),所以耗費(fèi)的內(nèi)存不同。本論文研究并實(shí)現(xiàn)了 Kose RAM算法。最后取七個(gè)經(jīng)典的極大團(tuán)挖掘算法,在單進(jìn)程模式下實(shí)現(xiàn),在相同的實(shí)驗(yàn)環(huán)境下做了 1000多個(gè)實(shí)驗(yàn),根據(jù)實(shí)驗(yàn)結(jié)果反饋分析算法的原理并給出了不同環(huán)境下如何選擇極大團(tuán)挖掘算法的建議。2.研究FAMCELM算法,指出使其運(yùn)行失效的場(chǎng)景,進(jìn)而提出受限內(nèi)存下的解決方案,并以理論和實(shí)驗(yàn)證明算法的可行性。Fast Algorithms for Maximal Clique Enumeration with Limited Memory 論文提出了在大規(guī)模圖中挖掘極大團(tuán)一個(gè)分布式的算法。這一算法有著三大優(yōu)勢(shì):適用于內(nèi)存有限的場(chǎng)景、減少了挖掘極大團(tuán)的機(jī)器的CPU消耗、可以在并行環(huán)境下實(shí)現(xiàn)。研究中發(fā)現(xiàn),面對(duì)圖中存在度極大的節(jié)點(diǎn)的情形,論文中的SeqMCE算法會(huì)失效。分析了產(chǎn)生這個(gè)問(wèn)題的原因,給出了解決這一問(wèn)題的一個(gè)方案及嚴(yán)格的理論證明。在對(duì)SeqMCE做了其他的優(yōu)化后提出了 Improved SeqMCE算法,這一算法解決了遇到度極大的節(jié)點(diǎn)的問(wèn)題,代價(jià)則是重復(fù)挖掘了部分全局極大團(tuán)。最后在內(nèi)存受限的場(chǎng)景下運(yùn)行Improved SeqMCE算法,從而證明其在實(shí)踐中的可行性。3.依托于Improved BK算法和Improved SeqMCE算法,為解決統(tǒng)計(jì)段時(shí)間內(nèi)不同的發(fā)送垃圾短信的手機(jī)號(hào)碼總數(shù)的問(wèn)題,提出了基于先驗(yàn)概率的Hash算法——GHash算法。作為算法的預(yù)處理部分,首先需挖掘出等長(zhǎng)字符串(場(chǎng)景下則是手機(jī)號(hào)碼)的統(tǒng)計(jì)規(guī)律。論文將字符串的各個(gè)位置抽象成一個(gè)圖的節(jié)點(diǎn),創(chuàng)造性地將挖掘統(tǒng)計(jì)規(guī)律的問(wèn)題轉(zhuǎn)化成尋找極長(zhǎng)鏈的問(wèn)題,進(jìn)而轉(zhuǎn)化成在其傳遞閉包中挖掘極大團(tuán)的問(wèn)題。在使用Improved BK算法挖掘出圖中的極大團(tuán)后,再將節(jié)點(diǎn)映射回原字符串的相應(yīng)位置,從而得到原字符串的統(tǒng)計(jì)規(guī)律。為解決極大團(tuán)挖掘算法為NP問(wèn)題,算法耗時(shí)過(guò)久的問(wèn)題,又提出了通過(guò)計(jì)算信息熵從而得到統(tǒng)計(jì)規(guī)律的一個(gè)子集的替代方案,這在手機(jī)號(hào)碼的場(chǎng)景下運(yùn)行良好。在一個(gè)有著3000多萬(wàn)手機(jī)號(hào)碼的數(shù)據(jù)集上的實(shí)驗(yàn)表明,GHash算法所占用的內(nèi)存遠(yuǎn)小于使用Bitmap算法所占用的內(nèi)存;速度上則遠(yuǎn)快于使用AVL樹(shù)、紅黑樹(shù)和Trie樹(shù)完成檢索的場(chǎng)景;相比較布隆過(guò)濾器,GHash算法是1000%準(zhǔn)確的;相比較傳統(tǒng)的部分Hash算法,如djb2、fnv-1、sdbm等,GHash算法無(wú)論是運(yùn)行時(shí)間還是沖突數(shù),都有明顯的優(yōu)勢(shì)。
[Abstract]:In recent years, with the rise of social networks, graph mining researchers pay more attention to the related research, graph mining papers in SigKDD, ICDM, SiamDM etc. is increasing meeting. The maximum clique mining is a basic problem in graph mining, has abundant research significance and high practical value. Such as social network analysis, using e-mail network to infer the social hierarchy, structure of behavioral and cognitive networks, statistical analysis of the financial network, clustering of dynamic network, detection of various terrorist activities and emergencies. This thesis studies the maximum clique algorithm as the starting point, mining of the basic map the problem was presented on the theory research and its application in practice. The thesis is divided into three parts: 1. the research and implementation of Base BK algorithm, Improved algorithm and BK series Kose series of algorithm. Some algorithms are compared, the suitable scene analysis algorithm and spatial analysis of Base BK algorithm performance. First, this is a classic algorithm using the recursive method the earliest mining maximum. Then study the Improved series of BK algorithm, the algorithm to solve Base BK algorithm recursive tree space scale is too large, the the various heuristic node selection marker method of recursive space tree of Base BK algorithm for pruning. The famous Improved BK algorithm Improved BK algorithm and random version of the CANDIDATESNOT version of the minimum counter, the theory and implementation of the theoretical analysis. Also study the series of Kose algorithm, Kose algorithm using iterative method, mining maximal clique graph although, to avoid generating unnecessary sub group, but lead to memory consumption increases rapidly. The Kose derived from the RAM algorithm and Kose Disk algorithm, respectively. The intermediate data stored in memory and hard disk, so the cost of memory. The research and implementation of Kose RAM algorithm. The maximum clique finally take seven classical mining algorithm, implemented in a single process mode, more than 1000 experiments were done in the same experimental environment, according to the proposed.2. FAMCELM algorithm research results feedback analysis of the principle of the algorithm is given under different circumstances, how to choose the maximum clique algorithm, the operation failure of the scene, and then propose solutions under the limited memory, and by theoretical analysis and experimental results show the feasibility of the algorithm.Fast Algorithms for Maximal Clique Enumeration with Limited Memory is proposed in a large graph mining maximal clique a distributed algorithm. This algorithm has three major advantages: suitable for limited memory of the scene, reduce the mining maximal clique CPU machine consumption, can In the parallel environment. The study found that in the face of great presence of node degree in the case of SeqMCE algorithm in this paper will failure. Analysis of the causes of this problem, gives a solution to this problem and the strict theoretical proof. In the other optimization of SeqMCE is proposed Improved SeqMCE algorithm, this algorithm solves the problems encountered node degree greatly, it is repeated mining of partial global maximal clique. Finally, run the Improved SeqMCE algorithm in memory constrained scenarios, thus proving the feasibility in the practice of.3. based on Improved BK algorithm and Improved SeqMCE algorithm, for the total number of mobile phone number to send spam messages of different time. To solve the problem, Hash algorithm is proposed based on prior probability -- GHash algorithm. As a preprocessing algorithm, first of all need to dig out so long String (scenario is the mobile phone number) of the statistical law. This paper will abstract each position string nodes of a graph, creatively mining statistical regularity problem into finding the very long chain problem, and then transformed into mining maximal clique in the transitive closure of the problem. In the use of Improved BK algorithm the maximal cliques in graphs, then the corresponding node mapped back to the original string, so as to obtain the statistical regularity of the original string. In order to solve the maximum clique algorithm for NP problem, the algorithm is time-consuming too long, and put forward the alternative one by calculating the information entropy to obtain statistical rule sets, this running well in the mobile phone number of scenes. Show that in a about 30000000 mobile phone number of the data set on the experiment, the GHash algorithm is far less than the memory occupied by using the Bitmap algorithm the memory speed; Is far faster than the AVL tree, tree and Trie tree search scene; comparison of Bloom filter, GHash algorithm is 1000% accurate; part compared with the traditional Hash algorithm, such as djb2, fnv-1, sdbm, GHash algorithm both running time or the number of conflicts, have obvious advantages.

【學(xué)位授予單位】:北京郵電大學(xué)
【學(xué)位級(jí)別】:碩士
【學(xué)位授予年份】:2016
【分類(lèi)號(hào)】:TP311.13

【參考文獻(xiàn)】

相關(guān)期刊論文 前8條

1 黃金;吳曉東;武紅斌;;哈希索引在交警專(zhuān)用移動(dòng)執(zhí)法終端數(shù)據(jù)檢索中的應(yīng)用研究[J];中國(guó)公共安全(學(xué)術(shù)版);2010年03期

2 肖波;徐前方;藺志青;郭軍;李春光;;可信關(guān)聯(lián)規(guī)則及其基于極大團(tuán)的挖掘算法[J];軟件學(xué)報(bào);2008年10期

3 辛向軍;李剛;董慶寬;肖國(guó)鎮(zhèn);;一個(gè)高效的隨機(jī)化的可驗(yàn)證加密簽名方案[J];電子學(xué)報(bào);2008年07期

4 李先通;李建中;高宏;;一種高效頻繁子圖挖掘算法[J];軟件學(xué)報(bào);2007年10期

5 唐德權(quán);夏耀穩(wěn);朱林立;夏幼明;;基于有向圖的關(guān)聯(lián)規(guī)則挖掘算法研究[J];云南大學(xué)學(xué)報(bào)(自然科學(xué)版);2006年S2期

6 馬根峰;哈希技術(shù)在廣東電信公話200話單處理中的應(yīng)用[J];廣東通信技術(shù);2003年07期

7 馬文平,王新梅;多發(fā)送認(rèn)證碼的幾個(gè)新的構(gòu)造方法[J];電子學(xué)報(bào);2000年04期

8 陳瀅,王能斌;半結(jié)構(gòu)化數(shù)據(jù)查詢(xún)的處理和優(yōu)化[J];軟件學(xué)報(bào);1999年08期

,

本文編號(hào):1746864

資料下載
論文發(fā)表

本文鏈接:http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/1746864.html


Copyright(c)文論論文網(wǎng)All Rights Reserved | 網(wǎng)站地圖 |

版權(quán)申明:資料由用戶(hù)cbf69***提供,本站僅收錄摘要或目錄,作者需要?jiǎng)h除請(qǐng)E-mail郵箱bigeng88@qq.com