天堂国产午夜亚洲专区-少妇人妻综合久久蜜臀-国产成人户外露出视频在线-国产91传媒一区二区三区

異構(gòu)社交網(wǎng)絡(luò)中社區(qū)發(fā)現(xiàn)算法研究

發(fā)布時間:2018-03-23 09:01

  本文選題:社區(qū)發(fā)現(xiàn) 切入點:異構(gòu)社交網(wǎng)絡(luò) 出處:《中國礦業(yè)大學(xué)(北京)》2016年博士論文 論文類型:學(xué)位論文


【摘要】:社會網(wǎng)絡(luò)中的社區(qū)發(fā)現(xiàn)作為數(shù)據(jù)挖掘研究領(lǐng)域的一個熱點,近幾年發(fā)展迅速,研究內(nèi)容主要集中在通過對網(wǎng)絡(luò)中存在的關(guān)系進行分析,得到社區(qū)劃分的結(jié)果。隨著Web2.0的興起和社交網(wǎng)絡(luò)的蓬勃發(fā)展,出現(xiàn)了多種新型的在線社交方式,單一的社會網(wǎng)絡(luò)關(guān)系結(jié)構(gòu)已經(jīng)不足以應(yīng)對解決現(xiàn)實世界的問題,所以學(xué)者們進一步提出了異構(gòu)社交網(wǎng)絡(luò)(Heterogeneous Social Networks)的概念。這是一個復(fù)雜的網(wǎng)絡(luò)抽象結(jié)構(gòu),網(wǎng)絡(luò)中通常包含多種關(guān)系和實體,這些不同的關(guān)系和實體組合形成了網(wǎng)絡(luò)的多樣化復(fù)雜結(jié)構(gòu)。如何處理這些復(fù)雜的結(jié)構(gòu)和獲取社區(qū)結(jié)構(gòu)信息,是對傳統(tǒng)的社會網(wǎng)絡(luò)社區(qū)發(fā)現(xiàn)的一個新的挑戰(zhàn)。本文將針對劃分異構(gòu)網(wǎng)絡(luò)過程中,多維度、多維復(fù)雜關(guān)系、多類型節(jié)點等特性所帶來的網(wǎng)絡(luò)數(shù)據(jù)重構(gòu)與降維問題;傳統(tǒng)研究僅僅局限于圖鏈接關(guān)系,并未考慮語義信息,也就是主題對劃分社區(qū)的幫助作用;再者傳統(tǒng)劃分算法需要預(yù)先知識和預(yù)先設(shè)定社區(qū)個數(shù)來得到劃分結(jié)果,但真實世界社交網(wǎng)絡(luò)中的社區(qū)個數(shù)往往是不可知的,尤其在大規(guī)模的社交網(wǎng)絡(luò)中預(yù)先知識不可知等問題展開研究。主要包括異構(gòu)社交網(wǎng)絡(luò)通用分析框架,基于標簽傳播近似線性的社區(qū)發(fā)現(xiàn)算法,基于主題感知的社區(qū)發(fā)現(xiàn)算法等,從而設(shè)計出高效快速的異構(gòu)網(wǎng)絡(luò)社區(qū)發(fā)現(xiàn)算法。研究內(nèi)容及創(chuàng)新點包括:1)提出了一種異構(gòu)社交網(wǎng)絡(luò)分析框架,針對異構(gòu)網(wǎng)絡(luò)進行數(shù)據(jù)重構(gòu),利用降維方法得到同構(gòu)網(wǎng)絡(luò)或者二分圖,然后使用社區(qū)發(fā)現(xiàn)算法對同構(gòu)網(wǎng)絡(luò)或者二分圖進行社區(qū)劃分,從而將異構(gòu)網(wǎng)絡(luò)社區(qū)劃分問題進行有效轉(zhuǎn)化。2)將多維異構(gòu)網(wǎng)絡(luò)轉(zhuǎn)化為同構(gòu)網(wǎng)絡(luò)后,提出了一種并行種子擴展算法PHSE,用來發(fā)現(xiàn)社交網(wǎng)絡(luò)中的重疊社區(qū)結(jié)構(gòu)。算法PHSE通過局部適應(yīng)度函數(shù)優(yōu)化和混合種子擴展策略來得到自然社區(qū)。相較于算法LFM,算法PHSE不僅在合成網(wǎng)絡(luò)中有非常好的劃分結(jié)果,同時在真實世界社交網(wǎng)絡(luò)中也有非常好的劃分結(jié)果。尤其,當合成網(wǎng)絡(luò)的節(jié)點重疊度高達On=50%時,依舊可以準確的劃分出重疊社區(qū)。3)提出了一種基于標簽傳播的社區(qū)發(fā)現(xiàn)算法iSLPA,同時支持有向圖、無向圖和二分圖的社區(qū)劃分,算法在迭代過程中采用標簽混合更新模式,并且在真實社交網(wǎng)絡(luò)數(shù)據(jù)集上表現(xiàn)出準確的社區(qū)劃分結(jié)果。4)提出了一種基于并行計算框架Dpark的標簽傳播算法HLPA,針對不同的網(wǎng)絡(luò)使用了不同的節(jié)點標簽初始化策略,包括有向圖、無向圖和二分圖,同時還使用了混合標簽更新策略使得算法更加穩(wěn)定,標簽衰減策略使得算法可以避免劃分出“monster”社區(qū),也可以使得較小的社區(qū)得到充分成長。相較于之前的基于標簽傳播的算法,算法HLPA在劃分benchmark基準真實社交網(wǎng)絡(luò)時表現(xiàn)出了非常高的準確性,而且在劃分大規(guī)模真實社交網(wǎng)絡(luò)時表現(xiàn)得非常有競爭力,大大提高算法效率,針對300萬節(jié)點、1.7億條邊的二分圖劃分社區(qū)只需要37.12分鐘,而且通過分析驗證了劃分出的社區(qū)是有真實含義的社區(qū)結(jié)構(gòu)。5)根據(jù)1)中提出的異構(gòu)社交網(wǎng)絡(luò)分析框架,提出了一種基于主題感知的異構(gòu)網(wǎng)絡(luò)社區(qū)發(fā)現(xiàn)算法,該算法通過對數(shù)據(jù)重構(gòu)將多模網(wǎng)絡(luò)轉(zhuǎn)化為二模網(wǎng)絡(luò)(用戶-文檔),采用算法LDA-light從異構(gòu)網(wǎng)絡(luò)轉(zhuǎn)化得到的二模網(wǎng)絡(luò)映射為帶權(quán)重的二分圖網(wǎng)絡(luò)(用戶-主題),采用新提出的帶權(quán)重的二分圖社區(qū)發(fā)現(xiàn)算法WLPA對二分圖進行社區(qū)劃分,最終將用戶和主題兩種不同的實體劃分在同一社區(qū)內(nèi),即劃分出的社區(qū)帶有語義信息,從而可以更好地進行社區(qū)結(jié)構(gòu)分析。本文提出的社區(qū)發(fā)現(xiàn)算法具有一般性,可以推廣到許多同構(gòu)或者異構(gòu)社交網(wǎng)絡(luò)和數(shù)據(jù)集,并且可以應(yīng)用到更廣泛的實際問題中。
[Abstract]:In a social network community discovery as the data mining is a hot research field, the rapid development in recent years, the study focused on the relationship existing in the network analysis, get the results of community division. With the rise of Web2.0 and social networks flourish, the emergence of a variety of new online socializing. The social network structure is not enough to cope with solve real-world problems, so scholars further proposed heterogeneous social network (Heterogeneous Social Networks) concept. It is a complex network abstract structure, the network usually contains a variety of relations and entities, and the relationship between these different entities combined to form a network diverse and complex structure. How to deal with the complicated structure and obtain the community structure information, is a new challenge to the traditional social network community discovery. This article will focus on the classification of heterogeneous networks in the process of multi dimension, multi dimension and complex relationship, and dimension of network data reconstruction caused by multiple types of nodes and other characteristics of the problem; the traditional study focused only on map links, did not consider the semantic information, also is the theme of community with the help of the division; in addition to the traditional division algorithm requires knowledge and advance set number of community division results get in advance, but the number of community real world social networks are often unknown, especially in large-scale social networks prior knowledge of unknown issues. The research mainly includes the analysis of the general framework of heterogeneous social networks, discovery algorithm approximate linear label propagation algorithm based on community, theme the perception of the community based on, to design a heterogeneous network community discovery algorithm is efficient and fast. The research contents and innovations include: 1) proposed a different The social network analysis framework for data reconstruction in heterogeneous networks, using dimensionality reduction method to get two points or homogeneous network graph, and then use the community discovery algorithm of homogeneous network community division or two figure, which will be the effective transformation of.2 community into the heterogeneous network) will be transformed into multidimensional heterogeneous network homogeneous network, is proposed an extended PHSE algorithm parallel seeds, used to find the overlapping community structure in social networks. The PHSE algorithm through local fitness function optimization and hybrid seed expansion strategy to get the natural community. Compared to the LFM algorithm, PHSE algorithm not only has very good classification results in the synthesis of the network, but also has very good classification results in the real world social networks. Especially, when the node network synthesis overlap is as high as On=50%, still can be accurately divided into overlapping community.3) put forward a ISLPA found the label propagation algorithm based on community, while supporting the directed graph, undirected graph and graph two community division algorithm using hybrid label update mode in the iterative process, and show the community classification results accurate.4 in the data set on the real social network) proposed a parallel HLPA label propagation algorithm framework based on Dpark, according to the different network using node label different initialization strategies, including directed graph, undirected graph and two points, while also using the hybrid tag update strategy which makes the algorithm more stable label attenuation strategy makes the algorithm can avoid the division of the "monster" community, also can make small the community has been fully grow. Compared to the previous label propagation algorithm based on HLPA algorithm in the division of benchmark benchmark real social network showed very high accuracy, but also in row Divided into large-scale real social networks is very competitive, greatly improve the efficiency of the algorithm for the 3 million node, two graph partitioning community 170 million edges need 37.12 minutes only, and verified through the analysis of the community is a community structure of.5) 1) according to the true meaning of the heterogeneous social network analysis framework proposed that presents a heterogeneous network community discovery algorithm based on perception of the subject, based on data reconstruction multimode networks into the network (the second mock exam user - document), use LDA-light algorithm to get the second mock exam network mapping transformation from heterogeneous network two network graph with weight (user topic), using the new with the weight of the two figure WLPA community discovery algorithm of community division two figure, will end users and theme two different entity in the same community, which is divided into the community with Semantic information can better analyze community structure. The community detection algorithm proposed in this paper is general, and can be extended to many isomorphic or heterogeneous social networks and data sets, and it can be applied to a wider range of practical problems.

【學(xué)位授予單位】:中國礦業(yè)大學(xué)(北京)
【學(xué)位級別】:博士
【學(xué)位授予年份】:2016
【分類號】:O157.5;TP393.09

【參考文獻】

相關(guān)期刊論文 前5條

1 辛宇;楊靜;謝志強;;基于標簽傳播的語義重疊社區(qū)發(fā)現(xiàn)算法[J];自動化學(xué)報;2014年10期

2 王莉;程學(xué)旗;;在線社會網(wǎng)絡(luò)的動態(tài)社區(qū)發(fā)現(xiàn)及演化[J];計算機學(xué)報;2015年02期

3 張俊麗;常艷麗;師文;;標簽傳播算法理論及其應(yīng)用研究綜述[J];計算機應(yīng)用研究;2013年01期

4 劉欣;Tsuyoshi Murata;;Detecting Communities in K-Partite K-Uniform (Hyper)Networks[J];Journal of Computer Science & Technology;2011年05期

5 楊博;劉大有;金弟;馬海賓;;復(fù)雜網(wǎng)絡(luò)聚類方法[J];軟件學(xué)報;2009年01期

,

本文編號:1652785

資料下載
論文發(fā)表

本文鏈接:http://www.sikaile.net/shoufeilunwen/jckxbs/1652785.html


Copyright(c)文論論文網(wǎng)All Rights Reserved | 網(wǎng)站地圖 |

版權(quán)申明:資料由用戶8dbca***提供,本站僅收錄摘要或目錄,作者需要刪除請E-mail郵箱bigeng88@qq.com