基于標簽傳播算法的社區(qū)發(fā)現(xiàn)新算法
發(fā)布時間:2018-08-12 19:39
【摘要】:互聯(lián)網(wǎng)近年來的飛速發(fā)展,造就了一批以社交為主的網(wǎng)站,其中國外的Facebook,Twitter,Google+,國內(nèi)的QQ空間,豆瓣,人人等最為流行。這些社交網(wǎng)站每天都會有大量的用戶使用,并且產(chǎn)生大量的分享數(shù)據(jù),建立新的朋友關(guān)系等。對于這些數(shù)據(jù)來說,具有很高的利用價值,比如網(wǎng)絡(luò)營銷,輿情分析等。因此對這些數(shù)據(jù)的處理方式尤為重要,其中社區(qū)發(fā)現(xiàn)是研究熱點。本文主要是針對目前常用的社區(qū)發(fā)現(xiàn)算法進行改進。最常用的標簽傳播算法LPA的思想是在初始情況下,網(wǎng)絡(luò)中的每個節(jié)點都被初始化為唯一的標簽,在迭代更新每個節(jié)點的標簽時,節(jié)點是根據(jù)其鄰居節(jié)點中標簽個數(shù)最多的作為更新標簽,如果標簽個數(shù)最多的標簽并不是唯一的,那么隨機從中選擇一個標簽來更新當前節(jié)點,最終達到收斂或者震蕩,該算法停止。由于該算法的思想和實現(xiàn)過程導致該算法有一些缺點,例如不穩(wěn)定性,發(fā)現(xiàn)的社區(qū)要么是巨型社區(qū),要么就是無意義的小型社區(qū),分布極其不均勻并且對網(wǎng)絡(luò)的結(jié)構(gòu)比較敏感,二分網(wǎng)絡(luò)情況下會發(fā)生循環(huán)震蕩。針對上述LPA算法的一系列缺點,本文提出了LAAPA(label-attributeattenuation progagation algorithm),基于標簽屬性和衰減因素的社區(qū)發(fā)現(xiàn)算法。該算法引人了傳播衰減因子和節(jié)點屬性,其中傳播衰減因子顧名思義就是節(jié)點標簽的傳播距離是有限的,并且隨著距離的增加節(jié)點標簽的影響力逐漸降低,更新標簽的權(quán)重也隨之減少;節(jié)點屬性是指在社交網(wǎng)絡(luò)中節(jié)點之間的關(guān)系并不僅僅是關(guān)注與被關(guān)注這種簡單的傳統(tǒng)上的“邊”,為了與實際情況更加吻合本文提出了節(jié)點屬性,具體是指將節(jié)點的其他屬性比如豆瓣網(wǎng)中用戶加入的“小組”作為節(jié)點的屬性,節(jié)點之間的相同屬性將會反映到節(jié)點之間的邊上,使用權(quán)值來表示。在算法迭代中,節(jié)點更新標簽時,將考慮鄰居節(jié)點的標簽傳播距離,節(jié)點之間邊的權(quán)值,節(jié)點的度等因素。本文通過兩組標準數(shù)據(jù)對提出的LAAPA算法和LPA算法進行比較,在社區(qū)大小,模塊度等方面LAAPA算法發(fā)現(xiàn)的社區(qū)比LPA算法效果好。在使用Scrapy抓取的豆瓣網(wǎng)數(shù)據(jù)中,經(jīng)過清洗格式化后驗證了該數(shù)據(jù)符合社交網(wǎng)絡(luò)的特性,“小世界”,節(jié)點中心性等。利用兩種社區(qū)發(fā)現(xiàn)算法進行社區(qū)發(fā)現(xiàn),進行對比,結(jié)果也是LAAPA算法發(fā)現(xiàn)的社區(qū)質(zhì)量比LPA算法質(zhì)量高,并且穩(wěn)定,社區(qū)分布均勻,沒有巨型社區(qū)的出現(xiàn)。
[Abstract]:The rapid development of the Internet in recent years has created a number of social-oriented websites, including Facebook Twitter Google, domestic QZone, Douban, everyone and so on the most popular. These social networking sites are used by a large number of users every day and generate a lot of data sharing, new friendships and so on. For these data, it has a high value, such as network marketing, public opinion analysis and so on. Therefore, the processing of these data is particularly important, among which community discovery is a hot topic. This paper is mainly aimed at the improvement of community discovery algorithm commonly used at present. The idea of LPA, the most commonly used tag propagation algorithm, is that, initially, each node in the network is initialized as a unique tag, and each node's label is updated iteratively. The node updates the label according to the number of tags in the neighbor node. If the label with the largest number of tags is not unique, then randomly select a label from which to update the current node, and finally achieve convergence or oscillation. The algorithm stops. As a result of the thought and implementation process of the algorithm, the algorithm has some disadvantages, such as instability, the communities found are either giant communities or meaningless small communities, which are extremely unevenly distributed and sensitive to the structure of the network. In the case of binary networks, cyclic oscillations occur. In view of a series of shortcomings of the above LPA algorithm, this paper proposes a community discovery algorithm based on label attributes and attenuation factors for label-attributeattenuation progagation algorithm), (label-attributeattenuation progagation algorithm),). The algorithm induces propagation attenuation factor and node attribute, in which the propagation attenuation factor is, as the name implies, that the propagation distance of the node label is limited, and the influence of the node label decreases gradually with the increase of the distance. The weight of updating tags is also reduced; the node attribute refers to the relationship between nodes in the social network is not only concerned with and be concerned about this simple traditional "edge", in order to more consistent with the actual situation, this paper proposed node attributes. It refers to the other attributes of nodes, such as the "group" added by users in the Douban network, and the same attributes between nodes will be reflected to the edge of the nodes, and the value of the right to use will be used to express them. In the iteration of the algorithm, the label propagation distance of the neighbor node, the weight of the edge between nodes, the degree of the node and so on will be taken into account when the node updates the label. This paper compares the proposed LAAPA algorithm with LPA algorithm through two sets of standard data. The community found by LAAPA algorithm is better than that of LPA algorithm in community size, module degree and so on. In the data collected by Scrapy, it is verified that the data conforms to the characteristics of social network, "small world", node centrality and so on. Compared with the two community discovery algorithms, the results show that the community quality of LAAPA algorithm is higher than that of LPA algorithm, and the quality of community is stable, the community distribution is even, and there is no giant community.
【學位授予單位】:西安電子科技大學
【學位級別】:碩士
【學位授予年份】:2015
【分類號】:TP301.6
本文編號:2180147
[Abstract]:The rapid development of the Internet in recent years has created a number of social-oriented websites, including Facebook Twitter Google, domestic QZone, Douban, everyone and so on the most popular. These social networking sites are used by a large number of users every day and generate a lot of data sharing, new friendships and so on. For these data, it has a high value, such as network marketing, public opinion analysis and so on. Therefore, the processing of these data is particularly important, among which community discovery is a hot topic. This paper is mainly aimed at the improvement of community discovery algorithm commonly used at present. The idea of LPA, the most commonly used tag propagation algorithm, is that, initially, each node in the network is initialized as a unique tag, and each node's label is updated iteratively. The node updates the label according to the number of tags in the neighbor node. If the label with the largest number of tags is not unique, then randomly select a label from which to update the current node, and finally achieve convergence or oscillation. The algorithm stops. As a result of the thought and implementation process of the algorithm, the algorithm has some disadvantages, such as instability, the communities found are either giant communities or meaningless small communities, which are extremely unevenly distributed and sensitive to the structure of the network. In the case of binary networks, cyclic oscillations occur. In view of a series of shortcomings of the above LPA algorithm, this paper proposes a community discovery algorithm based on label attributes and attenuation factors for label-attributeattenuation progagation algorithm), (label-attributeattenuation progagation algorithm),). The algorithm induces propagation attenuation factor and node attribute, in which the propagation attenuation factor is, as the name implies, that the propagation distance of the node label is limited, and the influence of the node label decreases gradually with the increase of the distance. The weight of updating tags is also reduced; the node attribute refers to the relationship between nodes in the social network is not only concerned with and be concerned about this simple traditional "edge", in order to more consistent with the actual situation, this paper proposed node attributes. It refers to the other attributes of nodes, such as the "group" added by users in the Douban network, and the same attributes between nodes will be reflected to the edge of the nodes, and the value of the right to use will be used to express them. In the iteration of the algorithm, the label propagation distance of the neighbor node, the weight of the edge between nodes, the degree of the node and so on will be taken into account when the node updates the label. This paper compares the proposed LAAPA algorithm with LPA algorithm through two sets of standard data. The community found by LAAPA algorithm is better than that of LPA algorithm in community size, module degree and so on. In the data collected by Scrapy, it is verified that the data conforms to the characteristics of social network, "small world", node centrality and so on. Compared with the two community discovery algorithms, the results show that the community quality of LAAPA algorithm is higher than that of LPA algorithm, and the quality of community is stable, the community distribution is even, and there is no giant community.
【學位授予單位】:西安電子科技大學
【學位級別】:碩士
【學位授予年份】:2015
【分類號】:TP301.6
【參考文獻】
相關(guān)期刊論文 前6條
1 梁宗文;楊帆;李建平;;基于節(jié)點相似性度量的社團結(jié)構(gòu)劃分方法[J];計算機應(yīng)用;2015年05期
2 陽廣元;曹霞;甯佐斌;潘煦;;國內(nèi)社區(qū)發(fā)現(xiàn)研究進展[J];情報資料工作;2014年02期
3 武志昊;林友芳;Steve Gregory;萬懷宇School of Computer and Information Technology,Beijing Jiaotong University;田盛豐;;Balanced Multi-Label Propagation for Overlapping Community Detection in Social Networks[J];Journal of Computer Science & Technology;2012年03期
4 韓瑞凱;孟嗣儀;劉云;郭英慧;張彥超;;基于興趣相似度的社區(qū)結(jié)構(gòu)發(fā)現(xiàn)算法研究[J];鐵路計算機應(yīng)用;2010年10期
5 淦文燕;赫南;李德毅;王建民;;一種基于拓撲勢的網(wǎng)絡(luò)社區(qū)發(fā)現(xiàn)方法[J];軟件學報;2009年08期
6 沈華偉;程學旗;陳海強;劉悅;;基于信息瓶頸的社區(qū)發(fā)現(xiàn)[J];計算機學報;2008年04期
,本文編號:2180147
本文鏈接:http://www.sikaile.net/guanlilunwen/yingxiaoguanlilunwen/2180147.html
最近更新
教材專著