Research on User Group Discovery Based on Relationship Strength in Social Networks
Published: 2018-06-08 14:46
Topics: social networks + user social relationships. Source: Donghua University, 2015 master's thesis
[Abstract]: With the rapid development of the Internet, social networks of every kind keep emerging. As a novel and convenient way to make friends, they have attracted large numbers of users, and more and more people use the resources gathered on social networks to share opinions and make friends. Facebook, the well-known foreign social networking site, has reached 1.1 billion monthly active users, and Sina Weibo, a representative domestic social network, has passed 500 million users. Faced with this ever-growing volume of data, both users and social network service providers urgently need to solve one problem: how to find people who share a user's interests or views so that they can interact. User group discovery research arose for exactly this purpose. Its goal is to mine the user relationship graph of a social network for groups of users with similar interests, and thereby support practical applications such as advertising, marketing, and friend recommendation.

Traditional user group discovery methods work on the original relationship graph between users in the social network. Users are treated as the vertices of the graph and the relationships between users as its edges, and user clusters are obtained by cluster analysis of the graph. These methods take into account neither the sparsity of user relationships nor the differences between user relationships in social networks and in real-world networks. In discovering user groups, this thesis considers, on the one hand, the overall distribution of the information that users have in common across topics and, on the other hand, the effect that differences in topic popularity have on user relationships. Combining these two aspects, it proposes a model for computing user relationship strength, uses that model to expand user relationships in a way that suits the characteristics of social networks, and finally applies cluster analysis to discover user groups.

The main work of this thesis is as follows:
1) It first introduces the related technologies, including the theoretical foundations of social networks, methods for computing user relationship strength, the MapReduce programming model, and the basic idea of locality-sensitive hashing.
2) It then describes a method that computes user relationship strength by constructing user feature co-occurrence vectors. The method combines a diversity index with a weighted frequency, so that the relationship strength between users is computed from two mutually independent perspectives.
3) To cope with the data volume of social networks, the computation above is implemented with the MapReduce programming model. Building on the computed relationship strengths, user group discovery on the new user relationship graph is implemented using the properties of locality-sensitive hashing and MapReduce.
4) Experiments are carried out on data obtained through the open interface of the social networking site Last.fm, and the relevant parameters of the model are estimated. The results of the performance analysis and the reliability analysis demonstrate the feasibility and practicality of the relationship strength computation and the group discovery.
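As a rough illustration of item 2) above, the sketch below computes a relationship strength from a co-occurrence vector. The abstract does not give the thesis's formulas, so every concrete choice here is an assumption made for illustration: the co-occurrence count is taken as the minimum of the two users' per-topic counts, the diversity index is the exponential of the Shannon entropy, the weighted frequency divides each count by a topic popularity value, and the two scores are mixed with a hypothetical parameter `alpha`. All function names are hypothetical.

```python
import math
from collections import Counter


def cooccurrence_vector(user_a, user_b):
    """Per-topic co-occurrence counts (assumption: min of the two users' counts)."""
    shared = user_a.keys() & user_b.keys()
    return {topic: min(user_a[topic], user_b[topic]) for topic in shared}


def diversity_index(cooc):
    """Exponential of the Shannon entropy of the co-occurrence distribution.

    Assumed stand-in for the thesis's diversity index: 1.0 when everything
    falls on a single topic, larger when co-occurrences are spread evenly,
    0.0 when the users share nothing.
    """
    total = sum(cooc.values())
    if total == 0:
        return 0.0
    entropy = -sum(
        (count / total) * math.log(count / total)
        for count in cooc.values()
        if count > 0
    )
    return math.exp(entropy)


def weighted_frequency(cooc, topic_popularity):
    """Sum of co-occurrence counts, down-weighting popular topics.

    Assumption: the weight of a topic is 1 / popularity, so sharing a niche
    topic counts for more than sharing a mainstream one.
    """
    return sum(count / topic_popularity[topic] for topic, count in cooc.items())


def relationship_strength(user_a, user_b, topic_popularity, alpha=0.5):
    """Hypothetical linear mix of the two scores; alpha is an illustrative parameter."""
    cooc = cooccurrence_vector(user_a, user_b)
    return (alpha * diversity_index(cooc)
            + (1 - alpha) * weighted_frequency(cooc, topic_popularity))


# Toy data: per-user topic counts and the number of users attached to each topic.
alice = Counter(rock=3, jazz=2, pop=1)
bob = Counter(rock=1, jazz=2, metal=4)
popularity = {"rock": 100, "jazz": 10, "pop": 200, "metal": 50}

print(relationship_strength(alice, bob, popularity))
```

Pairs whose strength exceeds some threshold could then be added as edges to the expanded relationship graph before clustering. At the scale the abstract describes, the pairwise computation would be distributed, which this single-machine sketch does not attempt.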
[Degree-granting institution]: Donghua University
[Degree level]: Master's
[Year degree awarded]: 2015
[Classification number]: TP391.1
Article ID: 1996161
Article link: http://www.sikaile.net/guanlilunwen/yingxiaoguanlilunwen/1996161.html