Research and Application of Similarity-Based Blog Recommendation Technology

Published: 2018-11-11 15:02

[Abstract]: With the spread of Web 2.0, blogs have grown at an unprecedented pace and now hold an enormous volume of information. In blog systems of this size, it is very difficult for users to find the blogs or posts that interest them most, and equally difficult for bloggers to attract more visitors. Blog search engines solve this problem to some extent, but because of technical limitations and the demands they place on users, they do not truly meet users' needs. This thesis surveys the commonly used recommendation algorithms, analyzes bloggers' social information and the content of their posts, and builds on existing techniques to design a similarity-based blog recommendation algorithm. The algorithm computes the similarity between blogs from two sources: the similarity of their posts and the similarity of their bloggers' social information. Before presenting the algorithm, the thesis introduces the concepts of post similarity and blogger social-information similarity and explains the advantages of a similarity-based approach. It then constructs formulas for the two similarities and combines them into an overall similarity; the weights are set by linear combination, with their values chosen with reference to the literature. For the experiments, the open-source crawler Heritrix is used to collect blogs from Sina as experimental data; the crawled data is processed and the relevant information is stored in a database. The improved algorithm is evaluated against two criteria: a comparison of accuracy with a text-based algorithm, which is suited to automatic evaluation by computer, and a manual assessment in which human judges decide whether each recommended blog is similar to the target blog. Comparison and analysis of the experimental results demonstrate the effectiveness of the improved algorithm and provide technical support for blog recommendation.
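The abstract describes combining a post similarity and a blogger social-information similarity by linear weighting, but it does not give the formulas or the weight values. The following is only a minimal sketch of that idea under stated assumptions: cosine similarity over word counts stands in for the post similarity, Jaccard similarity over a set of tags stands in for the social-information similarity, and the field names `post_words` and `social_tags`, the weight `alpha`, and its default of 0.5 are all hypothetical, not taken from the thesis.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine similarity between two bags of words (term-frequency vectors)."""
    tf_a, tf_b = Counter(words_a), Counter(words_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(items_a, items_b):
    """Jaccard similarity between two collections treated as sets."""
    set_a, set_b = set(items_a), set(items_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def blog_similarity(blog_a, blog_b, alpha=0.5):
    """Linear combination of post similarity and social-information similarity.

    alpha is an assumed weight in [0, 1]; the thesis chooses its own value
    from the literature, which the abstract does not state.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    post_sim = cosine_similarity(blog_a["post_words"], blog_b["post_words"])
    social_sim = jaccard_similarity(blog_a["social_tags"], blog_b["social_tags"])
    return alpha * post_sim + (1.0 - alpha) * social_sim


def recommend(target, candidates, top_n=3, alpha=0.5):
    """Return the names of the top_n candidates most similar to the target."""
    scored = [(blog_similarity(target, c, alpha), c["name"]) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_n]]
```

Calling `recommend(target, candidates)` with dictionaries that carry `name`, `post_words`, and `social_tags` ranks the candidates by the combined score. Setting `alpha` to 1.0 reduces the sketch to a text-only baseline, which is roughly the kind of comparison the abstract's automatic evaluation describes.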
[Degree-granting institution]: Inner Mongolia University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.3


Article ID: 2325208

Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/2325208.html
