

Research on a Real-Time Search Engine for Community Networks

Published: 2018-02-21 04:43

  Keywords: search engine, community network, web crawler, full-text search. Source: Harbin Institute of Technology, master's thesis, 2012. Type: degree thesis


[Abstract]: With the continuous development of Internet technology, websites offering a rich variety of functions have emerged, and people's demands on the network are no longer limited to reading news and looking up information, as in the past. More and more people like to record their daily lives online and to use short status updates to express their mood or their views on some matter. The network is no longer only a platform for presenting data; it has also become a window for presenting its users.

This user-created data differs from earlier data created by professional editors: it is freer in form, more flexible in manner, richer in content, more comprehensive in perspective, and faster in response, so research on this kind of data is of great significance. However, because of certain technical limitations, current search engines have difficulty obtaining this kind of data effectively.

This thesis divides a search engine into four modules: data crawling, index building, query processing, and data presentation. It analyzes the difficulties each module encounters when handling this kind of data and proposes new theories and solutions to address them.

For data crawling, the academic community has previously held that web page changes follow a Poisson process. This thesis analyzes how different time periods affect the pattern of web page changes, uses the mutual closeness between users to correct that pattern, and proposes a new web page change model. For index building, it proposes using multiple kinds of indexes, which not only improves the timeliness of results but also supports queries for statistics within a time period. For ranking, it improves on the original web-page-based PageRank by taking into account attributes specific to community data, namely comments and replies, and adds the importance of the user as a ranking indicator. For data presentation, it proposes classifying results by sentiment so that the data shown to users is more intuitive.

Building on these solutions, the thesis then designs and implements a new type of search engine for community networks. Finally, experimental results are given, which verify that the system performs well.
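As background for the Poisson-process assumption mentioned in the abstract, the following is a minimal sketch of the classical baseline only. It is not the thesis's corrected model, whose time-period and user-closeness adjustments are not specified on this page. All function names are hypothetical, and the per-hour rate table is just one assumed way to let the change rate depend on the time of day.

```python
import math


def change_probability(rate_per_hour, hours):
    """P(at least one change within `hours`) for a homogeneous Poisson process."""
    return 1.0 - math.exp(-rate_per_hour * hours)


def revisit_interval(rate_per_hour, target_probability):
    """Hours to wait until the page has changed with the given probability."""
    if rate_per_hour <= 0:
        raise ValueError("rate_per_hour must be positive")
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must be in (0, 1)")
    return -math.log(1.0 - target_probability) / rate_per_hour


def change_probability_by_hour(hourly_rates, start_hour, hours):
    """Same probability when the rate varies by hour of day (assumed model).

    `hourly_rates` holds 24 rates, one per hour of day; `hours` is a whole
    number of hours, so the integral of the rate is a plain sum.
    """
    if len(hourly_rates) != 24:
        raise ValueError("hourly_rates must contain 24 values")
    expected_changes = sum(
        hourly_rates[(start_hour + offset) % 24] for offset in range(hours)
    )
    return 1.0 - math.exp(-expected_changes)
```

A crawler built on this baseline would revisit a page once `change_probability` exceeds some threshold. With an hourly rate table, pages whose authors post mostly in the evening would be revisited more often in the evening than at night.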
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.3




Article ID: 1521029


Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/1521029.html

