A Method for Improving Web Page Ranking in Nutch
Published: 2018-12-17 12:08
[Abstract]: Nutch is an open-source search engine implemented in Java. The current Nutch has two shortcomings: it segments Chinese text into single characters, and it does not implement PageRank computation. To address them, this work improves the PageRank algorithm, designs and implements a MapReduce-based PageRank computation method, and improves Nutch's Chinese word segmentation by adding the JE Chinese word segmenter. Experimental results show that the improved Nutch achieves higher query result accuracy and better ranking of Chinese web pages.
[Affiliation]: School of Computer, Electronics and Information, Guangxi University
[Funding]: Supported by the Guangxi Natural Science Foundation (Grant No. 桂科自0832059)
[Classification number]: TP391.3
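
The abstract mentions a MapReduce-based PageRank computation but gives no details of the improved algorithm. As a rough illustration only, the sketch below simulates one common way to phrase the standard (unmodified) PageRank iteration as map and reduce steps, entirely in memory. It is not the authors' method and does not use the Hadoop or Nutch APIs. The function names, the damping factor of 0.85, and the assumption that every page has at least one out-link pointing to a page in the graph are all hypothetical choices made for this example.

```python
from collections import defaultdict

# Assumption: the conventional damping factor; the paper's value is not stated.
DAMPING = 0.85


def map_phase(page, rank, out_links):
    """Emit the page's link list plus one rank share per out-link."""
    # Pass the link structure through so the reducer can see it.
    yield page, ("links", out_links)
    # Split the page's current rank evenly among the pages it links to.
    for target in out_links:
        yield target, ("rank", rank / len(out_links))


def reduce_phase(page, values, num_pages):
    """Sum the incoming rank shares and apply the damping formula."""
    incoming = sum(payload for kind, payload in values if kind == "rank")
    return page, (1 - DAMPING) / num_pages + DAMPING * incoming


def pagerank(graph, iterations=20):
    """Run a fixed number of map/shuffle/reduce rounds in memory.

    Assumes every page has at least one out-link and that every link
    target is itself a key of `graph` (no dangling pages).
    """
    n = len(graph)
    ranks = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        shuffled = defaultdict(list)  # stands in for the shuffle step
        for page, out_links in graph.items():
            for key, value in map_phase(page, ranks[page], out_links):
                shuffled[key].append(value)
        ranks = dict(
            reduce_phase(page, values, n) for page, values in shuffled.items()
        )
    return ranks
```

Calling `pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})` returns a dictionary mapping each page to its rank. Under the stated no-dangling-pages assumption, the ranks keep summing to 1 across iterations. In a real Hadoop job, each round would be a separate MapReduce pass over the link data.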
[Citation]: Pan Tao; Liang Zhengyou. A Method for Improving Web Page Ranking in Nutch [J]. Computer Engineering, 2010, Issue 13.
Article ID: 2384178
Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/2384178.html