
Design and Implementation of a Vertical Search System

Published: 2018-04-25 20:30

  Topics: vertical search + focused crawler; Source: Sun Yat-sen University, 2012 master's thesis


[Abstract]: General-purpose search engines give Internet users access to a vast amount of information. Their strength is broad coverage, but because their scope is so wide it is difficult for them to also achieve high accuracy, and they serve users poorly when information about one specific field or industry is needed. In that case a domain-oriented vertical search engine can help: it targets a specific field or industry, processes the information there in depth, and provides users with more accurate results.
Taking the current demand for searching popular tablet computers as background, this thesis studies and analyzes the key technologies of vertical search engines, and designs and implements a vertical search system for the tablet computer domain. It first analyzes the core technologies of vertical search engines, namely focused (topic) crawlers, information extraction and full-text retrieval, with particular attention to the inverted index and the open-source Lucene full-text retrieval toolkit. It then focuses on Chinese word segmentation, another key technology, covering its common methods and algorithms. Following the string-matching approach to segmentation, and after building a basic dictionary for the tablet computer domain, the thesis adopts a prefix-based, character-by-character maximum matching algorithm to design and implement a Chinese automatic word segmentation component suited to that domain, which implements Lucene's analyzer (word segmenter) interface. A comparison with several open-source word segmentation systems shows that the component segments text in this domain with good accuracy.
Building on these theories and technologies, the thesis first presents the overall design of the system, including the division into functional modules, the architecture adopted, and the development technologies and environment. It then gives the detailed design and implementation, using UML for analysis and design and the J2EE three-tier architecture, and describes in some detail the whole process of building a vertical search system with Lucene. A comparison between this system and traditional search engines on searches for tablet computer products shows that the system has a certain visible advantage in the precision of its search results.
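
The two techniques the abstract names, prefix-based maximum matching segmentation and the inverted index, can be illustrated with a small sketch. This is not the thesis's code (which targets Lucene in Java); it is a minimal Python illustration under stated assumptions: the dictionary entries, the sample documents, and the names `build_trie`, `segment`, `build_inverted_index` and `END` are all hypothetical. The segmenter walks a prefix tree one character at a time, keeps the longest dictionary word found from the current position, and falls back to a single character when nothing matches.

```python
from collections import defaultdict

# Hypothetical end-of-word marker; it is longer than one character,
# so it can never collide with a single-character trie key.
END = "$end"


def build_trie(words):
    """Build a nested-dict prefix tree from the dictionary words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def segment(text, trie):
    """Forward maximum matching: at each position, follow the trie
    character by character and emit the longest dictionary word found;
    emit one character if no dictionary word starts here."""
    tokens = []
    i = 0
    while i < len(text):
        node = trie
        longest = 0
        j = i
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if END in node:
                longest = j - i
        step = longest or 1
        tokens.append(text[i:i + step])
        i += step
    return tokens


def build_inverted_index(docs, trie):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in segment(text, trie):
            index[token].add(doc_id)
    return index


# Assumed toy dictionary and documents, for illustration only.
trie = build_trie(["平板", "平板电脑", "电脑", "搜索", "垂直搜索"])
docs = {1: "平板电脑搜索", 2: "垂直搜索", 3: "搜索电脑"}
index = build_inverted_index(docs, trie)

print(segment("平板电脑搜索", trie))  # ['平板电脑', '搜索']
print(sorted(index["搜索"]))          # [1, 3]
```

Because the longest match wins, "平板电脑" is kept as one token rather than split into "平板" and "电脑", which is why a domain dictionary matters for this approach. Document 2 is indexed only under "垂直搜索", so a query for "搜索" does not return it in this simplified index.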
[Degree-granting institution]: Sun Yat-sen University
[Degree]: Master's
[Year conferred]: 2012
[Classification number]: TP391.3





Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/1802844.html

