Research on Full-Text Search Based on Natural Language Understanding
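Keywords: natural language understanding; inverted index; full-text search; Chinese word segmentation; local index. Source: Hubei University, 2013 master's thesis. Document type: degree thesis.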
[Abstract]: As network technology develops, the volume of information on the network keeps growing, and retrieving information that meets a user's needs from this mass of data efficiently, quickly, and accurately has become a central concern. Traditional information retrieval matches information mechanically at the keyword level. A growing number of researchers now combine natural language processing with search engine technology in order to develop intelligent search engines. This thesis analyzes full-text search, one of the mainstream information retrieval techniques. For unstructured text, full-text search treats the entire content of a document as its object: text processing extracts indexable plain text, the plain text is segmented into words, and the words are indexed to form an index library. When a user issues a query, the entered keywords are processed and then matched against the index terms in the index library, and the information that satisfies the query is extracted from the library. On top of full-text search, this thesis adds a Chinese word segmentation layer drawn from natural language understanding. The specific research content and results are as follows:

(1) The key principles and processing mechanisms of full-text search and natural language understanding are analyzed. On this theoretical basis, and using the SS (Struts + Spring) framework, a prototype full-text search system is developed that relies on full-segmentation Chinese word segmentation from natural language understanding. The prototype takes the entire content of various typical unstructured documents and performs text preprocessing, Chinese word segmentation, index library construction, and information retrieval against the index library.

(2) For document libraries containing a small amount of information, the prototype builds the index library and retrieves information with relatively high efficiency and accuracy. When a document library contains a very large amount of information, however, preprocessing the full content of every document and then segmenting it to build the index library can be expected to consume a great deal of time and space. To address this weakness, the thesis proposes building a local index over document content. Building on the completed prototype, it compares the two document processing mechanisms and concludes from experiments that local indexing of document content has considerable research value in information retrieval.
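The pipeline the abstract describes (segment the text, build an inverted index, segment the query, match it against the index) can be sketched in a few lines of Python. This is an illustrative sketch only, not the thesis's Struts + Spring prototype. The toy `LEXICON`, the sample `DOCS`, and the function names are all hypothetical. The abstract also does not say how the "local" portion of a document is chosen, so the sketch assumes, purely for illustration, that a local index covers only the first `prefix_chars` characters of each document.

```python
from collections import defaultdict

# Hypothetical toy dictionary; a real system would load a large lexicon.
LEXICON = {"全文", "搜索", "全文搜索", "索引", "中文", "分词", "中文分词"}
MAX_WORD_LEN = max(len(w) for w in LEXICON)


def full_segment(text):
    """Return every lexicon word found at any position in text (overlaps kept)."""
    words = []
    for start in range(len(text)):
        for end in range(start + 1, min(len(text), start + MAX_WORD_LEN) + 1):
            candidate = text[start:end]
            if candidate in LEXICON:
                words.append(candidate)
    return words


def build_index(docs, prefix_chars=None):
    """Build an inverted index mapping each term to a set of document ids.

    prefix_chars=None indexes the whole document. An integer indexes only
    that many leading characters (an assumed reading of "local index").
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        portion = text if prefix_chars is None else text[:prefix_chars]
        for term in full_segment(portion):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term segmented from the query."""
    terms = full_segment(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents.
DOCS = {
    1: "全文搜索系统建立索引",
    2: "中文分词是全文搜索的基础",
    3: "索引库的设计",
}

full_index = build_index(DOCS)                  # index over all content
local_index = build_index(DOCS, prefix_chars=4)  # index over leading text only
```

With these sample documents, `search(full_index, "全文搜索")` finds documents 1 and 2, while `search(local_index, "全文搜索")` finds only document 1, because the phrase in document 2 lies outside the indexed prefix. The local index holds fewer postings, which reflects the trade-off the thesis sets out to evaluate: less time and space spent on indexing in exchange for possibly missing matches.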
[Degree-granting institution]: Hubei University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3