
Research on a Lucene Search Engine Based on a PSO-BP Neural Network

Published: 2019-02-23 09:37
[Abstract]: Lucene is a full-text search architecture with an excellent index structure, a sound system architecture, and a high-performance, scalable information-retrieval library. Its support for Chinese word segmentation and for many text formats, however, is quite limited. Many Chinese segmentation algorithms are currently used with Lucene, including StandardAnalyzer and CJKAnalyzer, which ship with Lucene, and third-party analyzers such as ChineseAnalyzer and IK_CAnalyzer. StandardAnalyzer segments Chinese text into single characters; its drawback is that it requires a complex single-character matching algorithm and a large amount of CPU computation. CJKAnalyzer and ChineseAnalyzer both use bigram segmentation, which treats every two adjacent characters as one word. IK_CAnalyzer is dictionary-based and uses its own forward-iterative finest-granularity segmentation algorithm together with a multi-subprocessor analysis mode. Lucene does not yet implement understanding-based Chinese segmentation: because a computer cannot recognize what each word means in different contexts, understanding-based segmentation has not yet produced practical results.

To address these shortcomings in Lucene's Chinese segmentation, especially the lack of understanding-based segmentation, this thesis studies the application of the BP (back-propagation) neural network algorithm to Chinese word segmentation. Because a BP network applied to Chinese segmentation converges slowly, easily falls into local minima, and has low speed and efficiency, the thesis proposes an improved particle swarm optimization (PSO) algorithm to optimize the BP network, yielding the PSO-BP neural network, and applies it to Chinese word segmentation.
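The abstract does not specify the improved PSO variant, so the following is only a minimal Python sketch of one common PSO-BP hybrid, under stated assumptions. A standard global-best PSO with a fixed inertia weight (not the author's improved algorithm) searches over all weights and biases of a small 2-4-1 sigmoid network, and ordinary back-propagation then fine-tunes the best particle. The XOR data set, the network size, and every parameter value (`n_particles`, `iters`, `w_inertia`, `c1`, `c2`, `x_max`, `lr`) are illustrative assumptions standing in for the character features and settings a real segmentation network would use.

```python
import math
import random

# Toy training set (assumption): XOR stands in for the per-character
# features a real segmentation network would be trained on.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_IN, N_HID = 2, 4
# Flat weight vector: N_HID hidden units x (N_IN weights + 1 bias),
# followed by the output unit's N_HID weights + 1 bias.
DIM = N_HID * (N_IN + 1) + (N_HID + 1)
OUT = N_HID * (N_IN + 1)  # offset of the output-layer weights


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(w, x):
    """Return (hidden activations, output) of the 2-4-1 sigmoid network."""
    hidden = []
    for j in range(N_HID):
        base = j * (N_IN + 1)
        z = w[base + N_IN] + sum(w[base + i] * x[i] for i in range(N_IN))
        hidden.append(sigmoid(z))
    z = w[OUT + N_HID] + sum(w[OUT + j] * hidden[j] for j in range(N_HID))
    return hidden, sigmoid(z)


def mse(w):
    """Mean squared error over DATA; this is the PSO fitness."""
    return sum((forward(w, x)[1] - t) ** 2 for x, t in DATA) / len(DATA)


def pso(n_particles=20, iters=100, w_inertia=0.7, c1=1.5, c2=1.5, x_max=5.0, seed=0):
    """Global-best PSO over the network's weight vector.

    Returns the best weights found and the best loss after each iteration.
    """
    rng = random.Random(seed)
    v_max = 0.2 * x_max
    pos = [[rng.uniform(-x_max, x_max) for _ in range(DIM)] for _ in range(n_particles)]
    vel = [[0.0] * DIM for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_loss = [mse(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_loss[k])
    gbest, gbest_loss = pbest[g][:], pbest_loss[g]
    history = [gbest_loss]
    for _ in range(iters):
        for k in range(n_particles):
            for d in range(DIM):
                r1, r2 = rng.random(), rng.random()
                v = (w_inertia * vel[k][d]
                     + c1 * r1 * (pbest[k][d] - pos[k][d])   # pull toward own best
                     + c2 * r2 * (gbest[d] - pos[k][d]))     # pull toward swarm best
                vel[k][d] = max(-v_max, min(v_max, v))
                pos[k][d] = max(-x_max, min(x_max, pos[k][d] + vel[k][d]))
            loss = mse(pos[k])
            if loss < pbest_loss[k]:
                pbest[k], pbest_loss[k] = pos[k][:], loss
                if loss < gbest_loss:
                    gbest, gbest_loss = pos[k][:], loss
        history.append(gbest_loss)
    return gbest, history


def backprop(w, epochs=2000, lr=0.5):
    """Fine-tune weights with online gradient descent on E = (y - t)^2 / 2."""
    w = w[:]
    for _ in range(epochs):
        for x, t in DATA:
            hidden, y = forward(w, x)
            delta_out = (y - t) * y * (1.0 - y)
            for j in range(N_HID):
                # Hidden delta must use the output weight before it is updated.
                delta_h = delta_out * w[OUT + j] * hidden[j] * (1.0 - hidden[j])
                w[OUT + j] -= lr * delta_out * hidden[j]
                base = j * (N_IN + 1)
                for i in range(N_IN):
                    w[base + i] -= lr * delta_h * x[i]
                w[base + N_IN] -= lr * delta_h
            w[OUT + N_HID] -= lr * delta_out
    return w


if __name__ == "__main__":
    w_pso, history = pso()
    w_final = backprop(w_pso)
    print(f"PSO best loss: {history[-1]:.4f}, after BP fine-tuning: {mse(w_final):.4f}")
```

Running PSO first gives back-propagation a starting point chosen from many candidate weight vectors instead of a single random one. This is how such hybrids aim to reduce the slow convergence and local-minimum problems the abstract attributes to plain BP; whether it helps on a given data set has to be measured.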
Compared with the traditional BP network, the PSO-BP network not only overcomes the slow convergence of the traditional BP network but also improves segmentation accuracy.

Next, the thesis systematically studies and analyzes the API that Lucene provides for third-party Chinese segmentation components, successfully integrates the PSO-BP-based segmentation into Lucene, and compares it with Lucene's built-in Chinese segmentation; the results show that it clearly outperforms the built-in technique.

Finally, the thesis uses Lucene with the PSO-BP Chinese segmentation component to design and implement a search engine, exploring intelligent Chinese word segmentation for search engines and providing a good platform for follow-up work and research.
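To make the contrast with Lucene's existing analyzers concrete, here is a small Python sketch of the segmentation strategies the abstract describes, applied to a pure-Chinese string. `unigram_tokens` mimics how StandardAnalyzer splits Chinese into single characters, and `bigram_tokens` produces overlapping two-character tokens in the style of CJKAnalyzer. IK_CAnalyzer's forward-iterative finest-granularity algorithm is not reproduced here; `forward_max_match` is a different, classic dictionary method (greedy forward maximum matching) swapped in for illustration, and the tiny dictionary is a made-up assumption. Real analyzers also handle Latin text, digits, and punctuation, which this sketch ignores.

```python
def unigram_tokens(text):
    """One token per character, as StandardAnalyzer treats Chinese text."""
    return list(text)


def bigram_tokens(text):
    """Overlapping two-character tokens, in the style of CJKAnalyzer."""
    if len(text) < 2:
        return list(text)
    return [text[i:i + 2] for i in range(len(text) - 1)]


def forward_max_match(text, dictionary):
    """Greedy forward maximum matching against a word dictionary.

    At each position, take the longest dictionary word that starts there;
    characters not covered by any word fall back to single-character tokens.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    text = "中华人民共和国"
    toy_dict = {"中华", "人民", "共和国", "华人"}  # hypothetical dictionary
    print(unigram_tokens(text))               # ['中', '华', '人', '民', '共', '和', '国']
    print(bigram_tokens(text))                # ['中华', '华人', '人民', '民共', '共和', '和国']
    print(forward_max_match(text, toy_dict))  # ['中华', '人民', '共和国']
```

The outputs show the trade-off behind the abstract's comparison: unigram indexing discards word boundaries, bigram indexing emits non-words such as 民共 and 和国, and dictionary matching avoids both but depends entirely on dictionary coverage, which is the gap an understanding-based segmenter is meant to close.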
[Degree-granting institution]: China University of Petroleum (East China)
[Degree level]: Master's
[Year of degree]: 2013
[Classification numbers]: TP391.3; TP183



