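Design and Implementation of a Search-Engine-Based Chinese Automatic Question Answering System
Published: 2017-12-28 10:40
Source: Beijing University of Technology, master's thesis, 2016. Document type: degree thesis.
Keywords: question answering system; Site Q; multi-feature fusion; semantic dependency tree; answer extraction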
[Abstract]: We live in an era of extremely abundant information, and people have a strong demand for obtaining information quickly and accurately. Search engines are popular because they are convenient and respond quickly, and they have become the primary means of acquiring information. However, keyword-based retrieval makes it hard to express a user's intent clearly, and because results come back as a collection of web pages, users still have to find the answer themselves. Automatic question answering systems emerged in response, but traditional ones suffer from incomplete information coverage and untimely updates, and they require maintaining a huge knowledge base. To exploit the strengths of both approaches, this thesis combines them, designing and implementing an improved Chinese automatic question answering system based on a search engine. The main work is as follows:
(1) Improving the Site Q algorithm to obtain Topic-Site Q. The first and last paragraphs and the first and last sentences contribute substantially to meaning. This thesis folds them into Site Q with appropriate weights and proposes Topic-Site Q, a multi-feature-fusion paragraph retrieval algorithm that takes first and last paragraphs and first and last sentences into account. It computes the semantic similarity of the first and last sentences with a multi-feature-fusion method, lets that similarity contribute to paragraph relevance through a weight, raises the scores of the first and last paragraphs, and finally ranks paragraphs by the resulting score and returns the candidate paragraph set.
(2) Improving the answer extraction algorithm based on the semantic dependency tree. That algorithm examines mainly semantic and grammatical structure, which is a single and incomplete view. Word frequency is an important semantic feature and should be reflected in answer extraction. This thesis therefore takes the frequency of keyword occurrences into account and fuses it with the dependency-tree evidence through a log-linear model, yielding an improved answer extraction algorithm based on the semantic dependency tree.
(3) Designing and implementing an improved search-engine-based Chinese automatic question answering system, optimized with the two improved algorithms. The thesis first analyzes the system requirements in detail, then describes the overall architecture and gives a system structure diagram. The detailed design and implementation part discusses, module by module, each module's function, processing flow, implementation details, and the core algorithms used together with their improvements.
(4) Experimental validation. To verify the effectiveness of the proposed optimizations, a question test set was constructed manually and experiments were run on the improvements to the algorithms and the system. MRR, precision, recall, and F1 were computed for the two algorithms before and after improvement and for the system using the improved algorithms, and the results were compared. The experiments show that the algorithmic improvements are effective and that system performance increases when the improved algorithms are used.
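The fusion step in item (2) can be illustrated with a small sketch. The abstract does not give the feature definitions, the weights, or the exact model form, so everything below is an assumption for illustration only: the function names, the keyword-frequency feature, the smoothing constant `eps`, the weights 0.7 and 0.3, and the toy candidates are hypothetical and are not taken from the thesis. The sketch uses one common log-linear form, a weighted sum of log features that is then exponentiated.

```python
import math

def keyword_frequency(tokens, keywords):
    """Hypothetical frequency feature: share of tokens that are question keywords."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in keywords)
    return hits / len(tokens)

def log_linear_score(dep_score, freq_score, w_dep=0.7, w_freq=0.3, eps=1e-6):
    """Combine two positive features in log space.

    score = exp(w_dep * log(dep_score) + w_freq * log(freq_score))
    eps keeps log() defined when a feature is zero. Weights are illustrative.
    """
    return math.exp(
        w_dep * math.log(dep_score + eps) + w_freq * math.log(freq_score + eps)
    )

def rank_candidates(candidates, keywords):
    """candidates: list of (answer_text, sentence_tokens, dep_score).

    dep_score stands in for whatever the dependency-tree matcher returns;
    here it is simply a number in (0, 1] supplied by the caller.
    Returns (answer_text, fused_score) pairs, best first.
    """
    scored = [
        (text, log_linear_score(dep, keyword_frequency(tokens, keywords)))
        for text, tokens, dep in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy usage: both candidates have the same structural score, so the one whose
# sentence contains more question keywords is ranked first.
toy_candidates = [
    ("A", ["x", "y"], 0.8),
    ("B", ["k", "y"], 0.8),
]
ranking = rank_candidates(toy_candidates, {"k"})
```

Because the weights act as exponents, a candidate with a near-zero value on either feature is penalized multiplicatively rather than additively, which is the usual reason to prefer a log-linear combination over a plain weighted sum.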
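The metrics named in item (4) have standard definitions, sketched below on toy numbers. How the thesis decides that a question counts as answered or correct is not stated in the abstract, so the conventions here are assumptions: MRR uses the 1-based rank of the first correct answer (or `None` when no returned answer is correct), precision is taken over answered questions, and recall over all questions. The function names and the sample data are hypothetical.

```python
def mean_reciprocal_rank(first_correct_ranks):
    """Average of 1/rank over all questions.

    first_correct_ranks: one entry per question, the 1-based rank of the
    first correct answer, or None if no correct answer was returned
    (such a question contributes 0).
    """
    if not first_correct_ranks:
        return 0.0
    total = sum(1.0 / r for r in first_correct_ranks if r is not None)
    return total / len(first_correct_ranks)

def precision_recall_f1(num_correct, num_answered, num_questions):
    """Assumed QA convention: precision over answered, recall over all questions."""
    precision = num_correct / num_answered if num_answered else 0.0
    recall = num_correct / num_questions if num_questions else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy usage: four questions, the third has no correct answer in the list.
mrr = mean_reciprocal_rank([1, 2, None, 4])   # (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
p, r, f1 = precision_recall_f1(num_correct=6, num_answered=8, num_questions=10)
```

Computing the same four numbers before and after each algorithmic change, on the same test set, gives the kind of comparison the abstract describes.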
[Degree-granting institution]: Beijing University of Technology
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP391.3
Article ID: 1345620
Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/1345620.html