Research on Natural Language Question Answering Methods Based on Knowledge Bases
Published: 2018-04-16 03:24
Topics: knowledge base question answering + word vectors; Source: University of Science and Technology of China, 2017 master's thesis
[Abstract]: Knowledge base question answering takes a question posed in natural language and answers it using a structured knowledge base. It is one of the important research directions in natural language processing. The main approaches fall into three categories: methods based on information extraction, methods based on semantic parsing, and methods based on vector space modeling. The key technologies include knowledge extraction and representation, semantic representation of the user's question, and answer generation from the knowledge base. Because of factors such as the accuracy of question semantic representation and the scale of question-answer pair training data, the performance of current knowledge base question answering systems still has room to improve. In addition, open-source, large-scale, open-domain Chinese knowledge bases are scarce, which also constrains research on Chinese knowledge base question answering. This thesis studies the task from several angles: question semantic representation, training data preparation, and Chinese knowledge base construction. Its main contributions are a word vector construction method for scoring paraphrased questions in knowledge base question answering, a knowledge base question answering method that incorporates neural question generation, and a knowledge fusion method for Chinese knowledge base construction.

Traditional word vectors are obtained through unsupervised training that is independent of any specific task, so when they are used to score paraphrased questions they cannot reflect sentence-level semantic constraints. The thesis therefore proposes a word vector training method based on paraphrase knowledge constraints. The method introduces sentence-level semantic constraint information into word vector training. Without changing how sentence semantics are composed, it improves sentence-level semantic representation by optimizing the word-level semantic vectors, which in turn improves paraphrase question scoring and the accuracy with which the question answering system answers questions.

Existing knowledge base question answering methods based on vector space modeling depend on training data, yet producing large-scale question-answer pairs by hand is difficult. To address this, the work introduces a question generation method based on an encoder-decoder neural network into the construction of the question answering system: a question generation model automatically generates questions from triples in the knowledge base, and these questions are used to train the question answering model. Experimental results show that, compared with questions produced from traditional templates, model-generated questions effectively improve the accuracy of the knowledge base question answering system.

Finally, the thesis presents a Chinese knowledge base construction method based on knowledge fusion. The method first extracts information from the infoboxes of Baidu Baike pages to build an initial knowledge base, then fuses this initial knowledge base with the existing Freebase knowledge base using entity alignment based on link-word information and attribute mapping based on the Jaccard coefficient. Chinese knowledge bases built for several domains, such as people and geography, verify that the method is effective for expanding a knowledge base on top of an existing ontology.
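The abstract names Jaccard-coefficient attribute mapping but gives no details, so the following is only a minimal sketch of how such a mapping could work, not the thesis's actual procedure. It assumes each attribute is represented by the set of entity-value strings it covers after entity alignment, and that an infobox attribute is mapped to the Freebase property with the highest Jaccard coefficient above a cutoff. The function names, the `threshold` parameter, and the toy data are all hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; taken as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def map_attributes(
    source_attrs: Dict[str, Set[str]],
    target_attrs: Dict[str, Set[str]],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the thesis
) -> Dict[str, Optional[Tuple[str, float]]]:
    """Map each source attribute to its best-scoring target attribute.

    An attribute maps to None when no target reaches the threshold.
    """
    mapping: Dict[str, Optional[Tuple[str, float]]] = {}
    for src_name, src_values in source_attrs.items():
        best: Optional[Tuple[str, float]] = None
        for tgt_name, tgt_values in target_attrs.items():
            score = jaccard(src_values, tgt_values)
            if score >= threshold and (best is None or score > best[1]):
                best = (tgt_name, score)
        mapping[src_name] = best
    return mapping


# Toy data invented for illustration: "entity:value" strings per attribute.
infobox = {
    "出生地": {"e1:Beijing", "e2:Shanghai", "e3:Hefei"},
    "职业": {"e1:actor", "e2:singer"},
}
freebase = {
    "people.person.place_of_birth": {"e1:Beijing", "e2:Shanghai", "e4:Nanjing"},
    "people.person.profession": {"e1:actor", "e2:singer", "e3:writer"},
}

result = map_attributes(infobox, freebase)
for attr, match in result.items():
    print(attr, "->", match)
```

In this toy setup the sets only overlap because the entities are assumed to be aligned already, which is why the abstract lists entity alignment before attribute mapping.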
[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.1
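[Similar literature]
Related master's theses (1 entry)
1 Zhan Chendi (詹晨迪); Research on Natural Language Question Answering Methods Based on Knowledge Bases [D]; University of Science and Technology of China; 2017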
Article ID: 1757091
Article link: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/1757091.html