
Answer Selection for Non-Factoid Questions

Published: 2018-08-07 13:50
[Abstract]: With the rise of community question answering (CQA) websites, more and more user-generated data has accumulated. This data is not only massive and diverse but also of high quality and high reuse value. To manage and use it efficiently, researchers have carried out a great deal of research and practical work on it in recent years, and community question answering is one of the widely studied topics.
CQA research is based on data from question answering communities and differs clearly from traditional question answering systems. Traditional systems mainly handle factoid questions whose answers are phrases or named entities, and their main modules are question understanding and answer extraction. CQA has no such restriction and is particularly well suited to non-factoid questions that ask for advice or opinions. CQA research covers question retrieval and recommendation, question interestingness, question and answer quality, answer ranking, user authority, and other directions. Among these, question retrieval and answer selection are the core modules of CQA and have drawn wide attention from both academia and industry.
The main work of this thesis is to build a CQA system on large-scale question answering community data and to study in depth the question analysis, question retrieval, and answer selection techniques it involves.
While building the system, this work collected large-scale data from Yahoo! Answers and other community websites: more than 130 million questions and 1 billion answers. Compared with earlier CQA studies based on data on the order of millions, this differs markedly in scale and has high practical value. On top of this data, the work uses automatic query classification to improve the efficiency and quality of each query.
For question retrieval, this work proposes using the structural and semantic information of the query question and the candidate questions, combined with a learning-to-rank algorithm that fuses features of several different categories. A ranking model is learned from training data to improve retrieval relevance and to mitigate problems such as word mismatch. Experiments show that the ranking model trained with Ranking SVM significantly improves on previous methods in accuracy and other evaluation metrics across different data sets.
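The pairwise idea behind Ranking SVM can be sketched as follows. This is only a minimal illustration, not the thesis's implementation: the two features (a lexical-overlap score and a semantic-similarity score), the relevance grades, and the hyperparameters are all assumptions, and the optimizer is plain subgradient descent on the pairwise hinge loss rather than a dedicated SVM solver.

```python
import random

def pairwise_differences(groups):
    """Turn graded candidates into difference vectors (better - worse).

    groups: one list per query, each holding (feature_vector, relevance).
    Pairs are only formed between candidates of the same query.
    """
    diffs = []
    for candidates in groups:
        for xi, yi in candidates:
            for xj, yj in candidates:
                if yi > yj:
                    diffs.append([a - b for a, b in zip(xi, xj)])
    return diffs

def train_ranking_svm(groups, c=1.0, epochs=200, lr=0.05, seed=0):
    """Linear ranker minimizing 0.5*||w||^2 + c * sum(max(0, 1 - w.d))."""
    diffs = pairwise_differences(groups)
    if not diffs:
        raise ValueError("no preference pairs in the training data")
    dim = len(diffs[0])
    w = [0.0] * dim
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            for k in range(dim):
                grad = w[k] / len(diffs)  # regularizer, spread over the pairs
                if margin < 1.0:          # hinge loss is active for this pair
                    grad -= c * d[k]
                w[k] -= lr * grad
    return w

def score(w, x):
    """Ranking score of one candidate question."""
    return sum(wk * xk for wk, xk in zip(w, x))

# Hypothetical features per candidate: [lexical overlap, semantic similarity]
groups = [
    [([0.9, 0.8], 2), ([0.5, 0.4], 1), ([0.1, 0.2], 0)],
    [([0.7, 0.9], 1), ([0.6, 0.1], 0)],
]
w = train_ranking_svm(groups)
```

Candidates for a new query would then be sorted by `score(w, x)` in descending order. Adding further feature categories only lengthens the feature vectors; the training loop stays the same.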
After question retrieval has found candidate questions that are semantically similar to the query question, this work further proposes a new unsupervised learning method, based on the content of question-answer pairs, that judges answer quality in order to filter out low-quality answers. It makes three assumptions about CQA data: (1) most answers under a question are normal, and only a few are low-quality answers that need to be filtered out; (2) low-quality answers can be detected by comparing them with the other answers to the same question; (3) different answers should have different criteria for judging answer quality. Based on these assumptions, the method uses content-based features and detects low-quality answers by minimizing the variance of the answer feature vectors while retaining as many answers as possible. Experiments show that the method clearly improves on the baseline methods in terms of ROC values.
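One way to read the "minimize variance while keeping as many answers as possible" trade-off is the greedy sketch below. The abstract does not give the actual objective or optimizer, so everything here is an assumption: the per-removal `penalty`, the greedy strategy, the `min_keep` floor, and the two content features (normalized length and question-term overlap) are hypothetical.

```python
def centroid(vectors):
    """Component-wise mean of a list of feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def variance(vectors):
    """Mean squared Euclidean distance to the centroid."""
    c = centroid(vectors)
    return sum(squared_distance(v, c) for v in vectors) / len(vectors)

def filter_low_quality(vectors, penalty=0.05, min_keep=2):
    """Greedily drop the answer farthest from the centroid of the kept set,
    as long as the variance reduction outweighs the cost of one removal.

    Returns (kept_indices, removed_indices) for one question's answers.
    """
    kept = list(range(len(vectors)))
    removed = []
    while len(kept) > min_keep:
        current = [vectors[i] for i in kept]
        c = centroid(current)
        farthest = max(kept, key=lambda i: squared_distance(vectors[i], c))
        rest = [vectors[i] for i in kept if i != farthest]
        gain = variance(current) - variance(rest)
        if gain <= penalty:  # removing more answers no longer pays off
            break
        kept.remove(farthest)
        removed.append(farthest)
    return kept, removed

# Hypothetical features per answer: [normalized length, question-term overlap]
answers = [[0.80, 0.70], [0.75, 0.65], [0.85, 0.72], [0.78, 0.69], [0.05, 0.02]]
kept, removed = filter_low_quality(answers)
```

Because the centroid and variance are computed per question, the effective threshold adapts to each question's own answers, which fits assumption (2). No labeled data is needed.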
After low-quality answers are filtered out, this work also uses the textual information of question-answer pairs and the authority information of the answerers on the community site. With the best-answer data selected by users in the question answering community and the Ranking SVM algorithm, it trains an answer ranking model that reranks the answers and selects the best one. Through these steps, the work builds an efficient and practical CQA system. In a test on 300 high-frequency questions taken from the query log of a commercial search engine, the system gave a correct answer for 78.0% of the questions, and it returns a result for any question within 2 seconds, showing that the system is both effective and practical.
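The way user-selected best answers can serve as training signal for a pairwise ranker is sketched below. The two features (`text_match`, `authority`) and the fixed weights are hypothetical placeholders; in the thesis the weights would come from Ranking SVM training, and the actual feature set is not listed in the abstract.

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text_match: float  # hypothetical text feature of the question-answer pair
    authority: float   # hypothetical authority score of the answerer
    is_best: bool      # True if users selected this as the best answer

def features(answer):
    return [answer.text_match, answer.authority]

def preference_pairs(threads):
    """For each question thread, prefer the best answer over every other one.

    Returns (preferred_features, other_features) tuples, which is the input
    a pairwise ranker such as Ranking SVM learns from.
    """
    pairs = []
    for answers in threads:
        best = [a for a in answers if a.is_best]
        others = [a for a in answers if not a.is_best]
        for b in best:
            for o in others:
                pairs.append((features(b), features(o)))
    return pairs

def rerank(answers, weights):
    """Sort answers by a linear score, highest first."""
    def linear_score(a):
        return sum(w * x for w, x in zip(weights, features(a)))
    return sorted(answers, key=linear_score, reverse=True)

thread = [
    Answer(text_match=0.1, authority=0.1, is_best=False),
    Answer(text_match=0.9, authority=0.2, is_best=True),
    Answer(text_match=0.4, authority=0.9, is_best=False),
]
pairs = preference_pairs([thread])
ranked = rerank(thread, weights=[1.0, 0.5])  # weights are illustrative only
```

Threads without a user-selected best answer contribute no pairs, so they are simply skipped during training.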
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP391.1



Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/2170221.html

