Automatic Identification of Questions Needing Improvement in Interactive Question Answering Systems
Published: 2018-04-09 05:26
Topic: question answering systems. Focus: knowledge base expansion. Source: Harbin Institute of Technology, master's thesis, 2013
[Abstract]: As the Internet continues to develop, people are no longer satisfied with merely using search engines to look up the information they need. How to deliver that information to users quickly and conveniently has become a focus of research. An automatic question answering system both meets users' information needs and gives them a human-like reply, so it addresses this problem well. Traditional question answering systems, however, have no mechanism for automatically identifying existing questions whose answers are unsatisfactory, which makes it hard to improve the system or update its knowledge base.

To make up for this shortcoming, this thesis studies the automatic identification of questions needing improvement in an interactive question answering system. It proposes an automatic identification method and analyzes how well such questions are identified using features based on user emotion, user intent, and a mixture of the two. Questions that previously had to be found by manual review are instead found automatically, which removes the manual review work at the identification stage and makes identification more efficient.

To identify these questions better, the thesis designs a knowledge base expansion method oriented toward the mixed features. Using a web crawler, the knowledge base corpus was expanded to 39161 entries. This question-and-answer corpus covers many domains and aspects and largely meets users' conversational needs.

On this basis, the architecture of the question answering system and the portability of its runtime platform were improved. The Bit Robot question answering system can now run on three platforms: WeChat, QQ, and the web. This multi-platform mode of operation has attracted a large number of users to the system.

Once the questions needing improvement have been identified, correct answers are obtained through manual review. The improved questions and answers are then written back to the system knowledge base, which completes the knowledge base update.

The experimental data are real question-and-answer logs collected from the system's WeChat platform, 3119 question-answer pairs in total. The identification method was determined by annotating and analyzing these real conversation logs. The final identification accuracy for questions needing improvement is 76.77%. The experimental results and the behavior of the running system demonstrate the feasibility of the proposed automatic identification method.
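The abstract names the feature families (user emotion, user intent, and a mixture of both) but does not give the features or the classifier. The following is therefore only a minimal sketch of the general idea, not the thesis's method. It assumes that the user's next message acts as implicit feedback on the previous answer. The `Turn` structure, both cue lists, and the simple "either signal fires" mixing rule are hypothetical choices made for illustration, and English text is used only for readability.

```python
from dataclasses import dataclass

# Hypothetical cue lists (assumption): the abstract does not publish its features.
NEGATIVE_EMOTION_CUES = ("wrong", "useless", "nonsense")
REPHRASE_INTENT_CUES = ("i meant", "that is not what i asked", "let me rephrase")


@dataclass
class Turn:
    question: str   # user question sent to the system
    answer: str     # system reply
    follow_up: str  # user's next message, treated as implicit feedback


def emotion_feature(turn: Turn) -> bool:
    """True if the follow-up contains a negative-emotion cue."""
    text = turn.follow_up.lower()
    return any(cue in text for cue in NEGATIVE_EMOTION_CUES)


def intent_feature(turn: Turn) -> bool:
    """True if the follow-up signals that the user is re-asking or correcting."""
    text = turn.follow_up.lower()
    return any(cue in text for cue in REPHRASE_INTENT_CUES)


def needs_improvement(turn: Turn) -> bool:
    """Mixed rule (assumption): flag the pair when either signal fires."""
    return emotion_feature(turn) or intent_feature(turn)


def accuracy(turns: list[Turn], labels: list[bool]) -> float:
    """Fraction of turns whose predicted flag matches the human label."""
    predictions = [needs_improvement(t) for t in turns]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

In a pipeline like the one the abstract describes, the flagged pairs would go to a human reviewer, and `accuracy` would be computed against manually annotated labels such as those of the 3119 WeChat question-answer pairs. A trained classifier over richer features would likely replace the keyword rule in practice.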
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP393.09
Article ID: 1725064
Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/1725064.html