Research on Spoken Dialogue Systems and Language Model Techniques for Service Robots

Posted: 2018-07-23 13:10

[Abstract]: As speech recognition technology has matured, it has found applications in more and more fields. In service robotics, speech technology is used mainly in the robot's spoken dialogue system. Focusing on the concrete application scenarios of the KeJia robot, this thesis examines the design and implementation of a spoken dialogue system for service robots. It also studies a language-modeling technique for speech recognition: a recurrent neural network language model (RNNLM) combined with unsupervised word clustering. The work on the spoken dialogue system covers two parts, speech recognition and dialogue management. For speech recognition, the thesis first explains the underlying principles in some detail, then describes how a corpus was collected for the KeJia robot's applications, and then walks through the complete procedure for training the acoustic models the module needs. Several acoustic models are evaluated and compared on the training and test sets built for this thesis. The experiments show that context-dependent triphone models recognize best, reaching a best word recognition rate of 98.39% with a corresponding sentence recognition rate of 90.83%. Because the robot's onboard computer has limited computing power, and because the robot can report its own state while it runs, the thesis designs a mechanism that changes the language model dynamically to shrink the search space during decoding, and evaluates the completed speech recognition module. With this dynamic language model mechanism, the module achieves a best sentence recognition rate of 87.95%, 12.05% higher than the module without it.
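The abstract reports that context-dependent triphone models recognized best. As a minimal illustration of what "context-dependent" means, the sketch below expands a monophone sequence into triphones using the HTK-style `left-phone+right` naming convention. It is not the thesis's training procedure: the function name `to_triphones` and the example phone units are hypothetical, and silence models and word boundaries are ignored.

```python
def to_triphones(phones):
    """Expand monophones into HTK-style triphones named left-phone+right.

    Edge phones keep only the context that exists; silence and word
    boundaries are ignored in this simplified sketch.
    """
    triphones = []
    for i, phone in enumerate(phones):
        name = phone
        if i > 0:
            name = f"{phones[i - 1]}-{name}"  # attach left context
        if i < len(phones) - 1:
            name = f"{name}+{phones[i + 1]}"  # attach right context
        triphones.append(name)
    return triphones


# Hypothetical initial/final units for Mandarin "ni hao".
print(to_triphones(["n", "i", "h", "ao"]))  # ['n+i', 'n-i+h', 'i-h+ao', 'h-ao']
```

Because each phone's model now depends on its neighbors, the number of distinct models grows sharply, which is why triphone systems usually tie states across similar contexts; the abstract does not say which tying scheme the thesis used.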
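The word and sentence recognition rates quoted above can be computed from aligned reference and hypothesis transcripts. This sketch assumes "word recognition rate" means word accuracy, 1 - (S + D + I) / N, with substitutions S, deletions D and insertions I counted by a Levenshtein alignment over N reference words, and that "sentence recognition rate" is the share of utterances recognized with no error at all. The thesis may define them differently (HTK's HResults, for instance, also reports a %Corr figure that ignores insertions); the function names and sample sentences are illustrative.

```python
def edit_distance(ref, hyp):
    """Minimum substitutions + deletions + insertions turning word list ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # match (cost 0) or substitute (cost 1)
            )
        prev = cur
    return prev[-1]


def recognition_rates(pairs):
    """(word accuracy, sentence accuracy) over (reference, hypothesis) word lists."""
    n_words = sum(len(ref) for ref, _ in pairs)
    n_errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    n_exact = sum(ref == hyp for ref, hyp in pairs)
    return 1 - n_errors / n_words, n_exact / len(pairs)


pairs = [
    ("bring me a cup".split(), "bring me a cup".split()),
    ("go to the kitchen".split(), "go to kitchen".split()),  # one deletion
]
print(recognition_rates(pairs))  # (0.875, 0.5)
```

The example shows why sentence rates trail word rates: a single dropped word costs only 1/8 of word accuracy but a whole utterance in sentence accuracy.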
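The dynamic language model mechanism restricts what the decoder may hypothesize according to the state the robot reports. The following is a toy sketch under stated assumptions: each robot state maps to a small hypothetical sentence grammar, and a stand-in scoring function replaces a real acoustic decoder. The state names, sentences, `DynamicLanguageModel` class and `toy_score` function are all invented for illustration.

```python
# Hypothetical per-state grammars: each robot state enables only the sentences
# that make sense in that state.
GRAMMARS = {
    "idle": ["start cocktail party", "follow me", "stop"],
    "taking_order": ["i want a coke", "i want a beer", "i want some water", "stop"],
    "confirming": ["yes", "no"],
}


class DynamicLanguageModel:
    """Swaps the active grammar whenever the robot reports a new state."""

    def __init__(self, grammars, initial_state="idle"):
        self.grammars = grammars
        self.state = initial_state

    def on_state_change(self, state):
        if state not in self.grammars:
            raise KeyError(f"no grammar for state {state!r}")
        self.state = state

    def search_space(self):
        return self.grammars[self.state]

    def decode(self, score):
        # Stand-in for a real decoder: return the best-scoring sentence among
        # those allowed by the active grammar only.
        return max(self.search_space(), key=score)


def toy_score(heard):
    """Hypothetical 'acoustic' score: word overlap with a noisy transcript."""
    heard_words = set(heard.split())
    return lambda sentence: len(set(sentence.split()) & heard_words)


dlm = DynamicLanguageModel(GRAMMARS)
dlm.on_state_change("confirming")
print(dlm.decode(toy_score("yeah yes")))  # yes
```

In the confirming state the decoder chooses between two sentences instead of the eight distinct sentences across all grammars, which is the kind of search-space reduction the abstract attributes to the mechanism.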
For dialogue management, to suit the characteristics of service robots, the thesis adopts a layered (cascaded) state machine design and implements the dialogue management framework in Python. It then describes how multi-modal information is incorporated into the framework and the framework's verification and confirmation mechanism, and finally presents the application of this dialogue manager to a specific KeJia task, Cocktail Party. In addition, the thesis studies in depth the use of unsupervised word clustering in recurrent neural network language models. RNN-based language models have been shown to achieve leading results, and prior research shows that adding part-of-speech (POS) tags to the input layer of an RNNLM can improve it significantly. However, POS tagging requires training on manually annotated data, which is costly in labor and resources, and the extra tagger adds complexity to the model. To address these problems, the thesis feeds the output of Brown word clustering into the RNNLM input layer in place of POS information. On the Penn Treebank corpus, the RNNLM with Brown word-class information lowers perplexity by 8-9% relative to the original RNNLM.
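The layered state machine design can be sketched as a machine whose states may each own a nested sub-dialogue machine. Events, which in a multi-modal setting could come from speech recognition or from vision (such as a detected wave), are passed to the active sub-machine first, and its completion is reported upward as a `child_done` event. This is a minimal sketch assuming that particular event-forwarding policy; the class, events and state names are hypothetical, loosely modeled on a cocktail-party ordering task.

```python
class StateMachine:
    """A finite-state dialogue controller whose states may own nested machines."""

    def __init__(self, transitions, start, final):
        self.transitions = transitions  # {state: {event: next_state}}
        self.state = start
        self.final = set(final)
        self.children = {}  # state -> nested StateMachine run while in that state

    def add_child(self, state, machine):
        self.children[state] = machine

    def done(self):
        return self.state in self.final

    def handle(self, event):
        child = self.children.get(self.state)
        if child is not None and not child.done():
            child.handle(event)  # the active sub-dialogue consumes the event first
            if not child.done():
                return
            event = "child_done"  # report sub-dialogue completion to this layer
        # Unknown events leave the state unchanged.
        self.state = self.transitions.get(self.state, {}).get(event, self.state)

    def active_path(self):
        """Names of the active states from this layer down to the innermost one."""
        child = self.children.get(self.state)
        if child is not None and not child.done():
            return [self.state] + child.active_path()
        return [self.state]


# Hypothetical ordering sub-dialogue nested inside a cocktail-party task.
order = StateMachine(
    {"ask_drink": {"heard_drink": "confirm"}, "confirm": {"yes": "done", "no": "ask_drink"}},
    start="ask_drink", final={"done"},
)
party = StateMachine(
    {"wait": {"person_waving": "take_order"}, "take_order": {"child_done": "deliver"}},
    start="wait", final={"deliver"},
)
party.add_child("take_order", order)

for event in ["person_waving", "heard_drink", "no", "heard_drink", "yes"]:
    party.handle(event)
    print(event, "->", party.active_path())
```

Keeping each sub-dialogue in its own machine lets a task like ordering be reused or retried (the "no" event loops back to asking) without touching the top-level task logic.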
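One common way to realize verification and confirmation is to check that a recognition result fits what the current dialogue state expects, then act on the recognizer's confidence: accept it, ask an explicit yes/no confirmation, or re-prompt. The thresholds, the `Hypothesis` type, the `verify` function and the substring slot check below are assumptions for illustration, not the thesis's parameters.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    confidence: float  # recognizer confidence, assumed to lie in [0, 1]


# Hypothetical thresholds; a real system would tune them on held-out data.
ACCEPT_THRESHOLD = 0.8
CONFIRM_THRESHOLD = 0.4


def verify(hyp, expected_slots):
    """Return (action, text) telling the dialogue manager how to treat hyp."""
    # Verification: the result must mention something this state expects.
    if not any(slot in hyp.text for slot in expected_slots):
        return ("reprompt", "Sorry, I did not understand. Could you repeat that?")
    # Confirmation: decide by confidence how much to trust the result.
    if hyp.confidence >= ACCEPT_THRESHOLD:
        return ("accept", hyp.text)
    if hyp.confidence >= CONFIRM_THRESHOLD:
        return ("confirm", f"Did you say '{hyp.text}'?")
    return ("reprompt", "Sorry, I did not catch that. Could you repeat it?")


print(verify(Hypothesis("i want a beer", 0.55), ["coke", "beer"]))
```

An explicit confirmation costs one extra turn, so it is reserved for the middle confidence band where a wrong acceptance would be most likely.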
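Feeding Brown word-class information into the RNNLM input layer can be pictured as concatenating a one-hot class vector to the usual one-hot word vector. The sketch shows that input construction inside an untrained Elman-style RNNLM with s(t) = sigmoid(U x(t) + W s(t-1)) and y(t) = softmax(V s(t)). It assumes Brown clusters are given as bit-string paths in the merge tree and that classes are fixed-length path prefixes; the vocabulary, paths, prefix length, hidden size and all names are hypothetical, and the thesis may wire the class features differently.

```python
import math
import random

# Hypothetical Brown-clustering result: word -> bit-string path in the merge tree.
BROWN_PATHS = {"the": "00", "a": "01", "cat": "100", "dog": "101", "runs": "110", "sleeps": "111"}
PREFIX_LEN = 2  # classes = path prefixes of this length (coarser than full paths)
VOCAB = sorted(BROWN_PATHS)
CLASSES = sorted({path[:PREFIX_LEN] for path in BROWN_PATHS.values()})


def input_vector(word):
    """One-hot word vector followed by a one-hot Brown-class vector."""
    x = [0.0] * (len(VOCAB) + len(CLASSES))
    x[VOCAB.index(word)] = 1.0
    x[len(VOCAB) + CLASSES.index(BROWN_PATHS[word][:PREFIX_LEN])] = 1.0
    return x


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z):
    top = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class ElmanRNNLM:
    """Untrained Elman RNNLM: s(t) = sigmoid(U x(t) + W s(t-1)), y(t) = softmax(V s(t))."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.U = rand(hidden, len(VOCAB) + len(CLASSES))  # input (word + class) -> hidden
        self.W = rand(hidden, hidden)                      # recurrent hidden -> hidden
        self.V = rand(len(VOCAB), hidden)                  # hidden -> next-word scores
        self.s = [0.0] * hidden

    def step(self, word):
        """Read one word and return P(next word | history) as a dict."""
        pre = [a + b for a, b in zip(matvec(self.U, input_vector(word)), matvec(self.W, self.s))]
        self.s = [1.0 / (1.0 + math.exp(-v)) for v in pre]
        return dict(zip(VOCAB, softmax(matvec(self.V, self.s))))


lm = ElmanRNNLM()
for w in ["the", "cat", "runs"]:
    probs = lm.step(w)
print(sorted(probs, key=probs.get, reverse=True)[:3])
```

Words in the same class ("cat" and "dog" here) share the class half of their input vectors, so the network receives a common signal for them even when one is rare, which is the role POS features play; training (backpropagation through time) is omitted.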
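Perplexity, the metric behind the 8-9% figure, is the exponential of the average negative log-probability a model assigns to each test token, PPL = exp(-(1/N) sum_i ln p(w_i | w_1 ... w_{i-1})); a relative reduction compares two perplexities on the same test set. The function names are illustrative, and the numbers in the usage lines are made up, not results from the thesis.

```python
import math


def perplexity(log_probs):
    """exp of the mean negative natural-log probability over all test tokens."""
    return math.exp(-sum(log_probs) / len(log_probs))


def relative_reduction(baseline_ppl, new_ppl):
    """Fractional drop in perplexity relative to the baseline model."""
    return (baseline_ppl - new_ppl) / baseline_ppl


# A model that gives every one of 4 test tokens probability 0.25 has perplexity 4.
print(round(perplexity([math.log(0.25)] * 4), 6))  # 4.0
# Illustrative values only: a drop from 100.0 to 91.5 is an 8.5% relative reduction.
print(round(relative_reduction(100.0, 91.5), 4))  # 0.085
```

Perplexity can be read as the effective number of equally likely choices the model faces per word, so an 8-9% relative reduction means the model is correspondingly less uncertain on the same text.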
[Degree-granting institution]: University of Science and Technology of China
[Degree]: Master's
[Year awarded]: 2014
[Classification number]: TP242; TN912.34

