Research on Transfer Learning Methods for Answer Ranking in Community Question Answering Systems

Published: 2018-02-28 05:22

Keywords: community question answering system; user features; learning to rank; transfer learning; ranking model. Source: Kunming University of Science and Technology, master's thesis, 2017. Document type: degree thesis

[Abstract]: As Internet technology develops, acquiring knowledge and solving problems has become increasingly convenient. Traditional search engine companies such as Yahoo and Google give a growing number of Internet users an easier way to obtain information: a user enters relevant keywords in the search box and quickly gets the information they want. However, as the Internet has spread and its content has grown richer, people also expect the best answer to be easier to reach. Personalized services based on community question answering effectively make up for the shortcomings of traditional search engine technology and are therefore receiving increasing attention from Internet companies. A community question answering system is an emerging knowledge-sharing model in which users submit questions and answers, so that the community accumulates a large number of question-answer pairs. When a user submits a new question, ranking the candidate answers so that the user receives an accurate answer list is a key step in such a system.

Traditional ranking algorithms mainly build the ranking model with supervised learning, which requires a large amount of manually labeled data for training. Researchers in China and abroad have proposed many supervised learning-to-rank methods that have been applied well in practice. Ranking SVM is a typical representative: a large amount of labeled data is fed into the chosen learner, which automatically trains a ranking model. Supervised learning-to-rank methods usually need labeled data of considerable scale to make the trained model reliable, but in practice labeled data is often scarce, and the reliability of a supervised ranking algorithm drops accordingly. A ranking model trained in one domain often performs poorly in a new domain. In addition, data on the Internet changes quickly, so previously labeled data gradually becomes unsuitable for training the current model.

To address the shortage of labeled data in practical applications, this thesis improves traditional learning-to-rank methods with the idea of transfer learning. It uses a transfer learning ranking algorithm based on feature selection, which assumes that the source and target domains share a low-dimensional feature representation, and it takes users' multiple interests as the features shared by the source and target domains, so that knowledge is transferred to the target domain. An analysis of community question answering systems shows that they contain many labels based on user behavior. Combined with feature-based transfer learning, these user features are incorporated into the feature space, and the feature-based transfer learning ranking algorithm is optimized by selecting valuable user tags and user behavior tags from the community. Take the answerer's areas of expertise as an example: an answerer may be good at several areas (such as tennis and badminton). In the feature vector this feature is represented mainly as a Boolean, 1 for an area of expertise and 0 otherwise. The feature then has the Boolean value 1 in both the badminton and the tennis category, so it can serve as a common feature of these two different categories, which improves the learning-to-rank method. Experiments confirm that the transfer learning answer ranking algorithm with user features incorporated effectively improves answer ranking.
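The Boolean expertise feature described in the abstract can be illustrated with a minimal Python sketch. The user names, the tag data, and the function names `expertise_vector` and `shared_across` are hypothetical and are not taken from the thesis; the sketch only shows how one answerer's expertise feature can be 1 in two categories at once, which is what lets it act as a feature common to both.

```python
from typing import Dict, List, Set

# Hypothetical expertise tags per answerer (illustrative data, not from the thesis).
EXPERTISE: Dict[str, Set[str]] = {
    "user_a": {"tennis", "badminton"},
    "user_b": {"tennis"},
    "user_c": {"cooking"},
}


def expertise_vector(user: str, categories: List[str]) -> List[int]:
    """Return one Boolean feature per category: 1 if the answerer is good at it, else 0."""
    tags = EXPERTISE.get(user, set())
    return [1 if category in tags else 0 for category in categories]


def shared_across(user: str, source: str, target: str) -> bool:
    """True when the expertise feature is 1 in both the source and the target category."""
    tags = EXPERTISE.get(user, set())
    return source in tags and target in tags


categories = ["tennis", "badminton", "cooking"]
print(expertise_vector("user_a", categories))
print(shared_across("user_a", "tennis", "badminton"))
print(shared_across("user_b", "tennis", "badminton"))
```

Here `user_a` carries a 1 for both tennis and badminton, so the feature is available in either category, while `user_b` contributes nothing to a badminton model. How the thesis actually selects and weights such shared features is not stated in the abstract.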
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3
