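Methods and Implementation of Personalized Information Retrieval Based on Language Models
Published: 2018-03-02 01:32
Keywords: information retrieval; language model; query expansion; user model. Source: Inner Mongolia University, master's thesis, 2013. Document type: degree thesis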
[Abstract]: With the rapid development of the Internet, discerning a user's real intent amid a mass of heterogeneous information, and accurately locating the needed information within vast information resources, has become a prominent problem in information retrieval. On today's technically mature search engine sites, recall and response time are already good, but precision still rarely satisfies users.

The main goal of information retrieval is to find, among many documents, those that meet the user's query need. Traditional query expansion concentrates on expanding the original query but overlooks the many unnecessary terms that the expanded query contains. These terms in turn reduce the accuracy of the expanded query, so it fails to express the user's query intent. This thesis studies query expansion from the perspective of user personalization.

The thesis proposes two retrieval methods for personalization: a user query expansion model, and a stop-word list method that removes expansion terms. Both are rooted in query optimization: they either expand the user's query or prune terms from it. The user model expands a query using the topic domains associated with the individual user, and the expanded query can improve precision and recall. The stop-word method studies the new queries obtained by pseudo-relevance expansion of the original queries and derives a query stop-word list for each domain. The list is used to remove unnecessary terms from the new query and to reallocate the probability assigned to them, which raises the probability of the original query terms.

Building on the language model and on existing mature techniques, the thesis studies query expansion from a new angle, further improves the query method through experiments, and uses a user interest model to improve users' retrieval results. Query expansion methods in the various retrieval models are discussed in detail. Experiments validate the proposed user query expansion and the proposed domain-specific stop-word lists.
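The stop-word step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function name `prune_expanded_query`, the dictionary representation of the query model, the example terms and probabilities, and the choice to hand the freed probability mass to the original query terms in proportion to their current weights are all assumptions made for illustration.

```python
def prune_expanded_query(expanded, original_terms, stop_terms):
    """Drop stop-listed expansion terms and give their mass to the original terms.

    expanded       -- dict mapping term -> probability in the expanded query model
    original_terms -- terms the user actually typed (never removed)
    stop_terms     -- domain-specific stop-word list for expanded queries
    All three names are hypothetical; the abstract does not specify an interface.
    """
    original = set(original_terms)
    # Only expansion terms are candidates for removal.
    removed = {t for t in expanded if t in stop_terms and t not in original}
    freed = sum(expanded[t] for t in removed)
    kept = {t: p for t, p in expanded.items() if t not in removed}
    if not kept:
        return {}
    orig_mass = sum(p for t, p in kept.items() if t in original)
    if freed == 0 or orig_mass == 0:
        # Nothing to reallocate to original terms: fall back to renormalizing.
        total = sum(kept.values())
        return {t: p / total for t, p in kept.items()}
    # Assumed policy: split the freed mass among original terms proportionally.
    return {
        t: p + freed * p / orig_mass if t in original else p
        for t, p in kept.items()
    }


# Illustrative data only, not taken from the thesis.
expanded = {"jaguar": 0.4, "speed": 0.2, "car": 0.2, "page": 0.1, "click": 0.1}
pruned = prune_expanded_query(expanded, ["jaguar", "speed"], {"page", "click"})
```

In this sketch the remaining expansion terms keep their weights and only the original query terms gain probability, which matches the abstract's stated effect; other reallocation policies would be equally consistent with the text.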
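[Degree-granting institution]: Inner Mongolia University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3
[Citing literature]
Related master's theses (first 2)
1 Li Yunfei; Research on Dynamic Query Expansion Based on Query Logs [D]; Inner Mongolia University; 2016
2 Ding Kaichao; Research and Implementation of Virtual-Domain Re-ranking in Information Retrieval [D]; Inner Mongolia University; 2014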
Article ID: 1554484
Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/1554484.html