Research on Query Reformulation Based on User Intent Recognition

Published: 2018-07-23 09:05

[Abstract]: Search engines help users find the information they need on the Web and greatly ease information anxiety. However, user queries are short and often ambiguous, and search engines based on keyword matching cannot handle polysemous query terms. Query reformulation is one way to identify the real user intent behind a query. Existing query reformulation techniques have two weaknesses: session segmentation methods are flawed, and candidate queries generated from session co-occurrence information tend to drift away from the intent of the original query. As a result, the user intents identified through query reformulation overlap with one another.

After an in-depth study of the theory and techniques of query reformulation, this thesis builds a query reformulation model for user intent recognition on the AOL query log. The model addresses three questions: how to segment the log by user intent, how to identify the reformulations that express the intent of the original query, and how to cluster the identified intents. Because existing algorithms fall short on each of these, the thesis concentrates on improving the three parts of the model: session segmentation of the query log, computation of reformulations for the original query, and clustering of the reformulations. In the first part, click similarity between queries is added as a feature to compensate for the limitations of lexical similarity. In the second part, to make the reformulations express the original intent more accurately, the time distance between queries and the click similarity between queries are also taken into account when computing the relation between the original query and each candidate query. In the third part, a reformulation clustering method is proposed to resolve the overlap among the identified intents.

Clustering raises two further problems: sparse reformulation vectors and inaccurate transition probabilities. To address sparsity, a Query-URL graph is built from the reformulated queries and clicked URLs in each session, and a Markov random walk with absorbing states is used to model the graph. To address the inaccurate transition probabilities, a CF-IQF model, defined by analogy with TF-IDF, computes the edge weights in the graph from three factors: the URL, the rank number, and the sequence number. The absorbing-state distribution is then computed to build a vector for each reformulation, and the reformulations are clustered with the complete-link algorithm using the cosine similarity of their vectors. Comparative experiments on the algorithms in each part of the model indicate that the proposed approach has certain advantages.
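The graph-based step in the abstract can be illustrated with a small Python sketch. It is not the thesis's implementation: the abstract does not give the CF-IQF formula, so the weight below (click frequency times a smoothed log inverse query frequency) is an assumption modeled on TF-IDF, and the rank number and sequence number factors are left out. The walk treats clicked URLs as absorbing states and queries as transient states. The function names, the `eps` click probability, and the example queries and URLs are all hypothetical.

```python
import math
from collections import defaultdict


def cf_iqf_weights(clicks):
    """Assumed CF-IQF weight: click frequency * log(1 + |Q| / qf(url)).

    clicks maps (query, url) -> click count. The exact formula used in the
    thesis is not given in the abstract; this is a TF-IDF-style stand-in.
    """
    queries = {q for q, _ in clicks}
    queries_per_url = defaultdict(set)
    for q, u in clicks:
        queries_per_url[u].add(q)
    weights = defaultdict(dict)
    for (q, u), cf in clicks.items():
        iqf = math.log(1 + len(queries) / len(queries_per_url[u]))
        weights[q][u] = cf * iqf
    return weights


def absorption_distribution(start, click_w, reform, eps=0.5,
                            max_steps=100, tol=1e-9):
    """Distribution over URLs in which a walk started at `start` is absorbed.

    click_w: query -> {url: weight}; reform: query -> {next_query: count}.
    From a query the walk clicks a URL with probability eps (and is absorbed)
    or moves to a reformulated query with probability 1 - eps.
    """
    mass = {start: 1.0}              # probability mass on transient queries
    absorbed = defaultdict(float)    # probability mass absorbed per URL
    for _ in range(max_steps):
        nxt = defaultdict(float)
        for q, m in mass.items():
            urls = click_w.get(q, {})
            succ = reform.get(q, {})
            if not urls and not succ:
                continue             # dead end: this mass is dropped
            # If only one kind of edge exists, all mass follows it.
            p_click = 1.0 if not succ else (0.0 if not urls else eps)
            total_u = sum(urls.values())
            for u, w in urls.items():
                absorbed[u] += m * p_click * w / total_u
            total_s = sum(succ.values())
            for q2, c in succ.items():
                nxt[q2] += m * (1 - p_click) * c / total_s
        mass = nxt
        if sum(mass.values()) < tol:
            break
    total = sum(absorbed.values())
    return {u: p / total for u, p in absorbed.items()} if total else {}


# Hypothetical toy log: click counts and within-session reformulation counts.
clicks = {
    ("jaguar", "example.com/cars"): 3,
    ("jaguar car", "example.com/cars"): 5,
    ("jaguar animal", "example.org/animals"): 4,
}
reform = {"jaguar": {"jaguar car": 2, "jaguar animal": 1}}

weights = cf_iqf_weights(clicks)
dist = absorption_distribution("jaguar", weights, reform)
print(dist)
```

The returned dictionary is a dense-ish vector over URLs for the starting query, which is the kind of representation the abstract uses to counter sparsity: a query with few direct clicks still receives mass on URLs reached through its reformulations.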
[Degree-granting institution]: Harbin Engineering University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.3
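The final clustering step named in the abstract, cosine similarity combined with the complete-link algorithm, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's code: the stopping threshold, the function names, and the example vectors are hypothetical, and the vectors stand in for the absorbing-state distributions the abstract describes.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def complete_link(vectors, threshold=0.5):
    """Agglomerative clustering with complete linkage.

    The similarity of two clusters is the minimum pairwise cosine similarity
    between their members. The closest pair is merged repeatedly until no
    pair reaches `threshold` (an assumed stopping rule).
    """
    clusters = [[name] for name in vectors]

    def link(c1, c2):
        return min(cosine(vectors[x], vectors[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        # i < j always holds, so deleting index j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical reformulation vectors over URLs.
example_vectors = {
    "jaguar car": {"example.com/cars": 1.0},
    "jaguar price": {"example.com/cars": 0.9, "example.com/dealers": 0.1},
    "jaguar animal": {"example.org/animals": 1.0},
    "jaguar habitat": {"example.org/animals": 0.8, "example.org/zoo": 0.2},
}

result = complete_link(example_vectors, threshold=0.5)
print(result)
```

Complete linkage merges two clusters only when every cross-cluster pair is similar enough, which tends to produce tight groups and fits the abstract's goal of separating overlapping intents.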


Article No.: 2138904


Article link: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/2138904.html
