Research on Open Information Extraction Technology for Entity Queries

Published: 2018-07-09 16:01

Topics: Wikipedia + entity extraction; Source: North China University of Technology, 2012 master's thesis

[Abstract]: Query recommendation is an important technique in search engine systems: it improves the user's search experience by recommending more suitable queries. Existing methods can find similar queries that are directly linked through some attribute, but they overlook semantically related queries that are only indirectly associated. To address this problem, the thesis uses the open knowledge base Wikipedia and proposes a new query expansion system on top of it. The method extracts part of Wikipedia's structured information together with its natural-language text to form a hierarchical corpus in which entities are the skeleton and entity features and entity relations form the network. A user query recommendation system is built on this corpus, and an auxiliary query system is designed to improve recommendation quality when a user query is not covered by Wikipedia. The main contributions are as follows. First, the thesis proposes RWM, a query intent recognition algorithm based on a random walk model. The method mitigates some data sparsity problems: the random walk expands to concepts that are not directly linked, which makes query intent recognition effective. Second, it proposes WWRQ, a rare query classification algorithm that jointly uses Wikipedia's structured knowledge and web knowledge. The method obtains retrieval results from a search engine and lets the feature information extracted from Wikipedia vote on them to determine the query's class. Experimental results show that, compared with traditional query recommendation systems, the random-walk-based query intent recognition algorithm balances precision and recall and significantly improves query precision. The rare query algorithm based on Wikipedia and web knowledge effectively addresses the problem that short queries cannot be located accurately.
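The abstract does not give the details of RWM, so the following is only a minimal sketch of the general idea it describes: a random walk over an entity graph that reaches concepts not directly linked to the query. It assumes a random walk with restart as the walk model, and the graph `TOY_GRAPH`, the function names, and the `restart` and `iterations` parameters are all hypothetical illustrations rather than anything taken from the thesis.

```python
from collections import defaultdict

# Hypothetical toy entity graph (adjacency lists); in the thesis setting the
# links would come from Wikipedia, but these entries are made up for illustration.
TOY_GRAPH = {
    "jaguar": ["Jaguar Cars", "Jaguar (animal)"],
    "Jaguar Cars": ["jaguar", "Land Rover"],
    "Jaguar (animal)": ["jaguar", "Panthera"],
    "Land Rover": ["Jaguar Cars"],
    "Panthera": ["Jaguar (animal)"],
}


def random_walk_with_restart(graph, seed, restart=0.15, iterations=50):
    """Approximate visit probabilities of a walk that restarts at `seed`."""
    nodes = set(graph)
    for neighbours in graph.values():
        nodes.update(neighbours)
    scores = {node: 0.0 for node in nodes}
    scores[seed] = 1.0  # all probability mass starts on the query entity

    for _ in range(iterations):
        nxt = defaultdict(float)
        for node, mass in scores.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                # Dangling node: return its walking mass to the seed.
                nxt[seed] += (1 - restart) * mass
                continue
            share = (1 - restart) * mass / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        # Total mass is 1, so the restart step contributes exactly `restart`.
        nxt[seed] += restart
        scores = {node: nxt[node] for node in nodes}
    return scores


def expand_query(graph, seed, top_k=3):
    """Rank the concepts other than `seed` by their walk score."""
    scores = random_walk_with_restart(graph, seed)
    ranked = sorted((n for n in scores if n != seed), key=scores.get, reverse=True)
    return ranked[:top_k]
```

In this toy graph, "Land Rover" has no direct link to "jaguar" but still receives a non-zero score through "Jaguar Cars", which is the kind of indirect association the abstract says direct-attribute methods miss.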
[Degree-granting institution]: North China University of Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.3
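The voting step that the abstract attributes to WWRQ can be sketched roughly as below. This is an illustration under stated assumptions, not the thesis's algorithm: the snippets stand in for search engine results, the per-category term sets stand in for features extracted from Wikipedia, and the overlap-count vote, the function name `classify_by_voting`, and all example data are hypothetical.

```python
from collections import Counter


def classify_by_voting(snippets, category_features):
    """Let each result snippet vote for the category it overlaps with most.

    snippets: list of text snippets (assumed to come from a search engine).
    category_features: dict mapping a category name to a set of lowercase
    feature terms (assumed to be extracted from Wikipedia).
    Returns (winning category or None, Counter of votes).
    """
    votes = Counter()
    if not category_features:
        return None, votes

    for snippet in snippets:
        tokens = set(snippet.lower().split())
        overlaps = {
            category: len(tokens & features)
            for category, features in category_features.items()
        }
        # On a tie, max() keeps the first category in dict order.
        best_category, best_overlap = max(overlaps.items(), key=lambda kv: kv[1])
        if best_overlap > 0:  # snippets with no matching feature do not vote
            votes[best_category] += 1

    if not votes:
        return None, votes
    return votes.most_common(1)[0][0], votes


# Made-up example data for a short, ambiguous query.
EXAMPLE_FEATURES = {
    "Automobiles": {"car", "engine", "sedan", "dealer"},
    "Animals": {"cat", "species", "habitat", "predator"},
}
EXAMPLE_SNIPPETS = [
    "new sedan with a v8 engine",
    "find a car dealer near you",
    "the big cat is an apex predator",
]
```

Calling `classify_by_voting(EXAMPLE_SNIPPETS, EXAMPLE_FEATURES)` gives two votes to "Automobiles" and one to "Animals", so the query is assigned to "Automobiles". A real system would need proper tokenization and weighting, which this sketch leaves out.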


Article ID: 2109888

Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/2109888.html
