Application of an Improved WAF Algorithm to Query Expansion Based on Semantic Analysis

Published: 2018-04-27 00:14

  Topics: query expansion + word activation force; Source: Beijing University of Posts and Telecommunications, master's thesis, 2012


[Abstract]: Query expansion is an important technique in information retrieval and an effective way to help users get better results from search engines. However, as information on the Internet grows more complex and diverse, and especially with the rapid growth of social platforms such as Weibo and WeChat, traditional query expansion algorithms, which ignore the semantic relations between words in a document, can no longer recommend useful keywords for irregular short texts. The term-independence assumption of traditional retrieval models, together with the sparse information in short texts, leaves existing query expansion algorithms without enough semantic information, so they cannot handle the synonymy and polysemy that are common in user queries.

To address these problems, this thesis surveys classical information retrieval models and query expansion methods and concludes that the root cause of the query expansion problem is the lack of effective semantic analysis. It proposes applying the word activation force (WAF) algorithm to topic-based query expansion, with the aim of improving query expansion through more precise semantic analysis.

Building on a study of WAF theory, the thesis proposes a new WAF-based query expansion algorithm. The main work is as follows:

First, extensive comparative experiments between WAF and traditional word association algorithms on a Weibo corpus show that WAF has clear advantages in semantic analysis and word network modeling, especially in expanding topic core words and mining high-value words.

Second, to cope with the irregularity and sparse information of short texts, the thesis adjusts how word activation force is calculated in WAF so that it makes full use of the characteristics of short texts and weakens the influence of noisy features on the analysis of core semantics. To improve the quality of WAF word expansion, it further proposes adjusting the ranking of the associated word list according to the overall distribution of word affinity in the word network model.

Third, the thesis combines WAF semantic analysis with topic clustering to design a fairly complete query expansion algorithm, embeds it in the overall framework of a Weibo monitoring project, and applies it to retrieval over a Weibo corpus. Comparative experiments against query expansion based on the BM25 weighting scheme show that the word network model generated by WAF has considerable potential for query expansion.
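The abstract does not give the formula the thesis uses, so the following is only a minimal sketch of the baseline statistic as it is usually stated in the WAF literature: waf_ij = (f_ij / f_i) * (f_ij / f_j) / d_ij^2, where f_i and f_j are word frequencies, f_ij counts how often word i precedes word j within a window, and d_ij is their mean distance. It is not the adjusted calculation proposed in the thesis. The function names, the window size, and the toy corpus are assumptions made for illustration.

```python
from collections import defaultdict


def word_activation_forces(docs, window=5):
    """Return {(i, j): waf} for ordered word pairs where i precedes j."""
    freq = defaultdict(int)      # f_i: occurrences of each word
    co_freq = defaultdict(int)   # f_ij: times i precedes j within the window
    dist_sum = defaultdict(int)  # summed forward distances for each (i, j)

    for tokens in docs:
        for pos, word in enumerate(tokens):
            freq[word] += 1
            for offset in range(1, window + 1):
                if pos + offset >= len(tokens):
                    break
                other = tokens[pos + offset]
                if other == word:
                    continue  # skip self-pairs
                co_freq[(word, other)] += 1
                dist_sum[(word, other)] += offset

    waf = {}
    for (i, j), f_ij in co_freq.items():
        d_ij = dist_sum[(i, j)] / f_ij  # mean forward distance from i to j
        waf[(i, j)] = (f_ij / freq[i]) * (f_ij / freq[j]) / (d_ij ** 2)
    return waf


def expand_query(term, waf, top_k=3):
    """Rank candidate expansion terms by the force `term` exerts on them."""
    scored = [(j, score) for (i, j), score in waf.items() if i == term]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # ties broken by word
    return [j for j, _ in scored[:top_k]]


# Hypothetical, pre-tokenized short texts standing in for microblog posts.
docs = [["a", "b"], ["a", "b"], ["a", "c"]]
waf = word_activation_forces(docs, window=2)
print(expand_query("a", waf))
```

In this toy corpus "a" activates "b" more strongly than "c" because the pair co-occurs more often relative to the frequencies of the two words. A fuller pipeline along the lines the abstract describes would add the affinity-based re-ranking and the topic clustering, neither of which is sketched here.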
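[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: TP391.1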

