Research on Query Expansion Technology Based on Association Rules

Published: 2018-08-30 12:07

[Abstract]: As the volume of information on the web grows rapidly, it remains difficult to find exactly the information users want through a search engine. Low recall and low precision have become problems that search engines urgently need to solve. To address this, and following Van Rijsbergen's view that retrieval performance can be improved by modifying the original query, this thesis studies query expansion techniques based on association rules. The main contents are as follows:

1. The thesis first introduces in detail the foundations of the work: data mining, association rules, and query expansion. It then analyzes existing query expansion techniques based on association rules and points out their strengths and weaknesses. One weakness they share is taken as the focus of this research: existing algorithms pay no attention to the efficiency of the association rule mining algorithm, or to whether the chosen mining algorithm is suitable for the task.

2. To address this problem, the thesis proposes, for the first time, a query expansion algorithm based on maximal frequent itemset mining. The algorithm uses retrieval based on the vector space model and performs word segmentation on the n documents returned by the initial retrieval. The resulting terms are represented in vertical data format, so that the support of an itemset is obtained by set intersection. Using a set enumeration tree together with pruning strategies, the algorithm mines the maximal frequent itemsets to build a set of expansion terms. The expansion terms are then combined with the initial query terms for a second retrieval. Experiments show that efficiency improves compared with earlier algorithms.

3. The query expansion algorithm based on maximal frequent itemset mining assumes that the original query terms and the expansion terms are equally important, so it does not weight them. In addition, mining only maximal frequent itemsets loses the support information of some frequent itemsets. To address these problems, the thesis proposes a query expansion algorithm based on frequent closed itemsets. The algorithm uses the HT-struct linked structure and a depth-first search strategy, combined with pruning techniques, to mine the frequent closed itemsets, derive association rules, and build the set of expansion terms. It also weights each expansion term by the confidence of the corresponding rule. Experiments show that the efficiency of the algorithm improves and that the algorithm is feasible.
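The mining step in item 2 can be illustrated with a small sketch. It is not the thesis's implementation: the function names, the toy documents, and the rule for choosing expansion terms (take the other terms of any maximal itemset that contains a query term) are assumptions made here for illustration. The only pruning shown is the basic one of not extending an infrequent itemset. Terms are stored in vertical format (term to set of document ids), support is the size of an intersection, and candidates are explored depth-first along a set enumeration tree.

```python
from typing import Dict, FrozenSet, List, Set


def to_vertical(docs: List[List[str]]) -> Dict[str, Set[int]]:
    """Vertical format: map each term to the ids of the documents containing it."""
    vertical: Dict[str, Set[int]] = {}
    for doc_id, terms in enumerate(docs):
        for term in set(terms):
            vertical.setdefault(term, set()).add(doc_id)
    return vertical


def maximal_frequent_itemsets(docs: List[List[str]], min_support: int) -> List[FrozenSet[str]]:
    """Depth-first walk of the set enumeration tree; support = size of tidset intersection."""
    vertical = to_vertical(docs)
    # A fixed item order defines the enumeration tree (each set is reached once).
    items = sorted(t for t, tids in vertical.items() if len(tids) >= min_support)
    maximal: List[FrozenSet[str]] = []

    def dfs(prefix: List[str], prefix_tids: Set[int], candidates: List[str]) -> None:
        extended = False
        for i, item in enumerate(candidates):
            tids = prefix_tids & vertical[item]      # support by intersection
            if len(tids) >= min_support:             # prune infrequent extensions
                extended = True
                dfs(prefix + [item], tids, candidates[i + 1:])
        if prefix and not extended:
            candidate = frozenset(prefix)
            # In this visiting order every frequent superset was reached earlier,
            # so checking against the sets already recorded is sufficient.
            if not any(candidate < found for found in maximal):
                maximal.append(candidate)

    dfs([], set(range(len(docs))), items)
    return maximal


def expansion_terms(query: List[str], docs: List[List[str]], min_support: int) -> List[str]:
    """Assumed selection rule: co-members of maximal itemsets that share a query term."""
    q = set(query)
    expanded: Set[str] = set()
    for itemset in maximal_frequent_itemsets(docs, min_support):
        if itemset & q:
            expanded |= itemset - q
    return sorted(expanded)


# Toy stand-in for the segmented terms of the first n retrieved documents.
docs = [
    ["query", "expansion", "retrieval"],
    ["query", "expansion", "mining"],
    ["query", "expansion", "retrieval"],
    ["association", "mining"],
]
print(expansion_terms(["query"], docs, min_support=2))
```

In the second retrieval, the returned terms would be appended to the original query before it is scored against the collection.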
[Degree-granting institution]: PLA Information Engineering University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP311.13
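Item 3 of the abstract weights expansion terms by rule confidence and relies on frequent closed itemsets, which keep the support information that maximal itemsets lose. The sketch below illustrates that idea under stated assumptions: it does not use the HT-struct structure named in the abstract, it finds closed itemsets by a simple filter over all frequent itemsets, and the function names and toy data are hypothetical. The weight of a term t for a query Q is taken as the confidence of the rule Q → t, that is, support(Q ∪ {t}) / support(Q).

```python
from typing import Dict, FrozenSet, List, Set


def closed_frequent_itemsets(docs: List[List[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return {closed frequent itemset: support} using depth-first tidset intersection."""
    vertical: Dict[str, Set[int]] = {}
    for doc_id, terms in enumerate(docs):
        for term in set(terms):
            vertical.setdefault(term, set()).add(doc_id)
    items = sorted(t for t, tids in vertical.items() if len(tids) >= min_support)

    frequent: Dict[FrozenSet[str], int] = {}

    def dfs(prefix: FrozenSet[str], tids: Set[int], start: int) -> None:
        for i in range(start, len(items)):
            new_tids = tids & vertical[items[i]]
            if len(new_tids) >= min_support:
                itemset = prefix | {items[i]}
                frequent[itemset] = len(new_tids)
                dfs(itemset, new_tids, i + 1)

    dfs(frozenset(), set(range(len(docs))), 0)
    # Closed: no proper superset has the same support.
    return {
        s: sup for s, sup in frequent.items()
        if not any(s < t and sup == frequent[t] for t in frequent)
    }


def support(itemset: FrozenSet[str], closed: Dict[FrozenSet[str], int]) -> int:
    """Support of a frequent itemset is the largest support among its closed supersets."""
    return max((sup for c, sup in closed.items() if itemset <= c), default=0)


def expansion_weights(query: List[str], closed: Dict[FrozenSet[str], int]) -> Dict[str, float]:
    """Weight each candidate term t by the confidence of the rule query -> t."""
    q = frozenset(query)
    base = support(q, closed)
    weights: Dict[str, float] = {}
    if base == 0:          # the query itself is not frequent: no rules to derive
        return weights
    for c in closed:
        if q <= c:
            for term in c - q:
                weights[term] = support(q | {term}, closed) / base
    return weights


docs = [
    ["query", "expansion", "retrieval"],
    ["query", "expansion", "mining"],
    ["query", "expansion", "retrieval"],
    ["association", "mining"],
]
closed = closed_frequent_itemsets(docs, min_support=2)
print(expansion_weights(["query"], closed))
```

On this toy data the maximal itemsets alone would report only that the three-term set is frequent, while the closed itemsets also retain the higher support of the two-term set, which is what makes the confidence values computable.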
