Research on Search Result Clustering Algorithms Based on Multi-Core Technology
Topic: search result clustering. Focus: web clustering engines. Source: Guangxi University, master's thesis, 2012. Document type: degree thesis.
[Abstract]: A web clustering engine integrates clustering into a search engine: it clusters the search results and returns them to the user as topic clusters. The user only needs to pick the topics of interest from a small number of clusters and then look more closely at whether the content is valuable. This greatly reduces the user's search effort, and it is a current research focus for search engines. Two factors shape the user experience of a web clustering engine: how the topic clusters are finally displayed, and how efficiently the engine responds to user requests. This thesis studies both issues, specifically:
(1) Web clustering engines mainly present topic clusters as a folder tree or another graphical view. Clusters can be ordered in line with user expectations only if the importance of each cluster is evaluated objectively. Based on the Lingo algorithm, an improved method for computing cluster scores is proposed. It considers not only the cluster label score and the number of documents in the cluster, but also each document's original rank in the search results and its score within the cluster. Experimental results show that the improved cluster score objectively reflects the relevance and authority of clusters.
(2) Clustering is a time-consuming process, so the algorithm must be made more efficient to stay within users' tolerance for the latency of online clustering. Given the rapid development and wide use of multi-core processors, and to address the time efficiency of Lingo, the improved Lingo algorithm is parallelized with multithreading and parallel programming so that it exploits multi-core resources and performs better. Experiments show that the parallel Lingo algorithm designed here performs well.
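The abstract names the inputs to the improved cluster score but does not give the formula, so the following Python sketch is only one plausible reading, not the thesis's method. It assumes that Lingo's baseline score is the label score multiplied by the number of documents, and that the improved variant replaces the plain count with a sum in which each document contributes its in-cluster score discounted by its original search rank. The names `Member`, `Cluster`, `improved_score`, and `rank_clusters`, and the logarithmic rank discount, are all hypothetical. Because each cluster is scored independently, the sketch also maps the scoring over a thread pool to show where work could be split across threads; the thesis's actual parallel design is not described in the abstract.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    rank: int     # 1-based position in the original search result list
    score: float  # hypothetical document-to-cluster score


@dataclass(frozen=True)
class Cluster:
    label: str
    label_score: float
    members: tuple  # tuple of Member


def baseline_score(cluster):
    """Baseline: label score times the number of documents in the cluster."""
    return cluster.label_score * len(cluster.members)


def improved_score(cluster):
    """Assumed variant: weight each document by its in-cluster score and
    discount it by its original rank (log2 discount is an assumption)."""
    weighted = sum(m.score / math.log2(1 + m.rank) for m in cluster.members)
    return cluster.label_score * weighted


def rank_clusters(clusters, workers=4):
    """Score clusters independently on a thread pool, then sort descending."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(improved_score, clusters))
    ordered = sorted(zip(clusters, scores), key=lambda pair: pair[1], reverse=True)
    return [(cluster.label, score) for cluster, score in ordered]
```

Under these assumptions, a two-document cluster built from the top-ranked results can outrank a three-document cluster built from results near position 50 or lower, whereas the baseline would always favour the larger cluster. In CPython, threads do not run CPU-bound Python code in parallel, so a real multi-core speedup would need a process pool or a runtime with true multithreading; the thread pool here only illustrates the decomposition.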
[Degree-granting institution]: Guangxi University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP311.13