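Design and Implementation of a Clustering-Based Search Visualization System
Topic: search result clustering  Angle: visualization  Source: Beijing University of Posts and Telecommunications, master's thesis, 2013  Document type: degree thesis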
[Abstract]: With the spread of information technology and the Internet, search engine technology has developed rapidly. Traditional search engines return results for the keywords a user enters and rank those results by relevance. However, because words in natural language are often ambiguous and the concepts behind users' query terms are relatively vague, search results tend to be scattered across topics, and users need some time to find the topics they are actually interested in. The question of how to process large numbers of search results effectively and reduce users' retrieval time has driven the development of meta-search engine technology and text clustering technology.
This thesis aims to improve and enhance search results by combining meta-search with text clustering. Meta-search is a technique built on top of independent search engines: it aggregates and filters the results returned by each member search engine and presents the integrated final results to the user. Search results differ from one another to varying degrees. If text clustering is used to cluster the results and present them hierarchically, with each cluster carrying a label that describes its topic and content, users can be helped to some extent in locating what they want, which narrows the retrieval scope and reduces the time spent on filtering.
The main content of this thesis is the design and implementation of a clustering-based search result visualization tool. To build the tool, the thesis first proposes, on the basis of results obtained from existing search engines, a search result clustering method that incorporates user behavior. The method applies secondary processing to the search results, merges results with similar topics, and presents them to the user as clusters, helping users quickly locate the information they are interested in. At the same time, the method collects and analyzes information on users' access behavior and iteratively optimizes the clustering algorithm. In light of the requirements, the thesis describes the relevant design schemes and overall architecture of the system, including the design ideas and working principles of the main modules: search result acquisition, search result preprocessing, cluster generation, and user behavior analysis. It then describes in detail the implementation of each main module, together with the specific interaction design and workflow, and presents the final clustering results along with test results on clustering effect and quality. Finally, it offers suggestions and directions for further research on the search result clustering visualization tool designed and implemented in the thesis.
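The pipeline the abstract outlines (merge results from member engines, then group them into labeled clusters) can be illustrated with a small sketch. This is not the thesis's method: the abstract does not name its clustering algorithm, so the example substitutes a simple single-pass cosine-similarity clustering and leaves out the user-behavior feedback loop. The function names, the `threshold` value, the stopword list, and the sample data are all hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use a fuller one).
STOPWORDS = {"a", "an", "the", "of", "and", "in", "for", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def merge_results(*engine_results):
    """Meta-search step: concatenate result lists, keeping the first copy of each URL."""
    seen = set()
    merged = []
    for results in engine_results:
        for url, snippet in results:
            if url not in seen:
                seen.add(url)
                merged.append((url, snippet))
    return merged


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_results(results, query, threshold=0.3):
    """Single-pass clustering: a result joins the first cluster whose
    accumulated term vector is similar enough, otherwise it starts a new one."""
    query_terms = set(tokenize(query))
    clusters = []
    for url, snippet in results:
        # Query terms appear in every snippet, so they carry no grouping signal.
        vec = Counter(
            t for t in tokenize(snippet)
            if t not in query_terms and t not in STOPWORDS
        )
        for c in clusters:
            if cosine(vec, c["centroid"]) >= threshold:
                c["items"].append(url)
                c["centroid"].update(vec)
                break
        else:
            clusters.append({"centroid": Counter(vec), "items": [url]})
    # Label each cluster with its most frequent term.
    for c in clusters:
        top = c["centroid"].most_common(1)
        c["label"] = top[0][0] if top else "other"
    return clusters


# Hypothetical results from two member engines for the ambiguous query "jaguar".
engine_a = [
    ("a.example/1", "Jaguar car dealer new car prices"),
    ("a.example/2", "Jaguar cat habitat rainforest cat"),
]
engine_b = [
    ("a.example/1", "Jaguar car dealer new car prices"),  # duplicate of engine A
    ("b.example/3", "Luxury car review Jaguar car"),
    ("b.example/4", "Big cat conservation jaguar cat"),
]

for c in cluster_results(merge_results(engine_a, engine_b), "jaguar"):
    print(c["label"], c["items"])
# car ['a.example/1', 'b.example/3']
# cat ['a.example/2', 'b.example/4']
```

A single-pass scheme is order-dependent and was chosen here only for brevity; the hierarchical presentation and iterative refinement from user access behavior described in the abstract would need additional machinery.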
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3