Research and Implementation of Key Technologies for Meta-Search
Published: 2018-11-05 12:04
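[Abstract]: An analysis of mainstream search engine technology and related products shows that current search engines meet, to some extent, users' need to retrieve useful information efficiently. Surveys of user satisfaction with existing search engine products, however, show that the technology still has many shortcomings, chiefly in two respects: first, the comprehensiveness of the search results, commonly called recall; second, the relevance of the search results, commonly called precision.

Many approaches have been proposed to address the precision and recall problems of standalone search engines, and meta-search is a representative one. It remedies the shortcomings of standalone search engines to some extent, but meta-search results themselves still leave much to be improved.

Current theoretical research on meta-search centers on the three parts of a meta-search system: user input processing, candidate engine scheduling, and result processing. 1) In user input processing, the main problems are the vagueness of the user's query target and the ambiguity of the user's input. Proposed remedies include keyword suggestion, word segmentation, Agent-based user interest models, HowNet-based disambiguation and keyword selection, and ontology-based input processing. 2) In candidate engine scheduling, the problems are which candidate search engines to select and what scheduling strategy to use. Proposed methods include rough representative methods, detailed representative methods, quantitative algorithms, static learning, dynamic learning, hybrid learning that combines the two, and ontology-based personalized scheduling. 3) In result processing, the two main problems are de-duplication and ranking. De-duplication methods are based mainly on the URL, the document title, the document summary, or a combination of the three; ranking methods are based mainly on position information, on relevance, or on ontology-based personalization.

Taking high precision, high recall, and fast system response as its basic goals, this thesis analyzes and studies the open problems of meta-search engines along the three main components of a meta-search system. 1) For user input processing, it combines forward and reverse word segmentation so that the keyword set fully reflects the user's search intent, and it combines a long-term and a short-term user interest tree, which keeps the user's interest categories current while preserving the performance of interest-category lookup. 2) For candidate search engine scheduling, it selects engines by combining the user's interest categories with an analysis of the user's retrieval behavior, aiming to pick the candidate engines most likely to supply the results the user wants, and it uses a double-buffering mechanism that combines an in-memory buffer with a local database buffer to shorten the system's response time to user queries. 3) For the results returned by the candidate engines, de-duplication does not merely filter the returned results; it assigns a weight to each result during de-duplication, which raises that result's ranking score and therefore its position among all returned results.

With these improvements to the main components of meta-search, both the precision and the recall of the system's results increase. Combining GD-FNN with the user interest index tree improves satisfaction with the search results for first-time and repeat users alike, and the double-buffering scheduling technique reduces the time of a single search to the order of ten milliseconds.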
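The idea of assigning weights during de-duplication, rather than simply discarding repeated results, might look roughly like the following. This is a minimal sketch and not the thesis's actual algorithm: the URL normalization rule, the reciprocal-rank weight, and the names `normalize_url` and `merge_results` are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Build a crude duplicate-detection key (assumed rule, not from the thesis).

    Lowercases the host, drops a leading "www." and a trailing slash,
    and ignores the scheme, query string, and fragment.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(engine_results: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Merge ranked URL lists from several engines into one ranked list.

    A URL returned by more than one engine is kept once, and every
    occurrence adds to its score, so duplicates raise its final rank.
    """
    scores: dict[str, float] = defaultdict(float)
    first_seen: dict[str, str] = {}
    for urls in engine_results.values():
        for rank, url in enumerate(urls, start=1):
            key = normalize_url(url)
            first_seen.setdefault(key, url)  # keep the first spelling seen
            scores[key] += 1.0 / rank        # reciprocal-rank weight (assumed)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(first_seen[key], score) for key, score in ranked]


if __name__ == "__main__":
    demo = {
        "engine_a": ["http://www.x.com/p/", "http://y.com/q"],
        "engine_b": ["http://z.com/", "http://x.com/p"],
    }
    for url, score in merge_results(demo):
        print(f"{score:.2f}  {url}")
```

In the demo, the page returned by both hypothetical engines accumulates weight from each occurrence and therefore ranks above pages that only one engine returned. A title-based or summary-based key could replace `normalize_url` without changing the merging logic.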
[Abstract]: An analysis of mainstream search engine technology and related products makes it easy to see that current search engines meet, to a certain extent, users' need for efficient retrieval of useful information. At the same time, surveys of user satisfaction with existing search engine products make it easy to see that current search engine technology still has many shortcomings. These fall mainly into two areas: first, the comprehensiveness of the search results, usually referred to as the recall problem; second, the validity of the search results, usually referred to as precision. There are many ways of addressing the precision and recall problems of independent search engines, and meta-search technology is representative among them. This technology makes up for the shortcomings of independent search engines to some extent, but much remains to be improved in meta-search results. In user input processing, the main problems studied in meta-search research are the vagueness of the user's query target and the ambiguity of the user's input. To solve these problems, keyword prompting, word segmentation, agent-based user interest modeling, HowNet-based disambiguation and keyword selection, and ontology-based user input processing have been proposed. To solve the problems of candidate search engine selection and scheduling strategy, methods such as the rough information representative method, the detailed information representative method, quantitative algorithms, the static learning method, the dynamic learning method, the combined dynamic-static learning method, and ontology-based personalized scheduling have been proposed. For de-duplication of results, the related theories and methods are based mainly on the URL, the document title, the document abstract, or a combination of the three; for ranking, they are based mainly on position information, on degree of relevance, or on ontology-based personalization. In this paper, based on the three major components of a meta-search system, the existing problems of meta-search engines are analyzed and studied. 1) The user input processing part combines forward and reverse word segmentation so that the user's keyword set fully reflects the user's search intention, and combines the user's long-term and short-term interest trees so that the user's interest categories stay timely. 2) For candidate search engine scheduling, the paper combines the user's interest categories with an analysis of the user's search behavior in order to select the candidate search engines most likely to provide the results the user is looking for, and uses a double-buffering mechanism combining a memory buffer and a local database buffer to shorten the system's response time to user query requests. 3) For the results returned by the candidate search engines, de-duplication does not simply filter the results returned by each engine; a corresponding weight is given to each returned result during de-duplication so as to raise its ranking score and thus its rank among all returned results. By improving each important component of meta-search, both the precision and the recall of the system's results are improved. The combination of GD-FNN and the user interest index tree improves satisfaction with the search results whether the user is using the system for the first time or has used it many times, and the double-buffering scheduling technique shortens the system's single search time to the ten-millisecond level.
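The double-buffering mechanism mentioned in the abstract, a memory buffer in front of a local database buffer, could be sketched as below. This is only an illustration under stated assumptions: the thesis does not give its data structures here, so the LRU eviction policy, the SQLite table layout, and the `TwoLevelCache` class with its method names are hypothetical.

```python
import json
import sqlite3
from collections import OrderedDict


class TwoLevelCache:
    """Query-result cache: a small in-memory LRU backed by a local SQLite table."""

    def __init__(self, db_path=":memory:", memory_capacity=128):
        # ":memory:" keeps the demo self-contained; a file path would
        # make the second level persist across runs.
        self.memory = OrderedDict()
        self.capacity = memory_capacity
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(query TEXT PRIMARY KEY, results TEXT NOT NULL)"
        )

    def _remember(self, query, results):
        """Store in the memory level, evicting the least recently used entry."""
        self.memory[query] = results
        self.memory.move_to_end(query)
        if len(self.memory) > self.capacity:
            self.memory.popitem(last=False)

    def get(self, query):
        """Return cached results, or None when neither level has the query."""
        if query in self.memory:
            self.memory.move_to_end(query)
            return self.memory[query]
        row = self.db.execute(
            "SELECT results FROM cache WHERE query = ?", (query,)
        ).fetchone()
        if row is None:
            return None
        results = json.loads(row[0])
        self._remember(query, results)  # promote to the memory level
        return results

    def put(self, query, results):
        """Write results to both levels."""
        self._remember(query, results)
        self.db.execute(
            "INSERT OR REPLACE INTO cache (query, results) VALUES (?, ?)",
            (query, json.dumps(results)),
        )
        self.db.commit()
```

A caller would try `get(query)` first and dispatch the query to the candidate search engines only on a miss, then store the merged results with `put`. Entries evicted from memory remain in the database level, so a later lookup still avoids a round trip to the remote engines.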
[Degree-granting institution]: Nanjing Agricultural University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP391.3
Article ID: 2312030
Article link: http://www.sikaile.net/kejilunwen/sousuoyinqinglunwen/2312030.html