

Design and Implementation of a Personalized Search Engine Scheme Based on Recommendation Technology

Published: 2018-05-27 09:19

  Topic: search engine + data mining. Source: China University of Geosciences (Beijing), 2012 master's thesis


[Abstract]: With the explosive growth of information on the Internet, search engine users expect higher-quality results. To help users find what they need faster and more accurately, a search engine has to analyze user behavior data in depth, mine behavior patterns, and improve retrieval relevance. The work in this thesis originated in a project team within a core department of a company; the team works on mining user behavior data to improve the search experience.
Using data mining techniques, this thesis extracts useful behavior patterns from massive user behavior data and, building on the full-text search engine Lucene, designs and implements personalized search. A comparison against a system without personalization shows that the personalized results better meet user needs.
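As a rough illustration of what mining a behavior pattern from click data can look like, the sketch below turns a click log into a per-user category preference profile. The log layout `(user_id, query, clicked_category)` and the function name `build_profiles` are assumptions made for this example; the abstract does not describe the thesis's actual data format or mining method.

```python
from collections import Counter, defaultdict


def build_profiles(click_log):
    """Return {user_id: {category: share of that user's clicks}}.

    click_log is an iterable of (user_id, query, clicked_category)
    tuples. This layout is hypothetical, chosen only for illustration.
    """
    counts = defaultdict(Counter)
    for user_id, _query, category in click_log:
        counts[user_id][category] += 1

    profiles = {}
    for user_id, counter in counts.items():
        total = sum(counter.values())
        # Normalize raw click counts into preference shares that sum to 1.
        profiles[user_id] = {cat: n / total for cat, n in counter.items()}
    return profiles


# Toy log: u1 mostly clicks electronics results, u2 clicks food results.
log = [
    ("u1", "apple", "electronics"),
    ("u1", "apple watch", "electronics"),
    ("u1", "apple pie", "food"),
    ("u2", "apple", "food"),
]
profiles = build_profiles(log)
```

A profile of this kind is one possible input to the personalization step: the same query "apple" can then be treated differently for `u1` and `u2`.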
To that end, the thesis first analyzes the relevant theory of information retrieval in depth, describes the modules of a search engine and their functions, and emphasizes the importance of search engine evaluation. It then describes the basic theory of data mining and the basic working principles of the recommendation techniques built on it.
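The abstract does not say which recommendation technique the thesis relies on. As one common example of a recommendation method built on mined interaction data, here is a minimal item-based collaborative filtering sketch. All names (`item_similarities`, `recommend`, the toy `interactions` data) are hypothetical.

```python
import math
from collections import defaultdict


def item_similarities(interactions):
    """Cosine similarity between items over binary user vectors.

    interactions maps user_id -> set of item ids the user clicked.
    """
    users_by_item = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            users_by_item[item].add(user)

    sims = {}
    items = sorted(users_by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            common = len(users_by_item[a] & users_by_item[b])
            if common:
                norm = math.sqrt(len(users_by_item[a]) * len(users_by_item[b]))
                sims[(a, b)] = sims[(b, a)] = common / norm
    return sims


def recommend(user, interactions, sims, top_n=3):
    """Rank unseen items by summed similarity to the user's seen items."""
    seen = interactions[user]
    scores = defaultdict(float)
    for (a, b), s in sims.items():
        if a in seen and b not in seen:
            scores[b] += s
    # Sort by score descending, then by item id for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


interactions = {
    "u1": {"doc_a", "doc_b"},
    "u2": {"doc_a", "doc_b", "doc_c"},
    "u3": {"doc_b", "doc_c"},
    "u4": {"doc_a"},
}
sims = item_similarities(interactions)
recs = recommend("u4", interactions, sims)
```

Here `u4` has only clicked `doc_a`, so the items most often co-clicked with `doc_a` are suggested first.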
Second, the thesis examines the requirements of personalized search along three dimensions: query personalization, ranking personalization, and product personalization. It constructs a model and an evaluation system for personalized search and briefly analyzes the potential risks. On this basis, it proposes an overall plan for implementing personalized search.
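The abstract does not name the metrics in its evaluation system. As a sketch of how a baseline ranking and a personalized ranking could be compared, the example below uses mean reciprocal rank (MRR), a standard information retrieval metric chosen here as an assumption. The document ids and relevance sets are made up for illustration.

```python
def reciprocal_rank(ranking, relevant):
    """1 / position of the first relevant document, or 0.0 if none appears."""
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranking, relevant_set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Two queries, each with the document the user actually wanted.
baseline = [
    (["d1", "d2", "d3"], {"d3"}),
    (["d4", "d5", "d6"], {"d5"}),
]
personalized = [
    (["d3", "d1", "d2"], {"d3"}),
    (["d5", "d4", "d6"], {"d5"}),
]

baseline_mrr = mean_reciprocal_rank(baseline)
personalized_mrr = mean_reciprocal_rank(personalized)
```

A higher MRR for the personalized rankings would indicate that the wanted result appears nearer the top, which is the kind of difference a side-by-side environment comparison is meant to surface.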
Third, to show that user behavior data can be used for personalized search, the thesis starts from the underlying data, puts forward five basic hypotheses, and argues from a statistical standpoint that user behavior data supports personalized search. To store the massive volume of user behavior data, it also designs a data warehouse system that supports the back-end recommendation system.
Finally, the thesis presents three detailed schemes for implementing personalized search, with flow charts, and gives a detailed architecture for the core recommendation system and the offline mining module. The first scheme modifies the relevance ranking algorithm to add a personalization factor. The second does not modify the core algorithm of the existing search engine; it only re-ranks the existing retrieval results in a personalized way. The third rewrites the user's query according to the user's personalized needs and likewise leaves the original ranking algorithm unchanged. Weighing cost and coupling with the existing system, the thesis discards the first scheme and, using the full-text search engine Lucene, integrates the second and third schemes to implement personalized search. A comparison of search results between a "personalized environment" and a "comparison environment" confirms that personalized search better meets user needs.
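The second and third schemes can be sketched in a few lines. This is a simplified illustration, not the thesis's implementation: the blending formula, the `weight` and `min_preference` parameters, the `category` field, and the shape of the rewritten query string are all assumptions. The rewritten string follows Lucene's classic query syntax (`+` for a required clause, `^` for a boost), but the code only builds a string and does not call Lucene.

```python
def rerank(results, profile, weight=0.5):
    """Scheme 2 sketch: re-order existing results without touching the engine.

    results is a list of (doc_id, base_score, category) as returned by the
    unmodified search engine; profile maps category -> preference in [0, 1].
    """
    def blended(result):
        _doc_id, base_score, category = result
        # Hypothetical blend: scale the engine score by user preference.
        return base_score * (1.0 + weight * profile.get(category, 0.0))

    return sorted(results, key=blended, reverse=True)


def rewrite_query(query, profile, min_preference=0.6):
    """Scheme 3 sketch: add an optional boosted clause for the top category.

    The original query stays required, so the result set is unchanged and
    only the ordering is nudged. Special characters are not escaped here.
    """
    if not profile:
        return query
    category, preference = max(profile.items(), key=lambda kv: kv[1])
    if preference < min_preference:
        return query
    boost = 1.0 + preference
    return f"+({query}) category:{category}^{boost:.1f}"


profile = {"electronics": 2 / 3, "food": 1 / 3}
results = [
    ("d1", 1.0, "food"),
    ("d2", 0.9, "electronics"),
    ("d3", 0.5, "other"),
]
reranked = rerank(results, profile)
rewritten = rewrite_query("apple", profile)
```

Both functions sit outside the engine's scoring code, which matches the low-coupling argument the abstract gives for preferring these two schemes over modifying the relevance ranking algorithm itself.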
[Degree-granting institution]: China University of Geosciences (Beijing)
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: TP391.3





