
Research and Implementation of a Recommender System Based on Mahout and Hadoop

Published: 2018-03-11 05:05

  Topic: recommender systems. Focus: collaborative filtering. Source: Yangtze University, master's thesis, 2016. Document type: degree thesis.


[Abstract]: With the rapid growth of the Internet in recent years, led by e-commerce, the volume of data and information has exploded, making it harder to pick out, from a huge number of products, the ones a target user actually needs. To meet this need, a careful study of recommender systems, which play an increasingly important role in today's society, has considerable practical significance. Improving recommendation accuracy brings substantial economic benefit to the companies that deploy such systems and gives their users more user-friendly, convenient service. Collaborative filtering has many successful applications in recommender systems, but it performs poorly on sparse data. Starting from the basic concepts of recommendation algorithms, this thesis discusses collaborative filtering algorithms that use several different similarity measures, proposes a similarity measure based on the Bhattacharyya coefficient, and validates it experimentally on the MovieLens, Netflix, and Yahoo Music open datasets. A recommender system is data-intensive, and its data can easily grow explosively. For this massive-data setting, the thesis analyzes the computing principles of the Hadoop distributed computing platform, describes in detail the recommendation component of the well-known machine learning framework Mahout, explains how Mahout eases the implementation of the proposed Bhattacharyya-coefficient-based collaborative filtering algorithm, and explains how Mahout can be combined with Hadoop. Finally, a system prototype is designed and implemented. The thesis describes how the proposed collaborative filtering algorithm with Bhattacharyya-coefficient similarity is implemented in Mahout and provides the source code. Then, because the system must run over a long period, it gives a concrete plan and steps for migrating the system from a single-machine environment to the Hadoop distributed computing platform, combining Mahout with Hadoop to address the computing and storage bottlenecks caused by massive data. In summary, the thesis makes two main contributions: 1) Collaborative filtering relies too heavily on co-rated data, so its recommendations are inaccurate on sparse data. To address this, the thesis proposes a new similarity measure based on the Bhattacharyya coefficient for use in collaborative filtering, and experiments on open datasets show that it is effective in sparse scenarios. 2) For practical use, the Mahout library is extended with the Bhattacharyya-coefficient-based collaborative filtering algorithm studied in the thesis, and the source code of the key parts is given.
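To make the central idea concrete, here is a minimal sketch of a Bhattacharyya coefficient computed over two discrete rating distributions, BC(p, q) = Σ_r sqrt(p(r) · q(r)). The abstract does not give the thesis's exact formulation, so this is the textbook coefficient only, not the author's Mahout implementation. The function names and the 1 to 5 rating scale are assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def rating_distribution(ratings, scale=(1, 2, 3, 4, 5)):
    """Return the empirical probability of each rating value on the scale.

    `scale` is an assumed 1-5 star scale; it is not taken from the thesis.
    """
    total = len(ratings)
    if total == 0:
        return {r: 0.0 for r in scale}
    counts = Counter(ratings)
    return {r: counts.get(r, 0) / total for r in scale}


def bhattacharyya_coefficient(ratings_a, ratings_b, scale=(1, 2, 3, 4, 5)):
    """Textbook Bhattacharyya coefficient between two rating histograms.

    The result lies in [0, 1]: 1 when the two distributions are identical,
    0 when they share no rating value.
    """
    p = rating_distribution(ratings_a, scale)
    q = rating_distribution(ratings_b, scale)
    return sum(sqrt(p[r] * q[r]) for r in scale)


if __name__ == "__main__":
    # Hypothetical ratings for two items, given by different sets of users.
    item_x = [5, 5, 4, 4]
    item_y = [4, 5]
    print(bhattacharyya_coefficient(item_x, item_y))
```

Because the coefficient compares whole rating distributions, it can be computed even when the two rating lists come from entirely different users. This is one plausible reason such a measure helps where co-rated data is scarce, which is the weakness of collaborative filtering that the abstract identifies.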
[Degree-granting institution]: Yangtze University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.3


Article ID: 1596692


Article link: http://www.sikaile.net/jingjilunwen/dianzishangwulunwen/1596692.html

