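A Hybrid Recommendation Algorithm Based on the MapReduce Framework
Topics: collaborative filtering + hybrid recommender system; Source: Changchun University of Technology, 2017 master's thesis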
[Abstract]: The explosive growth of Internet information, the increasing variety and complexity of that information, and the emergence of new e-commerce services have made information overload increasingly severe. Recommender systems have therefore become increasingly important among information filtering tools. In systems used in practice, collaborative filtering is the most widely used personalized recommendation method. However, as recommender systems continue to grow in scale, most traditional recommendation algorithms run into serious computational bottlenecks, and the larger volume of data has not significantly improved recommendation accuracy. Parallelizing collaborative filtering is therefore necessary to cope with growing data volumes. This thesis studies the design and implementation of collaborative filtering recommendation algorithms on the MapReduce parallel computing framework. The algorithms are first parallelized with MapReduce and then optimized individually. For item-based collaborative filtering, a co-occurrence matrix replaces the similarity matrix, which reduces the time spent computing the similarity matrix; when computing recommendations, a Top-N method selects the nearest neighbors, which reduces the amount of computation. For user-based collaborative filtering, the data are grouped by clustering. Within each group, users in the same group serve as nearest neighbors to compute an intra-group recommendation value, and all cluster-center users serve as neighbors to compute an inter-group recommendation value. These three recommendation results (one item-based, one intra-group, one inter-group) are used as training data, with the actual ratings as the target output.
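The item-based step described above can be sketched in a few lines of Python. This is a minimal single-process illustration, not the thesis's Hadoop implementation: the function names (`map_cooccurrence`, `reduce_cooccurrence`, `predict`) are hypothetical, the map and reduce phases are simulated with a generator and a dictionary, and weighting neighbor ratings by their co-occurrence counts is an assumption, since the abstract does not give the prediction formula.

```python
from collections import defaultdict
from itertools import permutations


def map_cooccurrence(user_items):
    """Map step: emit ((item_a, item_b), 1) for each ordered pair of items one user rated."""
    for a, b in permutations(sorted(user_items), 2):
        yield (a, b), 1


def reduce_cooccurrence(pairs):
    """Reduce step: sum the emitted counts into a nested co-occurrence matrix."""
    cooc = defaultdict(lambda: defaultdict(int))
    for (a, b), count in pairs:
        cooc[a][b] += count
    return cooc


def predict(cooc, user_ratings, target, top_n):
    """Predict a rating for `target` from the user's top_n most co-occurring rated items.

    Assumption: the prediction is the co-occurrence-weighted mean of neighbor ratings.
    """
    row = cooc.get(target, {})
    neighbors = sorted(
        ((row.get(item, 0), item) for item in user_ratings if item != target),
        reverse=True,
    )[:top_n]
    weight = sum(count for count, _ in neighbors)
    if weight == 0:
        return None  # none of the user's items co-occurs with the target
    return sum(count * user_ratings[item] for count, item in neighbors) / weight


# Toy ratings (made up for illustration): user -> {item: rating}
ratings = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 2},
    "u3": {"B": 4, "C": 5},
    "u4": {"A": 2, "C": 3, "D": 5},
}
pairs = (pair for items in ratings.values() for pair in map_cooccurrence(items))
cooc = reduce_cooccurrence(pairs)
print(predict(cooc, ratings["u2"], "C", top_n=2))
```

Counting co-occurrences needs only integer sums per item pair, which is why it maps naturally onto a reduce phase; restricting the sum to the Top-N neighbors bounds the work per prediction.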
Linear regression is used to model how the three recommendation results combine into the actual rating. After a loss function is defined for this model, gradient descent is used to find the optimal mixing ratio. Specifically, ten-fold cross-validation splits the data into multiple groups; different Top-N values and data groups yield different mixing parameters, and each set of parameters is then used to compute the mean MAE and RMSE over all data groups. Comparing these means selects the best mixing coefficients and Top-N value. In the experiments, the three recommendation results produced by the two algorithms above are blended into the final recommendations, and the accuracy of those recommendations is verified. The performance of the improved algorithm is also evaluated in terms of program running time. The experimental results show that the modified collaborative filtering algorithm both handles large-scale data better and, by blending different results, achieves higher accuracy. Compared with item-based collaborative filtering, accuracy improves noticeably and running time drops noticeably; compared with user-based collaborative filtering, accuracy also improves noticeably, and grouping reduces the time spent computing the similarity matrix and the results, so efficiency improves noticeably.
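The blending step can be illustrated with a short Python sketch. It assumes a mean-squared-error loss and a linear model without an intercept, neither of which the abstract specifies; the function names, learning rate, epoch count, and toy data are all hypothetical. The ten-fold cross-validation loop over Top-N values is omitted for brevity.

```python
import math


def blend(weights, x):
    """Weighted sum of the individual recommendation values for one sample."""
    return sum(w * xi for w, xi in zip(weights, x))


def fit_blend(features, targets, lr=0.01, epochs=2000):
    """Fit mixing weights by batch gradient descent on mean squared error.

    Assumption: no intercept term; weights start as a uniform average.
    The default lr is sized for three features on a 1-5 rating scale.
    """
    n, d = len(features), len(features[0])
    weights = [1.0 / d] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for x, y in zip(features, targets):
            err = blend(weights, x) - y
            for j in range(d):
                grad[j] += 2.0 * err * x[j] / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def mae(preds, targets):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


def rmse(preds, targets):
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))


# Toy data (made up): each row holds the item-based, intra-group and
# inter-group predictions for one (user, item) pair on a 1-5 scale.
features = [
    [4.0, 3.5, 4.5],
    [2.0, 2.5, 3.0],
    [5.0, 4.5, 4.0],
    [3.0, 3.0, 2.5],
    [1.0, 2.0, 1.5],
]
targets = [4.0, 2.5, 5.0, 3.0, 1.0]

weights = fit_blend(features, targets)
preds = [blend(weights, x) for x in features]
print("weights:", weights)
print("MAE:", mae(preds, targets), "RMSE:", rmse(preds, targets))
```

In a cross-validated setup, `fit_blend` would be run on each training split and `mae` and `rmse` on the held-out split, with the means across splits compared for each candidate Top-N value.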
[Degree-granting institution]: Changchun University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3
Article ID: 1953702
Article link: http://www.sikaile.net/jingjilunwen/dianzishangwulunwen/1953702.html