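A Spark-Based Hybrid Recommendation System
Published: 2018-03-31 20:42
Topic: hybrid recommendation. Focus: Spark. Source: University of Science and Technology of China, master's thesis, 2017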
[Abstract]: With the rapid development of information technology, information overload has become a major challenge for the Internet. To ease the growing tension between Internet users and massive volumes of data, researchers proposed the concept of the recommendation system. As an important branch of recommendation systems, hybrid recommendation systems improve performance by combining multiple recommendation algorithms, and they are now widely used in e-commerce, social networks, and video sites. However, the rapid growth in users and data places higher demands on their performance. For example, a video site requires its hybrid recommendation system to recommend videos accurately, to train new models as user behavior changes, and to update its recommendations promptly. As data volume grows, developers can no longer rely on experience to determine how much each recommendation algorithm should influence the final result. Coarse-grained weight calculation therefore limits the accuracy of a hybrid recommendation system and makes development harder. In addition, because the system trains feature models on large-scale data and training involves many iterative computations, training a model once takes a day or even several days, which does not meet users' efficiency requirements. This thesis analyzes the characteristics of different datasets, recommendation algorithms, and weight calculation methods, introduces Spark, a general-purpose large-scale data processing platform suited to iterative computation, and designs and implements a Spark-based hybrid recommendation system to improve accuracy, diversity, and efficiency. The main work and contributions are as follows:
1. The thesis proposes a fine-grained weight calculation method that extends the scalar weight of each recommendation algorithm into a weight vector. The method improves the accuracy of rating-prediction recommendation and effectively alleviates the cold-start problem caused by data sparsity.
2. Building on the large-scale data processing framework Spark, and with the fine-grained weight calculation method at its core, the thesis designs and implements a fine-grained weight hybrid subsystem. The subsystem uses Spark's distributed computation to reduce model training time and uses fine-grained weights to improve recommendation accuracy. Experiments show that fine-grained weighted hybrid recommendation is 5% to 30% more accurate than a single recommendation algorithm and 1.5% to 3% more accurate than coarse-grained weighted hybrid recommendation. The system's model training speed is also reported as 90% higher than that of a single-machine recommendation system and about 2 times that of a Hadoop-based recommendation system.
3. The thesis designs and implements a Spark-based cross-harmonic recommendation system. With the fine-grained weight hybrid subsystem at its core, the system adds a content-based recommendation algorithm, yielding a hybrid recommendation system that is accurate, efficient, diverse, and scalable.
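The difference between coarse-grained and fine-grained weighting can be illustrated with a minimal sketch. The abstract does not say what the weight vector is indexed by, so the per-user indexing below is an assumption, and the function names `coarse_blend` and `fine_blend` are hypothetical. This is plain Python for illustration only, not the thesis's Spark implementation.

```python
def coarse_blend(preds, weights):
    """Coarse-grained: one scalar weight per algorithm, shared by every user."""
    return sum(w * p for w, p in zip(weights, preds))


def fine_blend(preds, weight_vectors, user):
    """Fine-grained: one weight vector per algorithm.

    Assumption: each vector is indexed by user, so the blend can differ
    from one user to the next. The thesis may index it differently.
    """
    return sum(wv[user] * p for wv, p in zip(weight_vectors, preds))


# Predicted ratings for one item from two algorithms,
# e.g. collaborative filtering and a content-based model.
preds = [4.0, 2.0]

# Coarse-grained: the same 50/50 split for everyone.
coarse = coarse_blend(preds, [0.5, 0.5])

# Fine-grained: hypothetical per-user weights for each algorithm.
cf_weights = {"alice": 0.75, "bob": 0.25}
cb_weights = {"alice": 0.25, "bob": 0.75}
alice = fine_blend(preds, [cf_weights, cb_weights], "alice")
bob = fine_blend(preds, [cf_weights, cb_weights], "bob")

print(coarse, alice, bob)
```

With a single scalar per algorithm, every user receives the same blend. With a weight vector, a user with little rating history could, under this assumption, lean more on the content-based prediction, which is one plausible reading of how the method eases cold start.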
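[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3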
Article ID: 1692361
Article link: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/1692361.html