Research on Recommendation Methods Based on the SAP HANA Database
Published: 2018-06-09 22:03
Topic: SAP + HANA. Reference: Beijing Forestry University, master's thesis, 2016
[Abstract]: Over two decades of e-commerce development in the Internet era, academic research on e-commerce has advanced steadily, and studies of consumer behavior have multiplied. The ability to process large volumes of data quickly and analyze them in real time determines whether a company can respond rapidly to market changes and gain an advantage. Against this background, faster analysis has become urgent, and SAP HANA (SAP High-Performance Analytic Appliance) emerged in response. It can analyze, store, and process big data in real time, bring out the full value of business data, and help enterprises seize opportunities and make decisions in real time. This study builds on the HANA database and the components installed on it. It uses one year of transaction data that Ponpare, a leading Japanese group-buying site, made available on the data competition platform Kaggle, to carry out predictive analysis. The main work of the thesis is as follows:

1. Design of the overall system architecture, ensuring that the complete workflow runs smoothly in HANA. The architecture comprises a data extraction layer, a data warehouse layer, and a data processing and analysis layer. The data is initially stored in an Oracle database, which acts as the data source. EIM (Enterprise Information Management) serves as the extraction tool that loads the data into HANA. PAL and HANA-based R serve as the algorithm implementation tools for preprocessing and analysis. Data flows between these components without obstruction, which keeps the system coherent.

2. Use of HANA PAL (Predictive Analysis Library) together with AFM (Application Function Modeler) to perform data fusion, missing-value imputation, and numerical normalization, yielding data suitable for the study. Before data mining, the customers' browsing and purchase records, their personal information, and the raw coupon information are described and analyzed. The initial data provided by the site is then preprocessed to improve mining efficiency and reduce mining time.

3. Implementation of the recommendation algorithm in the HANA-based R environment. First, the cbind function combines vectors and matrices into a new matrix. Next, the attributes are assigned different weights. Finally, the cosine similarity between user attributes and coupons is computed and sorted, giving the IDs of the 10 coupons each customer is most likely to buy. The accuracy of the recommendations is obtained by comparing the category and region of the products users actually bought with those of the recommended products.

The thesis combines data mining, a recently popular field, with HANA, the database SAP introduced in recent years. It uses the latest components, EIM and PAL, to carry out data migration, data preprocessing, and predictive analysis.
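The normalization, weighting, and cosine-ranking steps described in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than the HANA-based R environment the thesis used, and it is not the author's implementation. The feature layout, the weight values, the coupon IDs, and all function names are hypothetical, and applying the same weight vector to both the user and the coupon vectors is an assumption about how the weighting might work.

```python
import math


def min_max_normalize(rows):
    """Scale each column of a list-of-rows matrix to the range [0, 1]."""
    scaled_columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column carries no information; map it to 0.0.
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]


def apply_weights(vector, weights):
    """Multiply each attribute by its weight (element-wise)."""
    return [v * w for v, w in zip(vector, weights)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as 0
    return dot / (norm_a * norm_b)


def recommend_top_k(user_vector, coupons, weights, k=10):
    """Return the IDs of the k coupons most similar to the user profile."""
    weighted_user = apply_weights(user_vector, weights)
    scored = []
    for coupon_id, features in coupons.items():
        score = cosine_similarity(weighted_user, apply_weights(features, weights))
        scored.append((score, coupon_id))
    # Highest similarity first; ties broken by coupon ID for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [coupon_id for _, coupon_id in scored[:k]]


# Hypothetical attributes: [discount_rate, is_food, is_travel, is_tokyo]
weights = [1.0, 2.0, 2.0, 1.0]
user_profile = [0.5, 1.0, 0.0, 1.0]
coupons = {
    "c1": [0.6, 1.0, 0.0, 1.0],
    "c2": [0.9, 0.0, 1.0, 0.0],
    "c3": [0.5, 1.0, 0.0, 0.0],
}

top_two = recommend_top_k(user_profile, coupons, weights, k=2)
print(top_two)
```

With k left at its default of 10, the same call would return up to ten coupon IDs per customer, which matches the list size named in the abstract. Sorting on the negated score keeps the ranking deterministic when two coupons tie.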
[Degree-granting institution]: Beijing Forestry University
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP311.13; TP391.3
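[Similar literature]
Related journal articles (top 1)
1 ; SUSE Helps SAP HANA Achieve High Availability [J]; Office Automation (辦公自動化); 2014, No. 13
Related master's theses (top 1)
1 Huang Jiaqi (黃佳琪); Research on Recommendation Methods Based on the SAP HANA Database [D]; Beijing Forestry University; 2016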
Article ID: 2000978
Article link: http://www.sikaile.net/jingjilunwen/dianzishangwulunwen/2000978.html