Research and Application of Hadoop-Based Data Mining in the E-Commerce Environment
Topic: data mining. Focus: association rule algorithms. Source: Hunan University, master's thesis, 2016.
[Abstract]: The rapid development of portable network access devices and the continual iteration of Internet technology have made the online ecosystem larger and more active, which in turn has driven the rapid growth of e-commerce built on Internet technology. Compared with traditional offline shopping, online e-commerce is a faster, more efficient, and more convenient way to shop, and the surge of e-commerce platforms in recent years bears this out. For the operators of an e-commerce platform, retaining existing customers and winning potential customers is the top priority. Given the speed and volume of data in the Internet era, this thesis applies data mining to e-commerce platform data. On the one hand, it mines the browsing and shopping habits of existing customers in depth in order to retain them; on the other hand, it analyzes the behavior of potential users, identifies their interests, and delivers targeted pushes to attract more customers. Because the shopping data of e-commerce users is strongly correlated, the thesis uses an association rule algorithm for mining and analysis, with the goal of retaining existing users and discovering new ones.
Data mining is the process of discovering the relationships hidden in unprocessed raw data sets and extracting knowledge from those relationships. It combines several computing disciplines, including database technology, machine learning, statistical analysis, pattern recognition, and artificial neural networks. Because of its high commercial value and its very wide range of applications, data mining has become one of the current research priorities.
Taking e-commerce platforms in the Internet and big data era as its starting point, the thesis first analyzes the current state of these platforms and identifies their weakness: they cannot cope with the flood of massive, unordered data, so they tend to accumulate invalid data, which leads to low resource utilization and a low effective conversion rate. Second, the author conducted anonymous interviews with clothing sellers and home appliance sellers on a well-known e-commerce platform and concluded that the items bought by clothing buyers are highly correlated. On the technical side, the thesis proposes an association rule algorithm based on Apriori and processes the data on a Hadoop cluster. Compared with a traditional relational database, a Hadoop cluster can process the data in parallel, which greatly improves the efficiency of the algorithm. The thesis also builds a front-end data visualization system based on AngularJS, Bootstrap, and HTML.
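The abstract names Apriori-based association rule mining but gives no implementation details. The following is a minimal single-machine sketch of the textbook Apriori procedure (level-wise candidate generation, pruning, and rule extraction by confidence), not the thesis's own algorithm. The function names `apriori` and `association_rules`, the thresholds, and the clothing baskets are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_among(candidates):
        # One full pass over the data per level. In a Hadoop setting this
        # counting pass is the natural step to distribute (assumption: the
        # thesis does not describe how its job is structured).
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    singletons = {frozenset([item]) for t in transactions for item in t}
    level = frequent_among(singletons)
    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = frequent_among(candidates)
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Subsets of a frequent itemset are frequent, so the
                # lookup frequent[lhs] always succeeds.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


# Hypothetical clothing baskets, invented for this example.
baskets = [
    {"shirt", "tie"},
    {"shirt", "tie", "belt"},
    {"shirt", "belt"},
    {"tie", "belt"},
    {"shirt", "tie", "socks"},
]

frequent = apriori(baskets, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.7)
for lhs, rhs, support, confidence in rules:
    print(sorted(lhs), "->", sorted(rhs), round(support, 2), round(confidence, 2))
```

Support is the fraction of baskets that contain an itemset, and confidence is the support of the whole itemset divided by the support of the antecedent. The sketch keeps everything in memory for readability; on platform-scale data only the counting pass would need to change.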
[Degree-granting institution]: Hunan University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification numbers]: F724.6; TP311.13
Article ID: 1661198
Article link: http://www.sikaile.net/jingjilunwen/dianzishangwulunwen/1661198.html