Research and Application of a Click-Stream Data Warehouse in E-Commerce

Published: 2018-11-18 13:25

[Abstract]: Advances in database technology have greatly improved office efficiency in enterprises, and the widespread use of databases has caused the volume of stored business data to grow rapidly. Much of this data cannot be turned into useful information, which produces a situation of "rich data, poor information" in which investment in databases fails to yield returns. The data warehouse, which can store large volumes of historical data, addresses this problem well. Traditional data warehouses load data only from operational business databases; with the growth of the Internet, Web data has become an increasingly important data source. Among such data, Web logs are an especially important kind of behavioral data: they help decision makers understand user habits and respond with targeted measures. Against this background, this thesis builds a click-stream data warehouse, implements a user clustering algorithm based on implicitly associated pages, and describes how the user clustering algorithm is applied in e-commerce. The click-stream data warehouse takes the e-commerce environment as its application setting and Web logs as a major data source. Its design follows the architecture advocated by Inmon, a data warehouse plus dependent data marts: the data warehouse is built on the relational model, and the dimensional data marts are built on the dimensional model. As the data foundation for management decisions, the data warehouse stores large amounts of low-granularity (that is, detailed) historical business data in third normal form. The dependent data marts are built according to user requirements. This architecture balances access efficiency against the flexibility to adjust the structure. On top of this data warehouse, the thesis presents a vector-based click-stream user clustering algorithm. The algorithm maps each user's click-stream data to a vector and judges the similarity between users by the size of the angle between their vectors. The groups of associated pages produced by an implicit associated-page mining algorithm serve as the dimensions of the vectors. Implicitly associated pages reflect users' access habits well and better highlight the topics users are interested in. The algorithm was validated on the experimental data warehouse built for the thesis. The experiments show that the algorithm effectively identifies users' target pages and discovers implicitly associated page groups containing two or more pages. The resulting user clustering also adapts better to the complex Internet environment.
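The vector-based similarity described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the algorithm's details, so everything concrete here is an assumption made for illustration: the page URLs, the contents of `PAGE_GROUPS`, the use of click counts as vector components, the greedy single-pass grouping in `cluster`, and the `threshold` value. Only the general idea (one vector dimension per associated page group, similarity judged by the angle between vectors) comes from the text.

```python
import math
from collections import Counter

# Hypothetical associated page groups; each group is one vector dimension.
PAGE_GROUPS = [
    {"/phones/list", "/phones/detail"},
    {"/books/list", "/books/detail"},
    {"/cart", "/checkout"},
]


def to_vector(clicks, page_groups=PAGE_GROUPS):
    """Map a user's click stream to a vector: one click count per page group."""
    counts = Counter(clicks)
    return [sum(counts[page] for page in group) for group in page_groups]


def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster(user_clicks, threshold=0.8):
    """Greedy single-pass grouping (an assumption, not the thesis's method).

    Each user joins the first cluster whose representative vector (the vector
    of that cluster's first member) has cosine similarity >= threshold;
    otherwise the user starts a new cluster.
    """
    clusters = []  # list of (representative_vector, [user_ids])
    for user, clicks in user_clicks.items():
        vec = to_vector(clicks)
        for rep, members in clusters:
            if cosine(rep, vec) >= threshold:
                members.append(user)
                break
        else:
            clusters.append((vec, [user]))
    return [members for _, members in clusters]


sessions = {
    "u1": ["/phones/list", "/phones/detail", "/phones/detail", "/cart"],
    "u2": ["/phones/list", "/phones/detail", "/cart"],
    "u3": ["/books/list", "/books/detail", "/books/detail"],
}
print(cluster(sessions))  # [['u1', 'u2'], ['u3']]
```

A smaller angle means a larger cosine, so users `u1` and `u2`, whose clicks concentrate on the same page groups, end up together, while `u3` forms its own cluster. Using the angle rather than the Euclidean distance makes the comparison insensitive to how many clicks a user made overall.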
[Degree-granting institution]: Liaoning University of Technology
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP311.13

Article ID: 2340145

Article link: http://www.sikaile.net/jingjilunwen/dianzishangwulunwen/2340145.html
