Research and Implementation of a Large-Scale Multidimensional Network Analysis Model
Published: 2018-10-24 11:40
[Abstract]: As information technology has advanced and storage costs have fallen, enterprises have built large numbers of databases around their business needs and accumulated massive amounts of data. How to use this data to guide business decisions is a hard problem for enterprise decision analysts. Online analytical processing (OLAP) is widely recognized as an effective solution: it can analyze massive data quickly across multiple dimensions and granularities and provide decision support. After more than twenty years of research and development, OLAP technology is relatively mature and standardized, and many commercial database and data warehouse systems implement OLAP functionality. In recent years, emerging fields such as social networks, bioinformatics, and multi-source information fusion have grown rapidly, large numbers of multidimensional heterogeneous networks have appeared in real applications, and these networks keep growing in size. Traditional OLAP analyzes data organized as fact tables and dimension tables, with no relationships among the facts, so it cannot analyze multidimensional networks effectively. Graph OLAP has gradually developed in response. Compared with traditional OLAP, it improves the information model, replaces the data cube with a graph cube, and supports multidimensional, multi-angle analysis of network data. Research on Graph OLAP is still at an early stage, however: the analytical power of existing models is limited, and most of them do not support effective and efficient analysis of multidimensional heterogeneous networks or of massive data. To address these shortcomings, this thesis proposes a new analysis model that supports multidimensional analysis of large-scale multidimensional heterogeneous networks.
The main contributions of this thesis are as follows:
1. A new information model for multidimensional heterogeneous networks is designed. Binary-relation meta-paths and n-ary-relation meta-paths in heterogeneous networks are defined, and the relationships among these meta-paths are studied as a new way to guide network aggregation.
2. The TSMH Graph Cube is designed, extending the traditional graph cube into a two-stage cube made up of an entity hypercube and a dimension cube. On top of this cube model, the traditional operations are given new semantics and additional Graph OLAP operations are proposed, making network analysis more varied.
3. For the entity hypercube, a parallel aggregation algorithm and a materialization strategy are proposed. For the dimension cube, nodes and dimension attributes are encoded and a node encoding algorithm is designed, so that dimension OLAP operations on nodes no longer need to join the entity table with the dimension tables, which greatly improves the efficiency of those operations.
4. To support massive data volumes, the model's Graph OLAP operation algorithms are implemented on a parallel computing framework. Experiments on large-scale real and simulated data verify that the model can analyze large-scale multidimensional heterogeneous networks effectively and efficiently.
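The encoding idea in item 3 can be illustrated with a minimal sketch. This is not the thesis's algorithm, which the abstract does not specify. It assumes a hypothetical schema with two dimensions (a location hierarchy of country and city, plus an age group) and packs each node's dimension values into one integer code, so that rolling a network up to a coarser level is a bit mask rather than a join between an entity table and a dimension table. The bit layout, the function names `encode_node` and `roll_up`, and the sample data are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical bit layout for a node code (an assumption, not from the thesis):
#   [ country : 4 bits ][ city : 4 bits ][ age_group : 4 bits ]
FIELD_BITS = 4
FIELD_MAX = (1 << FIELD_BITS) - 1


def encode_node(country: int, city: int, age_group: int) -> int:
    """Pack three dimension-member ids into a single integer code."""
    for value in (country, city, age_group):
        if not 0 <= value <= FIELD_MAX:
            raise ValueError(f"dimension id {value} does not fit in {FIELD_BITS} bits")
    return (country << (2 * FIELD_BITS)) | (city << FIELD_BITS) | age_group


# Masks that keep only the dimension levels to group by.
MASK_COUNTRY = FIELD_MAX << (2 * FIELD_BITS)
MASK_COUNTRY_CITY = MASK_COUNTRY | (FIELD_MAX << FIELD_BITS)


def roll_up(edges, mask):
    """Aggregate a directed network: nodes whose masked codes match are merged,
    and the edges between each pair of merged nodes are counted."""
    aggregated = Counter()
    for source, target in edges:
        aggregated[(source & mask, target & mask)] += 1
    return aggregated


# Three sample nodes: two in country 1 (cities 1 and 2), one in country 2.
a = encode_node(country=1, city=1, age_group=2)
b = encode_node(country=1, city=2, age_group=3)
c = encode_node(country=2, city=1, age_group=1)
edges = [(a, b), (a, c), (b, c), (a, b)]

by_country = roll_up(edges, MASK_COUNTRY)
by_city = roll_up(edges, MASK_COUNTRY_CITY)
```

Here `by_country` merges nodes `a` and `b` into a single country-level node, while `by_city` keeps them apart. Because the dimension values live inside the node code, neither roll-up consults a separate dimension table.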
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP311.13
Article ID: 2291299
Article link: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/2291299.html