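Research on Key Technologies of a Hadoop-Based Online Analytical Processing System
Published: 2018-05-18 23:15
Topics: online analytical processing + HOLAP; Source: University of Electronic Science and Technology of China, master's thesis, 2016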
[Abstract]: In recent years, online analytical processing (OLAP) for multidimensional data queries has become increasingly important, and OLAP-based multidimensional analysis has become an important basis for decisions by enterprise managers. Current OLAP research targets storage and processing for a single data model and the corresponding query-performance optimizations. ROLAP (Relational-OLAP), built on relational databases, and MOLAP (Multidimensional-OLAP), built on multidimensional databases, each use a single data organization model, so neither can meet the need for heterogeneous data models and low-latency multidimensional queries across data sets of different scales. To address these problems, this thesis improves query planning, query interpretation, and the cache-based query optimization mechanism for different data organization models, and designs and implements a scalable, efficient, distributed hybrid online analytical processing (Hybrid-OLAP, HOLAP) system. The system is intended to serve multidimensional queries over data sets of different scales and to process each query efficiently according to how the underlying multidimensional organization is implemented. The research based on this system covers four aspects.
First, because traditional ROLAP systems cannot efficiently handle multidimensional analysis of large-scale data sets, the thesis proposes a HOLAP system architecture for the Hadoop environment. The architecture supports fast multidimensional query analysis over data sets of different scales, an MDX (Multidimensional Expressions) query interpretation and aggregation method for Hive, and a multidimensional query optimization method based on an HBase precomputed cache.
Second, for optimizing Hive multidimensional queries over large-scale data sets, the thesis uses a segmented, layer-by-layer dimension-reduction aggregation algorithm (S-Redu-D-A) that builds an HBase cube cache, and studies a formal method for mapping Hive, which resembles a relational database, to the HBase data model in a NoSQL database (Hsql-To-Nosql Formalized Model, Hs-Nos-FM). It proposes and validates a formalized multidimensional cube (Format Multi Cube, F-M-Cube) data storage model that meets HOLAP's efficiency requirements and performs well on multidimensional queries over large-scale data sets.
Third, for the two kinds of query plan, the thesis performs query-planning analysis using indicators such as real-time requirements, data scale, dimension cardinality, storage space, multi-table joins, and query frequency, and proposes a query-planning workflow consisting of access control, query monitoring, query analysis, and query dispatch. A comparison of execution times across data scales and across different multidimensional queries validates the query planning method based on the HOLAP system architecture, which performs well on common OLAP multidimensional queries.
Finally, the thesis presents the detailed design and implementation of the query planning method, the query interpretation mechanism, the formalized multidimensional cube construction method, the aggregation cache mechanism, MDX query support for Hive, and the construction algorithm that embeds the formal method, all under the HOLAP system architecture. Testing shows that the system performs well and meets its design goals.
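The idea of aggregating a cube layer by layer, reducing one dimension at a time, can be illustrated with a small in-memory sketch. This is not the thesis's S-Redu-D-A algorithm: segmentation, Hive input, and the HBase cache are omitted, the measure is assumed to be a simple sum, and the names `roll_up`, `build_cube`, `rows`, and `DIMS` are hypothetical, introduced only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def roll_up(parent, parent_dims, child_dims):
    """Aggregate a parent cuboid to child_dims by summing the measure."""
    keep = [parent_dims.index(d) for d in child_dims]
    child = defaultdict(float)
    for key, measure in parent.items():
        child[tuple(key[i] for i in keep)] += measure
    return dict(child)


def build_cube(rows, dims):
    """Build every cuboid, one layer at a time, from len(dims) dimensions down to 0."""
    base = defaultdict(float)
    for *key, measure in rows:
        base[tuple(key)] += measure
    cube = {tuple(dims): dict(base)}
    for size in range(len(dims) - 1, -1, -1):
        for child_dims in combinations(dims, size):
            # Reuse any cuboid from the layer above instead of rescanning raw rows.
            parent_dims = next(
                p for p in cube
                if len(p) == size + 1 and set(child_dims) <= set(p)
            )
            cube[child_dims] = roll_up(cube[parent_dims], parent_dims, child_dims)
    return cube


# Hypothetical fact rows: (year, region, product, sales)
DIMS = ("year", "region", "product")
rows = [
    ("2015", "east", "tv", 10.0),
    ("2015", "west", "tv", 5.0),
    ("2016", "east", "phone", 7.0),
]
cube = build_cube(rows, DIMS)
print(cube[("year",)])
print(cube[()])
```

Each cuboid is derived from one with exactly one more dimension, so the raw rows are scanned only once; a precomputed cache of this kind could then answer a group-by query with a dictionary lookup rather than a full scan.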
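[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP311.13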