Research and Application of Cloud-Computing-Based Storage and Mining Methods for Massive Spatio-temporal Data
Published: 2018-06-02 13:14
Topic: data mining + cloud computing; Source: Hangzhou Dianzi University, master's thesis, 2014
[Abstract]: In recent years, a growing number of applications have been collecting large volumes of spatio-temporal data and storing them in distributed databases, and the demand for spatio-temporal data mining has grown accordingly. In public security traffic management, traffic flow data is increasing sharply and has pronounced spatio-temporal characteristics, which makes processing it at scale a serious challenge. Traditional processing methods can no longer meet users' needs in either storage capacity or computing efficiency, so a platform that supports the storage and analysis of massive data is required.
Spatio-temporal anomaly detection is an important branch of spatio-temporal data mining. To address the limitations of traditional processing methods in spatio-temporal anomaly detection, this thesis designs and implements a big data storage and analysis platform. The main research contents and contributions are as follows:
(1) The thesis analyzes the technical principles of Hadoop, HBase, Hive and Zookeeper on a cloud platform, studies HDFS and the MapReduce programming model within the Hadoop framework, and focuses on the underlying implementation of the HBase distributed database's storage architecture and on the data model of HBase tables. On this basis, it builds a cloud platform based on Hadoop, HBase, Hive and Zookeeper, and sets up an extended HBase+Hive system architecture.
(2) The thesis studies spatio-temporal anomaly detection methods in depth and analyzes a number of existing spatio-temporal anomaly patterns; valuable knowledge is obtained by mining predefined spatio-temporal anomaly patterns. It proposes a four-step, cloud-platform-based spatio-temporal anomaly detection method (data preprocessing, distributed anomaly detection, application of knowledge rules, and result verification) for mining these predefined patterns, and validates the method with a real application on traffic data streams. Experiments show that the method achieves high running efficiency and correctness.
(3) The thesis studies HBase row key design and proposes a row-key-based data model. With the design goals clearly defined, row keys are used to design a secondary (auxiliary) index table and a replica recovery table, yielding a distributed secondary index on HBase that is applied to a traffic application over vehicle-passage records. Experiments show that this indexing mechanism supports efficient queries over massive data.
(4) Building on the work above, the thesis designs and implements a big data storage and analysis platform comprising the cloud platform, back-end programs and a front-end display system. The real spatio-temporal anomaly detection application is integrated into the platform, giving users convenient operation and result display.
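The row-key-based secondary index described in item (3) can be illustrated with a small in-memory sketch. It is not the thesis's actual schema: the key layouts, field names, checkpoint IDs and plate numbers below are all hypothetical, the `SortedTable` class is only a toy stand-in for an HBase table (rows kept in lexicographic row-key order), and the replica recovery table is not modeled.

```python
from bisect import bisect_left, insort


class SortedTable:
    """Toy stand-in for an HBase table: rows kept in lexicographic row-key order."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> stored value

    def put(self, key, value):
        if key not in self._rows:
            insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        """Yield (key, value) for every row key starting with prefix, in key order."""
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._rows[self._keys[i]]
            i += 1


def main_key(checkpoint, ts, plate):
    # Assumed layout: checkpoint first, so one checkpoint's records are contiguous.
    return f"{checkpoint}|{ts:010d}|{plate}"


def index_key(plate, ts, checkpoint):
    # Assumed index layout: plate first, so one vehicle's records are contiguous.
    return f"{plate}|{ts:010d}|{checkpoint}"


def put_record(main, index, checkpoint, ts, plate, speed):
    """Write the record to the main table and its pointer to the index table."""
    mk = main_key(checkpoint, ts, plate)
    main.put(mk, {"speed": speed})
    index.put(index_key(plate, ts, checkpoint), mk)


def find_by_plate(main, index, plate):
    """Prefix-scan the index table, then fetch each row from the main table."""
    return [(mk, main.get(mk)) for _, mk in index.scan_prefix(plate + "|")]


main, index = SortedTable(), SortedTable()
put_record(main, index, "CP001", 1400000000, "ZA12345", 62)
put_record(main, index, "CP002", 1400000300, "ZA12345", 48)
put_record(main, index, "CP001", 1400000100, "ZB99999", 55)

hits = find_by_plate(main, index, "ZA12345")
```

Because the main table is ordered by checkpoint, a query by plate would otherwise need a full scan; the index table turns it into one short prefix scan plus point lookups. A real deployment would also have to keep the two tables consistent when a write fails partway, which this sketch does not attempt.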
[Degree-granting institution]: Hangzhou Dianzi University
[Degree level]: Master's
[Year conferred]: 2014
[Classification number]: TP333; TP311.13
Article ID: 1968888
Article link: http://www.sikaile.net/kejilunwen/jisuanjikexuelunwen/1968888.html