Design and Implementation of a Spark-Based Spatial Data Platform System

Published: 2018-01-10 10:17

Keywords: Design and Implementation of a Spark-Based Spatial Data Platform System　Source: Shandong University, 2017 master's thesis　Type: degree thesis

[Abstract]: Spatial data, also called geographic data, is information about physical objects that can be represented by positions in a geographic coordinate system. Many industries now continuously generate large volumes of spatial data: geographic information from remote-sensing satellite monitoring (rivers, lakes, towns), mobile phone call records in mobile communication networks, positions of GPS-equipped vehicles in urban traffic networks, and location-tagged information from social networks. Fully analyzing and using these data will play an important role in areas such as environmental management, communication security, and traffic planning.

As more valuable spatial data is produced, the need for tools suited to analyzing and processing it at large scale grows more urgent. However, current relational database technology and distributed computing systems are not well suited to spatial data. Spatial index structures are hard to express in a relational database, which makes relational databases inefficient at spatial queries. Because of shortcomings of the MapReduce programming model, existing distributed analysis frameworks based on HDFS and MapReduce are slow at interactive queries and iterative operations. MapReduce processes data as follows: it reads data from cluster disks into memory, performs the computation, and then writes the results from memory back to cluster disks as the input to the next computation. The redundant disk reads and writes in every computation cause serious performance problems for MapReduce-based algorithms, which cannot meet users' requirements for real-time analysis of large-scale spatial data.

Apache Spark is an emerging cluster computing framework. Compared with MapReduce, Spark provides in-memory iterative computation: data can stay resident in memory, which saves disk I/O time. In interactive query settings it is reported to be more than 100 times faster than Hadoop, currently the most popular parallel computing tool. As Spark has developed, researchers have begun extending it to support distributed query processing over spatial data. GeoSpark and SpatialSpark are the most advanced such systems to date. Both extend Spark to provide distributed storage and querying of spatial data, and their architectures are similar, consisting mainly of three layers: a spatial data storage layer, a data index layer, and a query processing layer. The storage layer implements distributed storage of large-scale spatial data. The index layer applies traditional spatial indexing techniques to the distributed spatial data. The query processing layer exposes spatial query interfaces to users and performs spatial data analysis through the index and storage layers. The supported queries are range queries, spatial join queries, and spatial k-nearest-neighbor queries. However, GeoSpark and SpatialSpark still have a series of design shortcomings that limit their final query performance.

In this thesis, we comprehensively improve on the above architecture, using a new spatial data partitioning strategy, a new index structure, and new query processing techniques, to design and implement Spark-GIS, a new Spark-based spatial data computing system. Comprehensive experiments show that Spark-GIS achieves higher query performance than the systems above. Its main contributions are the following:

1. In the spatial data storage layer, we design and implement a new spatial data partitioning strategy. Distributed storage built on this strategy gives better support to the queries in the upper layers and avoids workload imbalance during spatial queries.
2. In the spatial data index layer, we design and implement an R-tree spatial index structure based on Voronoi diagrams. Compared with the R-tree, it greatly reduces both index construction time and index size without degrading spatial query performance.
3. In the spatial data analysis layer, we combine the improved distributed storage strategy with the spatial indexing techniques to implement Spark-based parallel spatial query algorithms. These give users interactive spatial queries at massive scale and high concurrency, including range queries, spatial join queries, and spatial k-nearest-neighbor queries.

Finally, we ran comprehensive comparative tests of Spark-GIS, Spark, and GeoSpark. The test data are mobile phone call records numbering in the hundreds of millions. The results show that Spark-GIS outperforms GeoSpark, the most advanced system to date, across spatial query operations. For range queries and spatial join queries in particular, it improves on GeoSpark by multiple orders of magnitude.
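The three-layer idea described in the abstract (partitioned storage, a spatial index, and query operators on top) can be illustrated with a small single-machine sketch. It is only an illustration under stated assumptions: it uses a plain uniform grid as the partitioner and index, not the partitioning strategy or the Voronoi-based R-tree of Spark-GIS, and the function names grid_partition, range_query, and knn_query are hypothetical, not taken from Spark-GIS, GeoSpark, or SpatialSpark.

```python
from collections import defaultdict
import heapq
import math


def grid_partition(points, cell_size):
    """Storage-layer stand-in: bucket (x, y) points into uniform grid cells.

    Assumption: a uniform grid replaces the thesis's partitioning strategy.
    """
    cells = defaultdict(list)
    for x, y in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((x, y))
    return cells


def range_query(cells, cell_size, xmin, ymin, xmax, ymax):
    """Range query: visit only cells overlapping the rectangle, then filter exactly."""
    hits = []
    for cx in range(math.floor(xmin / cell_size), math.floor(xmax / cell_size) + 1):
        for cy in range(math.floor(ymin / cell_size), math.floor(ymax / cell_size) + 1):
            for x, y in cells.get((cx, cy), []):
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    hits.append((x, y))
    return sorted(hits)


def knn_query(cells, qx, qy, k):
    """k-nearest-neighbor query: each partition reports its local top k,
    and the merged candidates are reduced to the global top k."""
    def dist(p):
        return math.dist(p, (qx, qy))

    candidates = []
    for pts in cells.values():
        candidates.extend(heapq.nsmallest(k, pts, key=dist))
    return heapq.nsmallest(k, candidates, key=dist)


if __name__ == "__main__":
    pts = [(0.5, 0.5), (1.5, 1.5), (2.5, 0.2), (3.1, 3.9), (0.9, 2.2)]
    grid = grid_partition(pts, cell_size=1.0)
    print(range_query(grid, 1.0, 0.0, 0.0, 2.0, 2.0))
    print(knn_query(grid, 0.0, 0.0, k=2))
```

In a cluster setting each grid cell would correspond to a data partition, so the range query touches only the partitions that overlap the query rectangle, and the k-nearest-neighbor query follows the usual "local top k, then global merge" pattern. A uniform grid can leave partitions unevenly loaded when the data is skewed, which is the kind of imbalance the abstract says the new partitioning strategy is meant to avoid.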

[Degree-granting institution]: Shandong University
[Degree]: Master's
[Year degree granted]: 2017
[Classification number]: P208; TP311.52

Article ID: 1404873

Link to this article: http://www.sikaile.net/shoufeilunwen/xixikjs/1404873.html
