Research on Interval Join Methods under MapReduce

Published: 2018-05-08 20:16

  Topics: interval join + set classification. Source: Huazhong University of Science and Technology, master's thesis, 2016


【Abstract】: With the rapid development of network technology, the volume of data worldwide is multiplying, which makes big data analysis and processing difficult. MapReduce, an emerging programming model for data-intensive computing, plays an important role in big data analysis and processing. An interval join is a join in which attribute values range over an interval. It is an important operation in big data analysis and processing, so using the MapReduce programming platform to improve interval join efficiency is of considerable significance. Building on the interval tuple concept and the interval tuple relations proposed by Allen, this thesis designs join algorithms based on set classification for two-way and multi-way interval joins. First, the range covered by the interval tuples taking part in the operation is divided uniformly into a number of partitions. Each tuple is mapped to the set of partitions it intersects and is classified by its position within each partition. Four types of set classification are defined, and the share of each partition's total data taken up by each of the four types is analyzed. Second, the two-way and multi-way interval join algorithms are implemented with the MapReduce distributed programming framework. The key-value pairs built from the four set classifications filter out tuples that do not need to take part in the join, which reduces the amount of data transferred from the Map side and the amount of computation on the Reduce side and so improves interval join efficiency. Finally, based on each set classification's share of each partition's total data, separate load-balancing strategies are formulated for two-way and multi-way interval joins: the set classifications across partitions are recombined to generate new key-value pairs, balancing the data received by each Reduce node and further improving the completion efficiency of interval join jobs. Both the two-way and the multi-way interval join methods were validated on a distributed Hadoop platform built for the purpose. The experimental results show that the set-classification-based interval join method is applicable in a variety of situations and has certain advantages over existing two-way and multi-way interval join methods, and that the load-balancing strategies further improve efficiency.
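The partition-and-classify idea in the abstract can be illustrated with a small single-process sketch. The abstract does not define the four set classifications, so the four kinds below (`INSIDE`, `CROSS_LEFT`, `CROSS_RIGHT`, `COVER`) are an assumption chosen for illustration, as are all function names. The rule used to skip pairs in the reduce step is a common duplicate-avoidance technique and is not necessarily the thesis's filter. Intervals are assumed to be closed pairs `(start, end)` lying within `[lo, hi)`.

```python
from collections import defaultdict
from enum import Enum


class Kind(Enum):
    # Hypothetical classification of a tuple relative to one partition [a, b).
    INSIDE = "inside"            # starts and ends within the partition
    CROSS_LEFT = "cross_left"    # starts before the partition, ends within it
    CROSS_RIGHT = "cross_right"  # starts within the partition, ends after it
    COVER = "cover"              # starts before and ends after the partition


# Kinds whose start point lies in the partition.
STARTS_HERE = {Kind.INSIDE, Kind.CROSS_RIGHT}


def partition_bounds(lo, hi, n):
    """Split [lo, hi) into n equal-width partitions [a, b)."""
    width = (hi - lo) / n
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n)]


def map_tuple(interval, bounds):
    """Return (partition index, Kind) for every partition the interval intersects."""
    start, end = interval
    out = []
    for p, (a, b) in enumerate(bounds):
        if start < b and end >= a:  # closed interval intersects [a, b)
            starts_in, ends_in = start >= a, end < b
            if starts_in and ends_in:
                kind = Kind.INSIDE
            elif starts_in:
                kind = Kind.CROSS_RIGHT
            elif ends_in:
                kind = Kind.CROSS_LEFT
            else:
                kind = Kind.COVER
            out.append((p, kind))
    return out


def interval_join(r_tuples, s_tuples, lo, hi, n):
    """Two-way overlap join of R and S, simulating the map and reduce phases."""
    bounds = partition_bounds(lo, hi, n)
    # "Map" phase: send each tuple to every partition it intersects.
    buckets = defaultdict(lambda: {"R": [], "S": []})
    for name, tuples in (("R", r_tuples), ("S", s_tuples)):
        for t in tuples:
            for p, kind in map_tuple(t, bounds):
                buckets[p][name].append((t, kind))
    # "Reduce" phase: join within each partition.
    result = []
    for p in sorted(buckets):
        for r, r_kind in buckets[p]["R"]:
            for s, s_kind in buckets[p]["S"]:
                # If neither tuple starts here, the pair was already
                # compared in an earlier partition, so skip it.
                if r_kind not in STARTS_HERE and s_kind not in STARTS_HERE:
                    continue
                if r[0] <= s[1] and s[0] <= r[1]:
                    result.append((r, s))
    return result


R = [(1, 5), (9, 10)]
S = [(3, 9), (6, 7)]
pairs = interval_join(R, S, lo=0, hi=12, n=3)
```

In this sketch each overlapping pair is reported exactly once, in the partition that contains the later of the two start points. A real MapReduce job would emit the partition index as the key and the tagged tuple as the value.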

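【Degree-granting institution】: Huazhong University of Science and Technology
【Degree level】: Master's
【Year degree granted】: 2016
【Classification number】: TP311.13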

Article ID: 1862908

Article link: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/1862908.html

