Energy-Efficiency-Aware Small File Merging on the HDFS Platform
Published: 2018-10-18 09:40
[Abstract]: Small files on the Hadoop Distributed File System (HDFS) drive up the energy cost of running MapReduce programs. To address this, the paper builds an energy consumption model of a Hadoop node cluster and, through analysis and derivation, proves that on the Hadoop platform there exists an optimal file size that minimizes the energy cost of running a program. On this basis, drawing on marginal analysis theory from economics, it proposes a strategy for determining the optimal file size that takes both energy cost and access cost into account. The strategy computes the benefit of merging small files stored on HDFS and merges them into files of the cost-optimal size to obtain the best return. Experiments confirm that an energy-optimal data block size exists and show that using marginal analysis to determine the data block size by combining cost and benefit is both reasonable and effective.
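[Author affiliations]: School of Software, Central South University; School of Software, Henan University; School of Computer Science, Beijing Information Science and Technology University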
[Funding]: National Natural Science Foundation of China (61272148; 61301136); Specialized Research Fund for the Doctoral Program of Higher Education (20120162110061; 20120162120091)
[Classification number]: TP333
Article ID: 2278736
Article link: http://www.sikaile.net/kejilunwen/jisuanjikexuelunwen/2278736.html