Application Characteristic Analysis and System Performance Optimization for Hadoop
Published: 2018-09-16 21:48
[Abstract]: Hadoop is currently the most widely used big data processing system. Although Hadoop provides an efficient solution for large-scale distributed data processing, it still faces a series of challenges: 1) the abstract programming interfaces that Hadoop exposes hide the underlying implementation details, which makes performance analysis of applications difficult; 2) Hadoop configuration parameters have a significant impact on system performance, but the default configuration cannot guarantee the best performance for every application, so the parameters must be tuned case by case; 3) frequent data movement severely limits the performance of big data systems, and new solutions are needed to reduce its adverse effect. This thesis studies the performance characterization and performance optimization of applications in Hadoop. First, based on dynamic tracing of binary bytecode, it designs and implements a lightweight, non-intrusive, distributed performance analysis framework for Hadoop applications. The framework dynamically captures an application's runtime state and analyzes its performance, which helps users understand the performance characteristics of applications running on Hadoop and points out directions for optimizing them. Second, it proposes a performance model of Hadoop applications for dynamic resource allocation scenarios and, on the basis of this model, uses a genetic algorithm to search the global, high-dimensional configuration parameter space, thereby addressing the configuration tuning problem. The prediction error of the proposed performance model is below 6%; compared with the default configuration, the tuned configuration achieves a 9.52-fold performance improvement on average and up to 18.76-fold. Finally, exploiting the data-parallel processing characteristics of MapReduce applications in Hadoop, it proposes a near-data processing system that provides complete hardware and software interfaces, a dynamic task migration mechanism, and a runtime environment, and it implements a lightweight MapReduce framework that supports migrating Map and Reduce tasks to near-data processing units. Compared with a baseline system without near-data processing, the proposed system achieves a 4.83-fold performance improvement and reduces system power consumption by 26%. Compared with the SMC system, which uses near-data processing but does not support data-parallel processing, the proposed system consumes 37% more power but achieves a 2.32-fold performance improvement.
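The model-guided genetic search described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the parameter names and bounds in `SPACE` are hypothetical, and `predicted_runtime` is a toy stand-in, not the thesis's performance model. The sketch only shows the general pattern of using a model's predictions as the fitness function while a genetic algorithm explores the configuration space.

```python
import random

# Hypothetical search space: inclusive integer bounds per parameter.
# The names are illustrative and are not taken from the thesis.
SPACE = {
    "sort_buffer_mb": (50, 1000),
    "num_reducers": (1, 64),
    "map_memory_mb": (512, 4096),
}


def predicted_runtime(cfg):
    """Toy stand-in for a performance model; lower is better."""
    return (
        60.0
        + abs(cfg["sort_buffer_mb"] - 400) / 10
        + abs(cfg["num_reducers"] - 24) * 2
        + abs(cfg["map_memory_mb"] - 2048) / 100
    )


def random_config(rng):
    """Draw one configuration uniformly from the search space."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each parameter comes from either parent."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SPACE}


def mutate(cfg, rng, rate=0.2):
    """Resample each parameter with probability `rate`."""
    out = dict(cfg)
    for k, (lo, hi) in SPACE.items():
        if rng.random() < rate:
            out[k] = rng.randint(lo, hi)
    return out


def genetic_search(generations=40, pop_size=30, elite=4, seed=0):
    """Return the configuration with the lowest predicted runtime found."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=predicted_runtime)
        parents = population[: pop_size // 2]  # truncation selection
        children = population[:elite]          # elitism keeps the best as-is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return min(population, key=predicted_runtime)


if __name__ == "__main__":
    best = genetic_search()
    print(best, predicted_runtime(best))
```

Because the fitness function is a model prediction, no job is executed during the search, so each candidate is cheap to evaluate. Elitism guarantees that the best predicted runtime never gets worse from one generation to the next.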
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP311.13
Article ID: 2244911
Article link: http://www.sikaile.net/kejilunwen/ruanjiangongchenglunwen/2244911.html