Research on Resource-Aware Task Scheduling in the Storm Environment

Published: 2018-07-27 15:58
[Abstract]: As data is created ever faster in big data applications, large volumes of data must be processed in real time. Apache Storm is a stream processing system that offers real-time, distributed, scalable, and highly reliable data processing, and it has attracted wide attention in academia and industry. In a complex event stream processing engine, the data is a real-time stream of events that must be analyzed and processed quickly. This form is used mainly in big data, where continuously generated data streams are processed and the results feed the generation of new event streams. To evaluate whether a resource allocation strategy succeeds, three performance metrics are used to check how well it adapts to resource fluctuations during scheduling: processing latency, resource throughput, and user satisfaction. The elements involved in scheduling are defined as basic computing components and are aggregated into a single topology for execution. Real-time data streams with different arrival rates, together with constantly changing operating conditions, pose new challenges for data processing. Improving scheduling efficiency is therefore the main problem addressed in this thesis, and it is also the key step in finding an optimized placement for Storm across the active physical nodes. However, like many other big data processing systems, Storm has no intelligent scheduling mechanism. The default round-robin scheduling mechanism in Storm does not fully consider resource requirements and availability, so resources end up either underused or overused. Designing elastic solutions that can cope with sudden fluctuations in the input data stream has recently become an active research area. Traditional scheduling schemes rely largely on measuring a set of performance metrics and comparing them with a set of predetermined thresholds to make scheduling decisions. Such schemes adapt poorly to real-time changes in the amount of available resources.
This thesis proposes a resource-adaptive scheduler for the Storm framework based on CPU, memory, and network bandwidth. It allocates resources more effectively and improves performance, and it takes into account both load balancing and the data transfer rate between Storm tasks, assigning pairs of tasks that communicate heavily to the same group of compute nodes. Compared with the default scheduling provided by Storm, the proposed algorithm is a significant improvement: it distributes the whole set of tasks across the cluster and schedules them in response to changes in CPU, memory, and network bandwidth. By analyzing the characteristics and performance of Storm's default task scheduling strategy, this thesis designs and implements a resource-aware stream data processing system based on Storm. Compared with default Storm scheduling, the improved Storm scheduling has the following desirable features: (1) based on the runtime state, it dynamically assigns or reassigns tasks through effective resource-aware scheduling to speed up data processing, minimizing inter-node and inter-process resource overhead while ensuring that no worker node is overloaded; (2) it can consolidate the resources of worker nodes for fine-grained control, so that the improved Storm achieves better performance with fewer worker nodes; (3) it allows the scheduling algorithm to be managed as a module in code and allows the scheduling parameters to be tuned; (4) it is transparent to Storm users, and Storm applications can be ported to the platform with the improved Storm scheduling.
Building on three benchmark stream processing applications (SOL, RollingSort, and WordCount), this thesis adds monitoring code that senses CPU, memory, and network bandwidth and stores the monitoring information in a database. The scheduler, following the improved algorithm, reads this data from the database and replaces the default scheduling policy, and statistical tables of the throughput of the topology nodes and the latency between nodes are generated automatically for performance evaluation. Repeated experiments show that the improved Storm performs better than the default Storm scheduler on SOL, RollingSort, and WordCount.
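The placement idea described in the abstract (keep heavily communicating task pairs together while never overloading a node) can be illustrated with a small greedy sketch. This is not the thesis's algorithm, which the abstract does not specify. The names `Node`, `Task`, `schedule`, and the `traffic` map are hypothetical, network bandwidth is omitted for brevity, and all resource figures are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    name: str
    cpu: float  # CPU demand (hypothetical units: percentage points)
    mem: float  # memory demand in MB


@dataclass
class Node:
    name: str
    cpu: float  # CPU still available on this node
    mem: float  # memory still available on this node, in MB
    tasks: list = field(default_factory=list)

    def fits(self, task):
        # A node is a candidate only if it would not be overloaded.
        return self.cpu >= task.cpu and self.mem >= task.mem

    def place(self, task):
        self.cpu -= task.cpu
        self.mem -= task.mem
        self.tasks.append(task.name)


def schedule(tasks, nodes, traffic):
    """Greedy resource-aware placement (illustrative assumption only).

    traffic maps frozenset({task_a, task_b}) to an observed transfer rate.
    """
    def total_traffic(task):
        return sum(rate for pair, rate in traffic.items() if task.name in pair)

    placement = {}
    # Place the most communication-heavy tasks first.
    for task in sorted(tasks, key=total_traffic, reverse=True):
        candidates = [n for n in nodes if n.fits(task)]
        if not candidates:
            raise RuntimeError(f"no node has capacity for {task.name}")

        def colocated_traffic(node):
            # Traffic between this task and tasks already on the node.
            return sum(traffic.get(frozenset((task.name, other)), 0.0)
                       for other in node.tasks)

        # Prefer the node that keeps the most traffic local;
        # break ties by the most free CPU.
        best = max(candidates, key=lambda n: (colocated_traffic(n), n.cpu))
        best.place(task)
        placement[task.name] = best.name
    return placement


if __name__ == "__main__":
    demo_tasks = [Task("spout", 30, 512), Task("split", 40, 512),
                  Task("count", 40, 512)]
    demo_nodes = [Node("n1", 100, 1024), Node("n2", 100, 2048)]
    demo_traffic = {frozenset(("spout", "split")): 1000.0,
                    frozenset(("split", "count")): 200.0}
    print(schedule(demo_tasks, demo_nodes, demo_traffic))
```

In this toy run the two tasks with the heaviest mutual traffic share a node, and the third task goes to the other node because the first no longer has enough free capacity. A round-robin scheduler would ignore both the traffic map and the remaining capacity.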
[Degree-granting institution]: Xinjiang University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP301.6



