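Research and Implementation of Cluster Fault Monitoring Based on a Cloud Platform
Published: 2018-06-30 05:21
Topics: cloud platform + monitoring system; Source: Beijing University of Posts and Telecommunications (北京郵電大學), master's thesis, 2014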
[Abstract]: With the spread of Internet technology and continuing advances in information technology, every sector of society demands more of its information systems, and the volume of data to be processed keeps growing. Cloud computing has moved from concept to practical application and has matured into public and private clouds that are customizable, elastically scalable, and service-oriented. Service quality is of great importance to a cloud platform, and monitoring is an essential component of the platform: it is the prerequisite for many platform functions such as network analysis, system management, job scheduling, load balancing, event prediction, fault detection, and recovery operations. Monitoring helps the platform quantify resource usage dynamically, detect service defects, discover user usage patterns, and support the decisions of the resource scheduling module, and in doing so it improves the platform's quality of service.

BC-PDM (Big Cloud of Parallel Data Mining) grew out of the business intelligence application requirements of the world's largest telecom operator and aims to provide efficient, accurate, and convenient data analysis services for massive data sets. The system is built on a Hadoop cluster, and this thesis mainly describes the research and implementation of fault monitoring for that Hadoop cluster.

The thesis first introduces the research background and the current state of research, then presents the overall functional design and the design of each module based on the requirements of the project. It uses two open-source monitoring tools, Ganglia and Nagios. After an in-depth study of both tools, it summarizes their working principles, strengths, and weaknesses, combines the strengths of Ganglia and Nagios, and improves Ganglia's fault-tolerance mechanism in order to implement fault monitoring and resource monitoring. Both Ganglia and Nagios have shortcomings in how they store monitoring data, so the system uses a persistent storage tool to transfer the monitoring data into a MySQL database, where the data is managed and analyzed uniformly.

Built on Ganglia and Nagios, and following a requirements analysis and a study of the system's key points, the system delivers both resource monitoring and fault monitoring. It provides comprehensive monitoring of the physical, virtual, and service resources in the cloud platform together with an analysis of resource utilization, and based on that analysis it reports faults through several channels such as email and SMS, so that the cloud platform keeps running normally.

Finally, a cloud platform monitoring system was implemented by applying the research above. Its behavior in operation indicates that the approach described in the thesis is effective and feasible.
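The pipeline the abstract describes (collect metrics, persist them to a relational database, and flag faults for notification) can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes a simplified XML report in the shape emitted by Ganglia's gmond, uses Python's built-in sqlite3 as a stand-in for MySQL, and borrows the Nagios plugin state codes (0 = OK, 1 = WARNING, 2 = CRITICAL). The sample data, the threshold values, the table layout, and all function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Nagios plugin state codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Hypothetical, simplified sample in the shape of a gmond XML report.
SAMPLE_XML = """
<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
  <CLUSTER NAME="hadoop" LOCALTIME="1400000000">
    <HOST NAME="node1" IP="10.0.0.1" REPORTED="1400000000">
      <METRIC NAME="load_one" VAL="0.50" TYPE="float" UNITS=""/>
      <METRIC NAME="disk_free" VAL="3.0" TYPE="double" UNITS="GB"/>
    </HOST>
    <HOST NAME="node2" IP="10.0.0.2" REPORTED="1400000000">
      <METRIC NAME="load_one" VAL="9.75" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""

# Hypothetical thresholds: metric -> (warning, critical, higher_is_worse).
THRESHOLDS = {
    "load_one": (4.0, 8.0, True),
    "disk_free": (5.0, 1.0, False),
}


def classify(metric, value):
    """Map a metric value to a Nagios-style state code."""
    if metric not in THRESHOLDS:
        return OK
    warn, crit, higher_is_worse = THRESHOLDS[metric]
    if not higher_is_worse:
        # Negate so that "larger means worse" holds for both kinds of metric.
        value, warn, crit = -value, -warn, -crit
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def parse_metrics(xml_text):
    """Yield (host, metric, value, reported) for each numeric metric."""
    root = ET.fromstring(xml_text)
    for host in root.iter("HOST"):
        reported = int(host.get("REPORTED"))
        for m in host.iter("METRIC"):
            if m.get("TYPE") == "string":
                continue  # this sketch only handles numeric metrics
            yield host.get("NAME"), m.get("NAME"), float(m.get("VAL")), reported


def store_and_check(conn, xml_text):
    """Persist every metric and return the ones that need an alert."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics "
        "(host TEXT, metric TEXT, value REAL, reported INTEGER, state INTEGER)"
    )
    alerts = []
    for host, metric, value, reported in parse_metrics(xml_text):
        state = classify(metric, value)
        conn.execute(
            "INSERT INTO metrics VALUES (?, ?, ?, ?, ?)",
            (host, metric, value, reported, state),
        )
        if state != OK:
            alerts.append((host, metric, value, state))
    conn.commit()
    return alerts


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for host, metric, value, state in store_and_check(conn, SAMPLE_XML):
        print(f"ALERT state={state} host={host} {metric}={value}")
```

In a real deployment the returned alerts would be handed to a notification channel such as email or SMS, and the in-memory SQLite connection would be replaced by a connection to the MySQL database; both steps are left out here because the abstract does not specify how they were done.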
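[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.09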