Research and Implementation of Fault Early-Warning Technology for Server Clusters
Published: 2019-01-01 17:02
[Abstract]: As the Internet grows, server clusters keep scaling out smoothly, but the sheer number of server components greatly increases the probability of failure, which poses a serious challenge to the network management and availability of those clusters. Users urgently need real-time status monitoring of the cluster together with warnings issued before a failure occurs, and this capability is also an extremely important part of an integrated network management system. Building on an analysis of the current state of server cluster early warning and a study of existing fault monitoring models and techniques, this thesis designs and implements a fault early-warning system for Linux server clusters in combination with the IPMI specification. First, a server device status monitoring model is designed that monitors device hardware, system resources, system services, and application services. Second, a communication mechanism between the management side and the agent side is designed and implemented using the SNMP and AgentX protocols. Finally, a cluster fault early-warning model is designed and implemented; its four component models are each designed and implemented in detail: a warning filtering model based on correlations between warnings, a fault early-warning decision model, a fault early-warning notification model, and a device resource management model. The system covers not only system software resource information but also server hardware resource information in its early-warning monitoring. Testing shows that the system meets the integrated network management system's requirements for server cluster fault early warning and achieves real-time monitoring and fault early warning of the cluster.
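The abstract names a warning decision model and a correlation-based warning filtering model but does not say how either works. As a rough illustration only, the sketch below assumes the simplest possible reading: a decision step that compares sampled values against fixed limits, and a filter step that drops a warning when a known cause is already active on the same host. The metric names, limit values, the `CAUSED_BY` table, and the functions `judge` and `filter_correlated` are all hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    host: str
    metric: str
    value: float


# Hypothetical warning limits (assumed values, not from the thesis).
LIMITS = {"cpu_temp_c": 80.0, "cpu_load_pct": 90.0, "disk_used_pct": 85.0}

# Hypothetical correlation table: metric -> metrics treated as its causes.
# Here a temperature warning is assumed to be a consequence of high CPU load.
CAUSED_BY = {"cpu_temp_c": {"cpu_load_pct"}}


def judge(host, samples):
    """Decision step: raise an alert for every sample at or above its limit."""
    return [
        Alert(host, metric, value)
        for metric, value in samples.items()
        if metric in LIMITS and value >= LIMITS[metric]
    ]


def filter_correlated(alerts):
    """Filter step: drop alerts whose assumed cause is active on the same host."""
    active = {(a.host, a.metric) for a in alerts}
    kept = []
    for alert in alerts:
        causes = CAUSED_BY.get(alert.metric, set())
        if any((alert.host, cause) in active for cause in causes):
            continue  # suppressed: the cause will be reported instead
        kept.append(alert)
    return kept


if __name__ == "__main__":
    samples = {"cpu_temp_c": 85.0, "cpu_load_pct": 95.0, "disk_used_pct": 40.0}
    raised = judge("node01", samples)
    for alert in filter_correlated(raised):
        print(alert.host, alert.metric, alert.value)
```

In this sketch both the temperature and the load samples exceed their limits, but only the load alert survives filtering, so an operator would be notified about the assumed root cause rather than about each symptom.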
[Degree-granting institution]: Xidian University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.07
Article ID: 2397860
Article link: http://www.sikaile.net/guanlilunwen/ydhl/2397860.html