Research on Availability Evaluation Techniques for Complex Computer Systems
Published: 2018-02-26 11:24
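Keywords: fault-tolerant computing; availability; correlation analysis; failure distribution; copula function; availability evaluation. Source: Harbin Institute of Technology, doctoral dissertation, 2013. Thesis type: degree thesis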
[Abstract]: Complex computer systems used in key industries that bear on national economic and social security, such as finance, telecommunications, energy, transportation, and aviation, require very strong transaction-processing capability. They also require extremely high availability, so that they can provide fast and stable information-processing services. Delays or failures in such systems can cause incalculable economic losses and may have a negative social impact. Research on availability testing for these complex computer systems will help improve their availability and is of great significance for keeping the national economy running smoothly.
Some previous studies have suggested two kinds of patterned correlation: among the hardware components of computer systems and among software failures. They also suggested that these correlations affect system availability. Most of that work, however, is theoretical and offers no direct evidence of correlation in real systems. Discussions of correlation therefore often lack support from actual systems and are not fully convincing. This thesis addresses the gap by analyzing the failure records of a bank's computer system and the operating logs of high-end servers, and it identifies evidence that correlations may exist between system-level and element-level components. To model system availability more accurately, the thesis compares the failure distributions in the bank computer system's failure records with those in the LANL failure dataset. It finds that the time between hardware failures in computing systems based on a symmetric multiprocessor (SMP) architecture follows a distribution in the Weibull family.
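The following Python sketch illustrates how inter-failure times can be checked against the Weibull family. It fits a two-parameter Weibull distribution (shape k, scale λ) by maximum likelihood. It is a minimal sketch under stated assumptions. The function name `fit_weibull`, the bisection bracket for k, and the synthetic data in the usage example are all hypothetical. Neither the bank failure records nor the LANL data are reproduced, and the thesis may have used a different estimator or goodness-of-fit test.

```python
import math
import random


def fit_weibull(times, k_lo=1e-3, k_hi=50.0, tol=1e-10):
    """Maximum-likelihood fit of a two-parameter Weibull distribution.

    Returns (k, lam): shape k and scale lam, with CDF F(t) = 1 - exp(-(t/lam)**k).
    The shape solves the profile-likelihood equation
        sum(x**k * ln x) / sum(x**k) - 1/k - mean(ln x) = 0,
    whose left-hand side increases with k, so bisection on [k_lo, k_hi] works.
    """
    xs = [float(t) for t in times if t > 0]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two positive inter-failure times")
    x_max = max(xs)
    zs = [x / x_max for x in xs]          # rescale into (0, 1] so z**k cannot overflow
    logs = [math.log(z) for z in zs]
    if min(logs) == max(logs):
        raise ValueError("all inter-failure times are equal; shape is unbounded")
    mean_log = sum(logs) / n

    def score(k):
        # The equation is unchanged by rescaling, so it is evaluated on zs
        w = [z ** k for z in zs]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = k_lo, k_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid                       # root lies below mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    lam = x_max * (sum(z ** k for z in zs) / n) ** (1.0 / k)   # undo the rescaling
    return k, lam


if __name__ == "__main__":
    # Hypothetical synthetic data: 5000 inter-failure times (hours), true k = 0.7, lam = 1000
    rng = random.Random(1)
    sample = [rng.weibullvariate(1000.0, 0.7) for _ in range(5000)]
    print(fit_weibull(sample))
```

A fitted shape below 1 indicates a hazard rate that decreases over time; a shape above 1 indicates an increasing one. A goodness-of-fit test would still be needed before accepting the Weibull model for real failure logs.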
To meet high-availability requirements, complex computer systems in key industries often adopt a k-out-of-n architecture. This thesis focuses on modeling equal-load-sharing k-out-of-n systems with correlation taken into account. First, stochastic process theory is used to build a model of load-sharing k-out-of-n systems. The model shows that the time from the (i-1)-th to the i-th component failure follows a two-parameter Weibull distribution, and that the sojourn times of the system in different states are correlated. The thesis then introduces copula theory and proposes using the Gumbel copula to capture how the right-tail dependence between sojourn times in different states changes. It also gives an algorithm for computing the component correlation coefficient matrix of a k-out-of-n system with a specified failure sequence. The analysis shows that the equal-load-sharing k-out-of-n model that accounts for correlation matches the actual operation of the system more closely than a model that ignores correlation.
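The Gumbel copula idea can be illustrated with a small simulation. The sketch below couples two consecutive sojourn times through a Gumbel copula with parameter θ ≥ 1; each sojourn time has a two-parameter Weibull marginal. This copula's upper-tail dependence is λ_U = 2 - 2^(1/θ) and its Kendall's tau is 1 - 1/θ. The sketch samples with the Marshall-Olkin algorithm and draws the positive-stable frailty with Kanter's representation. All function names and parameter values are hypothetical. The thesis's algorithm for the component correlation coefficient matrix is not reproduced here.

```python
import math
import random


def gumbel_upper_tail_dependence(theta):
    """Upper-tail dependence of the Gumbel copula: lambda_U = 2 - 2**(1/theta)."""
    if theta < 1.0:
        raise ValueError("Gumbel copula requires theta >= 1")
    return 2.0 - 2.0 ** (1.0 / theta)


def _positive_exp(rng):
    """Exp(1) draw, redrawn in the (probability ~2**-53) case that it is exactly 0."""
    e = 0.0
    while e == 0.0:
        e = rng.expovariate(1.0)
    return e


def _log_positive_stable(alpha, rng):
    """log V, where E[exp(-t V)] = exp(-t**alpha) and 0 < alpha < 1.

    Kanter's representation V = (A(U) / E)**((1 - alpha) / alpha), with U uniform
    on (0, pi) and E ~ Exp(1), evaluated in log space to avoid overflow.
    """
    u = math.pi * (1.0 - rng.random())  # uniform on (0, pi]
    log_a = (alpha * math.log(math.sin(alpha * u))
             + (1.0 - alpha) * math.log(math.sin((1.0 - alpha) * u))
             - math.log(math.sin(u))) / (1.0 - alpha)
    return (1.0 - alpha) / alpha * (log_a - math.log(_positive_exp(rng)))


def sample_sojourn_pairs(n, theta, shape1, scale1, shape2, scale2, seed=0):
    """n pairs (T1, T2): Weibull marginals joined by a Gumbel(theta) copula.

    Marshall-Olkin: draw frailty V, then U_i = exp(-(E_i / V)**(1/theta)).
    """
    if theta < 1.0:
        raise ValueError("Gumbel copula requires theta >= 1")
    rng = random.Random(seed)
    alpha = 1.0 / theta
    pairs = []
    for _ in range(n):
        # theta == 1 means independence: V is identically 1
        log_v = 0.0 if theta == 1.0 else _log_positive_stable(alpha, rng)
        times = []
        for shape, scale in ((shape1, scale1), (shape2, scale2)):
            log_s = alpha * (math.log(_positive_exp(rng)) - log_v)
            s = math.exp(min(max(log_s, -700.0), 700.0))  # clamp keeps exp() finite and > 0
            surv = -math.expm1(-s)                        # 1 - U_i without cancellation
            # Weibull inverse CDF: T = scale * (-ln(1 - U))**(1/shape)
            times.append(scale * max(0.0, -math.log(surv)) ** (1.0 / shape))
        pairs.append((times[0], times[1]))
    return pairs


def kendall_tau(pairs):
    """Empirical Kendall's tau (O(n^2)); for this model it estimates 1 - 1/theta."""
    n = len(pairs)
    concordant = discordant = 0
    for i in range(n):
        x_i, y_i = pairs[i]
        for j in range(i + 1, n):
            prod = (x_i - pairs[j][0]) * (y_i - pairs[j][1])
            if prod > 0:
                concordant += 1
            elif prod < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Hypothetical parameters: theta = 2, identical Weibull(shape 1.5, scale 100 h) marginals
    pairs = sample_sojourn_pairs(1000, 2.0, 1.5, 100.0, 1.5, 100.0, seed=7)
    print(kendall_tau(pairs), 1 - 1 / 2.0, gumbel_upper_tail_dependence(2.0))
```

Because the Weibull inverse CDF is increasing, the copula's upper tail maps onto long sojourn times. A larger θ therefore makes long stays in consecutive states more likely to occur together.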
To describe correlation among system components intuitively, the thesis introduces DRBD (Dynamic Reliability Block Diagram), a system description model developed from the reliability block diagram. It presents the advantages of DRBD and applies the DRBD approach to several common system architectures: the series reliability model, the common-cause/common-mode failure model, the redundancy model, and the RAID disk array model. It then proposes a DRBD-based method for evaluating system availability. For each of these component connection patterns, it analyzes how to transform the model into a generalized stochastic Petri net (Generalized Stochastic Petri Net, GSPN) and solve it for availability.
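Consider the special case in which every failure and repair time is exponential. There, the GSPN of a k-out-of-n block reduces to a birth-death continuous-time Markov chain, and steady-state availability has a closed form. The sketch below makes several illustrative assumptions: identical components, a single repair crew, and components that keep failing while the system is down. The optional `load_sharing` flag further assumes that each component's failure rate is proportional to its share of the load, so the total failure rate stays at nλ. None of this is the thesis's DRBD-to-GSPN procedure, which targets Weibull behavior and correlated components. The function name is hypothetical.

```python
def k_out_of_n_availability(n, k, lam, mu, load_sharing=False):
    """Steady-state availability of a k-out-of-n:G system (exponential case).

    State j = number of failed components, j = 0..n, in a birth-death CTMC:
      failure j -> j+1 at total rate (n - j) * lam, or n * lam with load_sharing,
      repair  j+1 -> j at rate mu (one repair crew).
    Assumes identical components that keep failing while the system is down.
    """
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    if lam <= 0 or mu <= 0:
        raise ValueError("rates must be positive")
    # Unnormalised stationary weights from detailed balance: w[j+1] = w[j] * rate_j / mu
    weights = [1.0]
    for j in range(n):
        if load_sharing:
            # Hypothetical linear load model: each of the n - j survivors fails at
            # lam * n / (n - j), so the total failure rate stays n * lam.
            rate = n * lam
        else:
            rate = (n - j) * lam
        weights.append(weights[-1] * rate / mu)
    # The system is up while at most n - k components have failed
    return sum(weights[: n - k + 1]) / sum(weights)


if __name__ == "__main__":
    # Hypothetical 2-out-of-3 system: lam = 1e-3 per hour, mu = 0.1 per hour
    print(k_out_of_n_availability(3, 2, 1e-3, 0.1))
    print(k_out_of_n_availability(3, 2, 1e-3, 0.1, load_sharing=True))
```

With n = k = 1 this reduces to the familiar single-unit result μ / (λ + μ). Under load sharing the survivors degrade faster, so availability drops relative to the independent case.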
Traditional availability testing runs several identically configured target systems online for a long period. Complex computer systems in key industries are highly available, however, so such online tracking tests take a very long time to yield accurate results. To address this, the thesis proposes an availability testing method for k-out-of-n systems based on an MTBF (mean time between failures) threshold. The method turns system-level availability testing into availability testing of the redundant components. For fault-tolerant transaction-processing computer systems, the thesis designs and implements an availability evaluation system consisting of a fault injection platform, an availability evaluation suite, and an availability evaluation database. A simulated dual-mode application environment modeled on a banking system was built on an HP Superdome server. Online tests show that the evaluation results are of the same order of magnitude as the officially published figures. They also show that the proposed availability testing system can determine in a relatively short time whether a target system meets the required availability level.
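This abstract does not spell out the thesis's MTBF-threshold criterion, so the following is only a hedged sketch of the general idea. It uses a standard fixed-time demonstration test for exponentially distributed failures. First, an availability target and an assumed mean time to repair (MTTR) are converted into an MTBF threshold through A = MTBF / (MTBF + MTTR). Suppose `failures` failures are observed over an accumulated `total_time`. The test passes if, under the threshold MTBF, the Poisson probability of seeing at most that many failures is no more than 1 - confidence. Pooling component-hours across redundant units assumes identical, independent components, which is exactly the assumption the thesis's correlation analysis questions. All names and numbers are hypothetical.

```python
import math


def required_mtbf(availability, mttr):
    """MTBF needed to reach a steady-state availability target.

    From A = MTBF / (MTBF + MTTR):  MTBF = A * MTTR / (1 - A).
    """
    if not 0.0 < availability < 1.0:
        raise ValueError("availability must lie in (0, 1)")
    return availability * mttr / (1.0 - availability)


def _poisson_cdf(r, m):
    """P(N <= r) for N ~ Poisson(m), summed term by term."""
    term = math.exp(-m)
    total = term
    for i in range(1, r + 1):
        term *= m / i
        total += term
    return total


def demonstrates_mtbf(total_time, failures, mtbf_threshold, confidence):
    """Fixed-time test with exponential failures: True if the data support
    MTBF >= mtbf_threshold at the given confidence level.

    If the true MTBF equalled the threshold, the failure count over total_time
    would be Poisson(total_time / mtbf_threshold); the test passes when seeing
    this few failures would have probability at most 1 - confidence.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie in (0, 1)")
    return _poisson_cdf(failures, total_time / mtbf_threshold) <= 1.0 - confidence


def min_test_time(mtbf_threshold, confidence, allowed_failures=0, tol=1e-9):
    """Shortest accumulated test time that can pass with allowed_failures failures."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie in (0, 1)")
    target = 1.0 - confidence
    lo, hi = 0.0, 1.0                      # bracket on m = time / threshold
    while _poisson_cdf(allowed_failures, hi) > target:
        hi *= 2.0
    while hi - lo > tol:                   # the CDF decreases as m grows
        mid = 0.5 * (lo + hi)
        if _poisson_cdf(allowed_failures, mid) > target:
            lo = mid
        else:
            hi = mid
    return mtbf_threshold * hi


if __name__ == "__main__":
    # Hypothetical target: availability 0.99999 with a 0.5 h MTTR per redundant unit
    threshold = required_mtbf(0.99999, 0.5)
    print(threshold, min_test_time(threshold, 0.9, allowed_failures=0))
```

With zero allowed failures, the required test time is MTBF_threshold · ln(1 / (1 - C)), about 2.3 times the threshold at 90% confidence. Allowing failures lengthens the test.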
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Doctoral
[Year awarded]: 2013
[Classification number]: TP306
Article number: 1537812
Article link: http://www.sikaile.net/kejilunwen/jisuanjikexuelunwen/1537812.html