面向網(wǎng)絡(luò)入侵檢測(cè)的串匹配算法優(yōu)化
本文選題:網(wǎng)絡(luò)安全 + 入侵檢測(cè)系統(tǒng)。 參考:《哈爾濱工業(yè)大學(xué)》2014年博士論文
【摘要】:互聯(lián)網(wǎng)已經(jīng)成為人們?nèi)粘I钪胁豢苫蛉钡囊粋(gè)部分。伴隨而來的是日趨猖獗的網(wǎng)絡(luò)犯罪對(duì)網(wǎng)絡(luò)節(jié)點(diǎn)上主機(jī)的信息安全的威脅。出于不同的目的,網(wǎng)絡(luò)使用者可以利用一定的技術(shù)或者社會(huì)手段,非法進(jìn)入個(gè)人主機(jī)、盜取個(gè)人賬戶密碼甚至是對(duì)網(wǎng)站進(jìn)行協(xié)作攻擊。建立一個(gè)堅(jiān)固的預(yù)防、阻斷網(wǎng)絡(luò)犯罪的安全體系至關(guān)重要。入侵檢測(cè)系統(tǒng)就是網(wǎng)絡(luò)安全體系中不可或缺的重要組組成部分,其主要處理對(duì)象就是來自網(wǎng)絡(luò)的數(shù)據(jù)包。入侵檢測(cè)系統(tǒng)會(huì)對(duì)網(wǎng)絡(luò)數(shù)據(jù)包和已知的數(shù)據(jù)庫進(jìn)行比對(duì),根據(jù)已掌握的數(shù)據(jù)將數(shù)據(jù)包中的信息歸類為入侵行為或者安全行為。針對(duì)入侵檢測(cè)系統(tǒng)監(jiān)測(cè)到的入侵行為,還可以將這些入侵告警進(jìn)行分析進(jìn)一步挖掘告警之間的邏輯聯(lián)系,揭示隱含的攻擊過程用于定位攻擊源或者為后續(xù)檢測(cè)提供知識(shí)儲(chǔ)備。為了確保能有效地提取網(wǎng)絡(luò)數(shù)據(jù)包中的信息需要高效的串匹配算法提供技術(shù)支持。然而隨著網(wǎng)絡(luò)技術(shù)的發(fā)展,各種新的攻擊類型也層出不窮,這也使得串匹配自動(dòng)機(jī)需要維護(hù)的特征集的規(guī)模越來越大,直接降低了入侵檢測(cè)系統(tǒng)的性能。 基于這樣的背景,本文的研究目標(biāo)是構(gòu)造支持大規(guī)模特征集合的串匹配算法,并且能夠找到對(duì)現(xiàn)有串匹配算法進(jìn)行速度提升的方法。具體地,本文著重從以下幾個(gè)方面、不同層次對(duì)串匹配的優(yōu)化方法進(jìn)行了深入的研究: (1)特征集規(guī)模過大導(dǎo)致匹配自動(dòng)機(jī)無法正常構(gòu)建限制了入侵檢測(cè)系統(tǒng)能夠識(shí)別的有害信息的數(shù)量。本文特別針對(duì)大特征集文本匹配問題,提出一種混合匹配自動(dòng)機(jī)模型。該模型旨在通過長度對(duì)特征進(jìn)行分類,構(gòu)造更加緊湊的匹配自動(dòng)機(jī)。通過將具有不同長度范圍的特征分布到不同的自動(dòng)機(jī)中,可以采用不同的方式對(duì)文本進(jìn)行匹配。本文同時(shí)依據(jù)混合匹配自動(dòng)機(jī)的結(jié)構(gòu)特點(diǎn)設(shè)計(jì)了兩種獨(dú)立的文本過濾策略,即逆向狀態(tài)映射狀態(tài)轉(zhuǎn)移方法(SLSPM算法)和二級(jí)AVL過濾策略(BCHDFA算法)。兩種技術(shù)分別利用“將‘自動(dòng)機(jī)狀態(tài)-讀入文本信息’的映射關(guān)系進(jìn)行翻轉(zhuǎn)”和“借助AVL樹對(duì)特征段中特定位置上的雙字符文本進(jìn)行兩次過濾”達(dá)到減少將文本塊與特征塊進(jìn)行驗(yàn)證的次數(shù)的目的,從而抑制混合自動(dòng)機(jī)匹配速度降低的趨勢(shì)。 (2)針對(duì)URL等長特征,本文結(jié)合SSE指令提出了兩種專門面向16字節(jié)數(shù)據(jù)的匹配算法SSEHash和RTRIE,這兩種算法旨在提高長特征的匹配速度。SSEHash算法中,通過2個(gè)指令周期共16個(gè)加法運(yùn)算和一個(gè)24比特模運(yùn)算能夠?qū)?6字節(jié)文本比較均勻地分布到24比特的數(shù)據(jù)中。由于采用SSE指令進(jìn)行計(jì)算,使得SSEHash算法和常規(guī)意義的散列算法相比需要更短的處理時(shí)間,能夠更快地過濾文本信息。為了解決大規(guī)模特征集環(huán)境下Wu-Manber類算法的移動(dòng)距離較短的問題,本文結(jié)合SSE指令提出了一種反轉(zhuǎn)自動(dòng)機(jī)RTRIE。該算法提取每條長特征(長度至少16)的16字節(jié)前綴中的所有后綴子串的逆向串的指紋,并為每個(gè)指紋計(jì)算其安全移動(dòng)距離。該算法和通常的Wu-Manber類算法相比在計(jì)算文本指針移動(dòng)距離時(shí)考慮的文本字符數(shù)目更多,這樣可以盡量避免由于特征規(guī)模增加而導(dǎo)致的指針移動(dòng)距離的降低。 (3)為了提高表達(dá)式匹配性能消除無用表達(dá)式對(duì)內(nèi)存的占用,本文分析了表達(dá)式的各種包含關(guān)系并提出了BitCount算法的優(yōu)化算法MaskVeri以及表達(dá)式冗余消除算法KPGEM。為了不遺漏任何可能的表達(dá)式的包含關(guān)系,本文借助表達(dá)式中出現(xiàn)的各個(gè)短關(guān)鍵字的位置以及關(guān)鍵字的長度等信息限定了各關(guān)鍵字的前驅(qū)后繼關(guān)系。借助這種關(guān)系可以定義針對(duì)每條表達(dá)式的關(guān)鍵字路徑圖。通過對(duì)關(guān)鍵字路徑圖的遍歷KPGEM算法可以分析出當(dāng)前表達(dá)式中隱含的其它表達(dá)式,并正確地刪除表達(dá)式中冗余的關(guān)鍵字或者包含其它表達(dá)式的表達(dá)式。 (4)基于串匹配算法的數(shù)據(jù)包檢測(cè)部件是入侵檢測(cè)系統(tǒng)中的重要功能部件。在此基礎(chǔ)上才能對(duì)通過串匹配模塊獲得的網(wǎng)絡(luò)告警日志進(jìn)行告警關(guān)聯(lián)分析。本文在最后一部分實(shí)現(xiàn)了一個(gè)入侵告警事件關(guān)聯(lián)系統(tǒng)。該系統(tǒng)通過對(duì)告警中的重復(fù)結(jié)構(gòu)進(jìn)行保持語義的序列最小化處理,使得串匹配算法捕獲的告警得到大幅精簡(jiǎn)。系統(tǒng)中采用D-S證據(jù)理論對(duì)宏觀分析方法(序列挖掘)和微觀分析方法(權(quán)能提升推理)二者處理的結(jié)果進(jìn)行融合,,并通過信任能量計(jì)算公式對(duì)經(jīng)過D-S融合后的告警序列進(jìn)行再處理,從獲得的頻繁序列中得到更能體現(xiàn)前后關(guān)系的攻擊告警序列。該系統(tǒng)能夠獲得有效的攻擊場(chǎng)景,并達(dá)到自動(dòng)提取關(guān)鍵攻擊序列的目的。
[Abstract]:The Internet has become an indispensable part of people's daily life. It is accompanied by the threat that the increasingly rampant network crime threatens the information security of the host on the network nodes. For different purposes, users can use certain technical or social means to enter the personal host by non law and steal personal account passwords. The intrusion detection system is an important component part of the network security system, and its main object is the data packet from the network. The intrusion detection system will carry on the network data packet and the known data. According to the data that has been mastered, the data is classified as intrusion or security behavior according to the data already mastered. In view of the intrusion behavior monitored by the intrusion detection system, these Intrusion Alerts can be analyzed to further excavate the logical connection between the alarm and reveal the hidden attack process used to locate the source of the attack or to be after the attack. Continuous detection provides knowledge reserves. In order to ensure effective extraction of information in network packets, efficient string matching algorithms are required. However, with the development of network technology, various new types of attack are emerging, which makes the scale of the feature set that the string matching automata need to maintain is becoming more and more large and is directly reduced. The performance of the intrusion detection system.
Based on this background, the aim of this paper is to construct a series matching algorithm that supports a large set of features, and to find a way to speed up the current string matching algorithm.
(1) the size of the feature set is too large to limit the number of harmful information that the intrusion detection system can recognize. In this paper, a hybrid matching automaton model is proposed for the text matching problem of the great special collection. This model is designed to classify the characteristics by the length and construct a more compact matching automatically. By distributing features of different length range to different automata, the text can be matched in different ways. In this paper, two independent text filtering strategies are designed based on the structure characteristics of mixed matching automata, that is, reverse state mapping state transfer (SLSPM algorithm) and two level AVL filtering strategy. (BCHDFA algorithm). The two techniques use "the mapping relationship of the automata state reading text information" and "the two filtering with the help of the double character text on the specific location in the feature segment" to reduce the number of verifying the text block and the feature block by the AVL tree, thus restraining the mixed automata. The trend of decreasing speed.
(2) in view of the URL isometric feature, this paper presents two matching algorithms, SSEHash and RTRIE, which are specialized for 16 byte data in combination with the SSE instruction. The two algorithms are designed to improve the long feature matching speed.SSEHash algorithm with a total of 16 addition operations and a 24 specific mode operation in the 2 instruction cycles to distribute the 16 byte text more evenly. To 24 bits of data, because of the use of SSE instruction, the SSEHash algorithm needs shorter processing time compared with the conventional hash algorithm and can filter text information faster. In order to solve the problem of the shorter moving distance of the Wu-Manber class algorithm in the large model collection environment, this paper presents a kind of SSE instruction. This algorithm extracts the fingerprint of the reverse string of all the suffixes in the 16 byte prefix of each long feature (at least 16) and calculates its safe moving distance for each fingerprint. This algorithm and the usual Wu-Manber class algorithm consider the number of text characters in the calculation of the moving distance of the text pointer, so that the number of text characters is more. It is possible to avoid the reduction of pointer movement distance due to the increase of feature size.
(3) in order to improve the performance of expression matching to eliminate the use of useless expressions for memory, this paper analyzes the various inclusion relations of the expression and proposes an optimization algorithm MaskVeri of BitCount algorithm and the expression redundancy elimination algorithm KPGEM. in order to avoid missing any possible expression. The position of each short keyword and the length of the keyword limit the forward and subsequent relations of each keyword. By this relationship, the keyword path graph for each expression can be defined. By traversing the keyword path graph, the KPGEM algorithm can be used to analyze the other expressions implying in the current expression and delete it correctly. A redundant keyword in an expression or an expression containing other expressions.
(4) the packet detection unit based on string matching algorithm is an important functional component in the intrusion detection system. On this basis, the network alarm log obtained through the string matching module can be analyzed. In the last part, an intrusion alarm event closing system is implemented. The system is repeated in the alarm. The structure carries out the sequential minimization of the semantics, making the alarm captured by the string matching algorithm greatly streamlined. In the system, the D-S evidence theory is used to fuse the results of the macro analysis method (sequence mining) and the microanalysis method (the weight lifting reasoning), and the D-S fusion formula is used to fuse the D-S through the trust energy calculation formula. The alarm sequence is reprocessed to get the attack alarm sequence which is more able to reflect the front and back relations from the frequent sequences obtained. The system can obtain the effective attack scene and automatically extract the key attack sequence.
【學(xué)位授予單位】:哈爾濱工業(yè)大學(xué)
【學(xué)位級(jí)別】:博士
【學(xué)位授予年份】:2014
【分類號(hào)】:TP393.08
【參考文獻(xiàn)】
相關(guān)期刊論文 前10條
1 徐從富,耿衛(wèi)東,潘云鶴;面向數(shù)據(jù)融合的DS方法綜述[J];電子學(xué)報(bào);2001年03期
2 宋麒;羅志宇;叢鵬;;SSE指令集在~(60)Co集裝箱CT系統(tǒng)圖像重建算法中的應(yīng)用[J];核電子學(xué)與探測(cè)技術(shù);2007年01期
3 李小紅;基于SSE技術(shù)的H.264運(yùn)動(dòng)估計(jì)的并行處理[J];合肥學(xué)院學(xué)報(bào)(自然科學(xué)版);2005年03期
4 李成軍;周衛(wèi)峰;朱重光;;基于Intel SIMD指令的二維FFT優(yōu)化算法[J];計(jì)算機(jī)工程與應(yīng)用;2007年05期
5 張慶丹;戴正華;馮圣中;孫凝暉;;基于GPU的串匹配算法研究[J];計(jì)算機(jī)應(yīng)用;2006年07期
6 唐定車;劉任任;譚建龍;;基于統(tǒng)一計(jì)算設(shè)備架構(gòu)的并行串匹配算法[J];計(jì)算機(jī)應(yīng)用;2009年S1期
7 羅若愚,魯強(qiáng),曾紹群;在醫(yī)學(xué)圖像處理中使用MMX及SSE指令[J];計(jì)算機(jī)應(yīng)用研究;2005年01期
8 曹京;譚建龍;劉萍;郭莉;;布爾表達(dá)式匹配問題研究[J];計(jì)算機(jī)應(yīng)用研究;2007年09期
9 何文華;;基于海量數(shù)據(jù)的多模式匹配算法研究[J];計(jì)算機(jī)應(yīng)用與軟件;2012年04期
10 宋云;龍際珍;;規(guī)則數(shù)量無關(guān)的多布爾表達(dá)式匹配算法[J];軟件導(dǎo)刊;2012年03期
本文編號(hào):1796080
本文鏈接:http://www.sikaile.net/guanlilunwen/ydhl/1796080.html