Machine Learning-Based Network Traffic Identification: Methods and Implementation
Published: 2018-04-21 22:32
Topic: network traffic identification + machine learning; Source: Shandong University, master's thesis, 2014
[Abstract]: With the rapid development of computer network technology and the arrival of the information age, ever more frequent network use has caused explosive growth in Internet data traffic; the continual appearance of new network applications has made the use of network communication protocols more flexible and more mixed; and the growing number of network viruses, eavesdropping incidents, and malicious attacks has made network security a focus of concern for society and government departments. These problems can be addressed effectively through network traffic identification, which is therefore receiving more and more attention.
Many traffic identification methods already exist, but from both research and application perspectives attention is increasingly focused on the feasibility and effectiveness of traffic identification, that is, how to process massive volumes of data quickly and how to correctly identify the various applications on the network. Facing a constantly changing network environment, this thesis studies network traffic identification methods based on machine learning (ML), focusing on two supervised learning algorithms: the back propagation (BP) neural network and the support vector machine (SVM).
The BP neural network trains on a distributed, parallel network structure, which gives it higher fault tolerance and faster processing. It has strong nonlinear mapping ability and can model the nonlinear relationship between input and output. The BP neural network is also trained through global optimization, so it generalizes well. SVM is a machine learning method suited to small samples; it uses an inner-product kernel function to map the low-dimensional sample space nonlinearly into a high-dimensional space, and it rests on a relatively complete theoretical foundation. Using transductive inference, SVM can readily solve nonlinear multi-class classification problems. The optimal separating hyperplane of an SVM is determined only by a limited number of support vectors on the boundary, which makes the SVM method not only simple and effective but also robust. Both machine learning algorithms can adapt to the large data volumes and diversity of the network environment, and both can identify the application type of network traffic quickly and effectively.
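As an illustration of the kernel idea described above, the following minimal Python sketch checks numerically that a kernel value equals an inner product in a higher-dimensional space. The degree-2 polynomial kernel and its explicit feature map are used only because the map can be written out in three dimensions; they are an assumption for illustration and are not taken from the thesis. The RBF kernel is included for comparison, and its `gamma` value is an assumed placeholder.

```python
import math


def poly2_kernel(x, y):
    """Degree-2 homogeneous polynomial kernel on 2-D inputs: k(x, y) = (x . y)^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def poly2_feature_map(x):
    """Explicit 3-D feature map whose inner product equals poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2).

    gamma=0.5 is an assumed illustrative value, not a tuned parameter.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two hypothetical 2-D feature vectors.
x, y = (1.0, 2.0), (3.0, -1.0)

# The kernel is evaluated in the 2-D input space ...
k_implicit = poly2_kernel(x, y)
# ... and matches the inner product taken after mapping to 3-D.
k_explicit = dot(poly2_feature_map(x), poly2_feature_map(y))
```

The two values agree, so a classifier that only needs inner products can work in the mapped space without ever constructing it. The RBF kernel follows the same principle, but its implicit feature space is infinite-dimensional, so only the kernel form can be computed.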
The traffic identification system in this thesis targets network flows in the home and is functionally divided into two parts: a home gateway and a back-end server. The home gateway captures packets in real time, extracts features, identifies the traffic using machine learning, and then sends the identification results to the back-end server. The back-end server stores the results in a database and displays the application types of the traffic currently on the network, so that administrators can monitor it. The main contributions of the thesis are as follows:
1. Based on a study and analysis of network traffic identification and machine learning, the thesis finds that the BP neural network can adapt to the large data volumes and diversity of the Internet, and on this basis adopts a traffic identification method based on the BP neural network. A three-layer BP neural network is chosen as the implementation; its classification ability meets the requirements of traffic identification, and its structure is simple and easy to implement. A sigmoid (S-type) function is used as the transfer function of the hidden layer, providing a nonlinear mapping of inputs such as network flow features. Although the BP neural network easily falls into local minima of the error surface, particle swarm optimization (PSO) is used to find initial weights with globally optimal characteristics, so that BP training can reach the global minimum of the error surface. Experimental results show that the PSO-optimized BP neural network quickly finds the global minimum of the error surface and accurately identifies the network application type of the traffic.
2. The principles by which SVM solves linear and nonlinear classification problems are studied carefully, and on this basis an SVM-based traffic identification method is proposed, applying SVM to the field of network traffic identification. A radial basis function is chosen as the SVM kernel to map the low-dimensional network flow feature space nonlinearly into a higher-dimensional space. A multi-class SVM classifier is built with the one-against-one method so that SVM can identify multiple network application types. SVM generates the optimal hyperplane in the high-dimensional space, partitioning the space and classifying the various network applications; because this is a global optimization, the SVM-based method generalizes well. Experimental results show that SVM is very well suited to nonlinear multi-class problems such as network traffic identification; it also needs few training samples, has low computational complexity, and can perform identification in real time.
3. A traffic identification system is designed and implemented in a home LAN. Following the system model of machine learning and the implementation approach of supervised learning, the overall architecture of network traffic identification is designed and divided into two parts: real-time online traffic identification and offline training. The process comprises capturing the packets of network flows, generating flow features, selecting training and test sets, training the machine learning algorithms, and testing the classification performance of the two traffic identification algorithms. For the implementation, the BP neural network and SVM traffic identification algorithms are written as programs and ported to the home gateway (which is built on a router). On the Linux platform of the back-end server, a Web server is set up and a MySQL database is installed to support communication between the home gateway and the back-end server, along with information processing and storage. Administrators can log in to the back-end server through a Web browser to view the current traffic identification results for the home network.
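The PSO-based weight initialization described in contribution 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the XOR data stands in for flow features and labels, and the network size, swarm size, iteration count, and coefficients (`inertia`, `c1`, `c2`) are all hypothetical values. The search returns a weight vector from which ordinary BP gradient descent would then start.

```python
import math
import random


def sigmoid(z):
    """Numerically stable S-type (logistic) transfer function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical toy task (XOR) standing in for flow features and labels.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
        ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_IN, N_HID = 2, 3  # assumed layer sizes
# Flat weight vector: hidden weights, hidden biases, output weights, output bias.
DIM = N_IN * N_HID + N_HID + N_HID + 1


def forward(w, x):
    """Three-layer network: input -> sigmoid hidden layer -> sigmoid output."""
    hidden = []
    for j in range(N_HID):
        z = w[N_IN * N_HID + j]  # bias of hidden unit j
        for i in range(N_IN):
            z += w[j * N_IN + i] * x[i]
        hidden.append(sigmoid(z))
    base = N_IN * N_HID + N_HID
    z = w[base + N_HID]  # output bias
    for j in range(N_HID):
        z += w[base + j] * hidden[j]
    return sigmoid(z)


def mse(w):
    """Mean squared error of the network with weights w on DATA."""
    return sum((forward(w, x) - y) ** 2 for x, y in DATA) / len(DATA)


def pso_init(n_particles=20, n_iters=50, inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for low-error initial weights with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(n_particles)]
    vel = [[0.0] * DIM for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_err = [mse(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_err[k])
    gbest, gbest_err = pbest[g][:], pbest_err[g]
    history = [gbest_err]  # best error seen so far, one entry per iteration
    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(DIM):
                r1, r2 = rng.random(), rng.random()
                # Pull each particle toward its own best and the swarm's best.
                vel[k][d] = (inertia * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                pos[k][d] += vel[k][d]
            err = mse(pos[k])
            if err < pbest_err[k]:
                pbest[k], pbest_err[k] = pos[k][:], err
                if err < gbest_err:
                    gbest, gbest_err = pos[k][:], err
        history.append(gbest_err)
    return gbest, history


initial_weights, history = pso_init()
```

`history` records the best error after each iteration and never increases, because the swarm's best position is replaced only when a particle improves on it.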
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.08; TP181
Article ID: 1784379
Article link: http://www.sikaile.net/guanlilunwen/ydhl/1784379.html