Design and Implementation of a Trojan Detection System Based on Data Mining and Machine Learning

Published: 2018-03-16 12:02

Topic: web Trojan; Focus: JavaScript; Source: University of Electronic Science and Technology of China, 2014 master's thesis; Type: degree thesis

[Abstract]: Computer networks are changing the way people live, but their openness and interconnectedness make them an easy target for attackers, so network security is drawing growing attention. Web Trojans have become the number one threat to network security: virus propagation, illegal intrusion, server paralysis, and similar problems are all carried out with Trojans as the vehicle. Pattern-matching detection is the most widely used method in current security detection systems. It relies mainly on manually analyzed and extracted signatures, cannot predict unknown malicious code, and is powerless against obfuscated or mutated malicious code. Data mining and machine learning are active research areas in computing, and combining the two to detect web Trojans is a trend for future research. Motivated by these problems, and building on an in-depth analysis of the principles of data mining and machine learning, this thesis designs and implements a web Trojan detection system that targets malicious JavaScript scripts. The main work is as follows:

1. The thesis first introduces the main principles and theory of data mining and machine learning. It then surveys the mainstream web Trojan detection algorithms that have appeared in China and abroad and analyzes the strengths and weaknesses of each.

2. Following software engineering principles, the thesis analyzes the main functional requirements, overall architecture, and workflow of the Trojan detection system. A prototype web Trojan detection system is then designed and implemented with VC++ 6.0 MFC, MySQL, and other tools. The system mainly comprises these functional submodules: URL blacklist, web crawler, feature extraction, and BP neural network ensemble classifier.

3. Most web Trojans currently embed malicious JavaScript code in the page, so the thesis focuses on detecting web Trojans that are based on malicious JavaScript. To evade antivirus software, malicious JS code is usually obfuscated or mutated, which makes conventional signature matching essentially ineffective against obfuscated web Trojans. The thesis uses the Google V8 JavaScript engine to compile malicious JS scripts into machine code, extracts the opcodes from the machine instructions, and then computes word-based N-gram occurrence frequencies. The 200 most frequent grams serve as the features that distinguish normal scripts from malicious ones.

4. Web crawlers and other tools are used to collect 100 normal JS scripts and 100 malicious JS scripts from the Internet as the web Trojan sample set. These 200 samples are used to train the BP neural network ensemble classifier model, and 4-fold cross-validation is used to analyze the accuracy and correctness rates of the detection method. Once the classifier reaches a certain accuracy, the trained model is applied in the web Trojan detection system. Finally, the functionality and robustness of the system are tested.
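The feature-extraction and evaluation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: it assumes the opcode sequences have already been obtained (dumping machine code from V8 is not shown), the mnemonics in the example are invented, and the abstract does not state the value of N or how frequencies are normalized, so both are assumptions here. All function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence


def ngrams(opcodes: Sequence[str], n: int) -> list[tuple[str, ...]]:
    """Return all contiguous n-grams of an opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def top_k_grams(samples: Iterable[Sequence[str]], n: int, k: int) -> list[tuple[str, ...]]:
    """Pick the k n-grams that occur most often across all samples."""
    totals: Counter = Counter()
    for opcodes in samples:
        totals.update(ngrams(opcodes, n))
    # Sort by descending count, then by gram, so ties break deterministically.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:k]]


def feature_vector(opcodes: Sequence[str],
                   vocabulary: Sequence[tuple[str, ...]],
                   n: int) -> list[float]:
    """Relative frequency of each vocabulary gram within one sample.

    Normalizing by the sample's gram count is an assumption; the abstract
    only says that occurrence frequencies are counted.
    """
    grams = ngrams(opcodes, n)
    counts = Counter(grams)
    total = len(grams) or 1  # avoid division by zero for very short samples
    return [counts[gram] / total for gram in vocabulary]


def k_fold_indices(num_samples: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Split indices 0..num_samples-1 into k (train, test) pairs by striding."""
    folds = [list(range(i, num_samples, k)) for i in range(k)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


if __name__ == "__main__":
    # Invented opcode sequences, for illustration only.
    benign = ["push", "mov", "call", "pop", "ret"]
    suspicious = ["mov", "xor", "mov", "xor", "mov", "xor", "jmp"]

    vocabulary = top_k_grams([benign, suspicious], n=2, k=3)
    print(vocabulary)
    print(feature_vector(suspicious, vocabulary, n=2))

    # 200 samples and 4 folds, as in the abstract: 150 train / 50 test per fold.
    for train, test in k_fold_indices(200, 4):
        print(len(train), len(test))
```

In the setting the abstract describes, `k` would be 200 and each resulting vector would be one input to the classifier. The strided split keeps each fold class-balanced only if the samples are stored as two equal blocks (for example 100 normal followed by 100 malicious); a shuffled or stratified split would be the safer choice otherwise.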
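[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.08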

Article ID: 1619826


Link to this article: http://www.sikaile.net/guanlilunwen/ydhl/1619826.html
