Research on Web Page Malicious Code Detection Based on Classifier Ensembles

Published: 2018-12-10 12:37

[Abstract]: With the rapid development of the Internet, the network has enriched people's entertainment and improved many aspects of daily life. The same convenience also brings risk: attackers exploit this growth, using malicious code to undermine network security for financial gain, and governments are paying increasing attention to malicious code detection. Detection methods generally fall into two categories. Static detection [1] extracts web page features mainly through rule matching and signature matching. Dynamic detection [2] runs the code in a virtual environment and extracts features from its behavior. This thesis targets malicious JavaScript code [3] and studies its detection with machine learning. The main work and results are as follows:

1. Obfuscated JavaScript code is compiled into machine code with the V8 engine [4]. Based on the characteristics of malicious code, the operands in the machine code are classified and simplified, then combined with the opcodes. Features are extracted from the processed machine code as Bi-Grams and Tri-Grams selected by information gain. A method based on frequency, distance, and mutual information is proposed to locate breakpoints within each sample and to compute variable-length N-Gram features for each sample. Experiments confirm that features built from the processed operands combined with opcodes describe machine-code behavior in finer detail, and that variable-length N-Gram features avoid splitting meaningful sequences, which improves classification.

2. Building on a study of common classification algorithms and classifier ensemble algorithms, and addressing the problem that ensembles usually receive a single form of input, an input optimization for ensemble classifiers [5] is proposed. The input data set is processed in several different ways so that each internal classifier can be trained on a suitable representation, and the resulting models are then combined [6]. A secondary classifier is added, turning the original single-layer ensemble into a multi-level ensemble. Each classifier is assigned its own weight, and training is used to find the best-performing weight assignment. Experiments show that the multi-level weighted ensemble with these optimizations classifies better.

3. Based on the algorithms above, an online malicious code detection system is designed and developed. Users can submit suspicious script code or a website address online and the system checks it quickly. Users can also submit detection reports and view reports submitted by others. Code that the system detects as malicious is automatically saved to the database.
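The N-Gram selection step described in item 1 can be illustrated with a minimal sketch. It assumes each sample has already been reduced to a list of opcode tokens; the mnemonics, the toy samples, and the function names below are hypothetical and are not taken from the thesis. It scores every Bi-Gram and Tri-Gram by information gain, defined as the drop in label entropy when samples are split on whether they contain the N-Gram. The variable-length breakpoint method is not covered here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token tuples in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(samples, labels, gram):
    """Entropy reduction from splitting samples on presence of `gram`."""
    n = len(gram)
    present, absent = [], []
    for tokens, label in zip(samples, labels):
        (present if gram in ngrams(tokens, n) else absent).append(label)
    total = len(labels)
    conditional = (len(present) / total) * entropy(present) \
        + (len(absent) / total) * entropy(absent)
    return entropy(labels) - conditional


def top_features(samples, labels, sizes=(2, 3), k=5):
    """Rank all Bi-Grams and Tri-Grams by information gain, keep the best k."""
    candidates = {g for s in samples for n in sizes for g in ngrams(s, n)}
    ranked = sorted(candidates,
                    key=lambda g: (-information_gain(samples, labels, g), g))
    return ranked[:k]


# Hypothetical toy data: 1 = malicious, 0 = benign.
samples = [
    ["push", "mov", "call", "xor"],
    ["push", "mov", "call", "jmp"],
    ["mov", "add", "ret"],
    ["mov", "add", "cmp", "ret"],
]
labels = [1, 1, 0, 0]
selected = top_features(samples, labels, k=4)
```

On this toy set, an N-Gram that appears in exactly the malicious samples (or exactly the benign ones) separates the classes perfectly, while one that appears in a single sample scores lower. A real pipeline would compute this over many samples and keep only the highest-scoring N-Grams as feature columns.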
[Degree-granting institution]: Zhejiang University of Technology
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP393.08
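The weight assignment described in item 2 of the abstract can be sketched in a simplified form. The abstract does not specify how the weights are trained or what the secondary classifier is, so this example swaps in a plain weighted vote with an exhaustive search over a small weight grid, scored on held-out predictions. The prediction table, the grid values, and the function names are all assumptions made for illustration.

```python
from itertools import product


def weighted_vote(predictions, weights, threshold=0.5):
    """Combine 0/1 predictions from several classifiers into one 0/1 label."""
    score = sum(w * p for w, p in zip(weights, predictions)) / sum(weights)
    return 1 if score >= threshold else 0


def accuracy(pred_rows, labels, weights):
    """Fraction of samples the weighted vote labels correctly."""
    hits = sum(weighted_vote(row, weights) == y
               for row, y in zip(pred_rows, labels))
    return hits / len(labels)


def search_weights(pred_rows, labels, grid=(1, 2, 3)):
    """Try every weight tuple drawn from `grid` and return the most accurate."""
    n_classifiers = len(pred_rows[0])
    return max(product(grid, repeat=n_classifiers),
               key=lambda w: accuracy(pred_rows, labels, w))


# Hypothetical validation data: each row holds the 0/1 outputs of three
# base classifiers for one sample; `labels` holds the true classes.
pred_rows = [(1, 0, 0), (1, 1, 0), (0, 1, 1), (0, 0, 1), (1, 0, 1), (0, 1, 0)]
labels = [1, 1, 0, 0, 1, 0]

best = search_weights(pred_rows, labels)
equal_acc = accuracy(pred_rows, labels, (1, 1, 1))
best_acc = accuracy(pred_rows, labels, best)
```

In this toy table the first classifier is always right and the other two are noisy, so the search settles on weights that let the first classifier outvote the other two combined. Grid search only scales to a handful of classifiers; a learned secondary classifier, as the abstract describes, would replace it in a larger ensemble.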

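[References]

Top 10 related journal papers

1 Xiu Yang; Liu Jiayong; Malware detection based on opcode sequence frequency vectors and behavioral feature vectors [J]; Information Security and Communications Privacy; 2016, No. 09

2 He Ming; Sun Jianjun; Cheng Ying; A survey of text classification based on naive Bayes [J]; Information Science; 2016, No. 07

3 Zhang Kai; Wang Dong'an; Li Chao; Jia Bing; Malicious code detection based on collaborative-sampling active learning [J]; Chinese High Technology Letters; 2016, No. 05

4 Lu Xiaoyong; Chen Musheng; Web spam detection based on random forest and under-sampling ensemble [J]; Journal of Computer Applications; 2016, No. 03

5 Liao Guohui; Liu Jiayong; Malicious code detection methods based on data mining and machine learning [J]; Journal of Information Security Research; 2016, No. 01

6 Fu Leipeng; Zhang Han; Huo Luyang; Malicious JavaScript detection algorithm based on multiple feature classes [J]; Pattern Recognition and Artificial Intelligence; 2015, No. 12

7 Xiang Tao; Li Tao; Zhao Xuezhuan; Li Xudong; Accurate object detection method based on random forest [J]; Application Research of Computers; 2016, No. 09

8 Li Meng; Jia Xiaoqi; Wang Rui; Lin Dongdai; A feature selection and modeling method for malicious code [J]; Computer Applications and Software; 2015, No. 08

9 Xu Qing; Zhu Yan; Tang Shouhong; Detecting malicious JavaScript code by analyzing multiple feature classes and deception techniques [J]; Computer Applications and Software; 2015, No. 07

10 Xuan Yiguang; Zhou Hua; Automatic detection of JavaScript code obfuscation based on character entropy [J]; Computer Applications and Software; 2015, No. 01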

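Top 3 related doctoral dissertations

1 Xie Nannan; Research on the application of machine learning methods in intrusion detection [D]; Jilin University; 2015

2 Sun Xin; Research on feature selection in machine learning [D]; Jilin University; 2013

3 Luo Yu; Research on the application of support vector machines in machine learning [D]; Southwest Jiaotong University; 2007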

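Top 3 related master's theses

1 Wang Yuheng; Optimization and application of the random forest algorithm in recommender systems [D]; Zhejiang University; 2016

2 Li Yun; Application of machine learning algorithms in data mining [D]; Beijing University of Posts and Telecommunications; 2015

3 Li Yang; Research on web page malicious code detection based on machine learning [D]; Xidian University; 2013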

Article ID: 2370581

Article link: http://www.sikaile.net/guanlilunwen/ydhl/2370581.html
