Research on Static Detection of Malicious PDF Documents Based on an Improved N-gram

Published: 2018-04-28 13:42

  Topic: PDF documents + Java; Source: East China University of Technology, 2017 master's thesis


[Abstract]: With the development of information technology and the spread of office automation, PDF has become an indispensable document format for work and study. Despite this convenience, many security problems have emerged in its use. Attackers exploit vulnerabilities in the PDF document format to embed malicious JavaScript code, obtain private information from specific targets, and cause them incalculable losses. Detecting and defending against PDF documents with embedded malicious JavaScript has therefore become an important research goal for information security researchers in China and abroad.

This thesis analyzes PDF documents, covering their physical and logical structure, the attack techniques that target them, and the ways malicious PDF documents spread. An in-depth analysis of existing N-gram-based static detection models for malicious PDF documents reveals two shortcomings. First, they ignore the effect of hidden information in the PDF document on the completeness of the extracted JavaScript code, and they preprocess the extracted JavaScript code insufficiently. Second, the N-gram feature extraction method can extract only fixed-length N-gram features, so effective features are split apart.

To address these problems, the thesis proposes an improved N-gram static detection model for malicious PDF documents. It designs a PDF preprocessing pipeline consisting of decryption, decoding, JavaScript location and extraction, and JavaScript deobfuscation, so that the extracted JavaScript code is complete and valid. It also improves on the existing N-gram feature extraction method so that more effective N-gram feature vectors are extracted. To verify the improved method, features are extracted with the N-gram method both before and after the improvement, the resulting feature vectors are used as input data, and several detection algorithms are trained and tested on them. The same detection algorithms are also trained and tested in combination with the Boosting algorithm. The results show that the improved N-gram feature extraction method is effective for detecting malicious PDF documents and outperforms the method before the improvement, that combining the detection algorithms with Boosting further improves detection performance, and that the model performs better than the DPScan and PJScan models.
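The abstract does not say how the improved extraction works, so the following is only a minimal sketch of the general idea. It contrasts a fixed-length byte N-gram count (the baseline the abstract criticizes) with one possible way to avoid splitting features, namely collecting N-grams over a range of lengths. The function names, the `n_min`/`n_max` parameters, and the normalization are assumptions made for illustration and are not taken from the thesis.

```python
from collections import Counter


def fixed_ngram_counts(data: bytes, n: int) -> Counter:
    """Count every contiguous byte n-gram of one fixed length n."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def multi_length_ngram_counts(data: bytes, n_min: int, n_max: int) -> Counter:
    """Hypothetical variant: pool n-grams of every length in [n_min, n_max].

    A pattern longer than n_min is then also kept whole, instead of only
    appearing as overlapping fixed-length fragments.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        counts.update(fixed_ngram_counts(data, n))
    return counts


def to_feature_vector(counts: Counter, vocabulary: list) -> list:
    """Map counts onto a fixed vocabulary as relative frequencies."""
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[gram] / total for gram in vocabulary]


if __name__ == "__main__":
    # Stand-in for JavaScript extracted from a PDF after preprocessing.
    script = b"unescape(unescape(x))"
    baseline = fixed_ngram_counts(script, 4)
    pooled = multi_length_ngram_counts(script, 4, 8)
    print(baseline[b"unes"], pooled[b"unescape"])
```

Vectors produced by `to_feature_vector` could then be passed to any classifier, with or without a boosting wrapper, which is the comparison the abstract describes.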
[Degree-granting institution]: East China University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP309
