Research on Machine-Learning-Based Detection of Malicious Android Applications
Published: 2018-10-18 18:42
[Abstract]:With the emergence of smartphones and the rapid development of the mobile Internet, the way users connect to the network is gradually shifting from the PC to mobile devices. Today's smartphone is no longer just a communication tool; many functions once confined to the PC are now implemented on mobile devices. Android is currently the mobile operating system with the most users on the market, so large numbers of users and developers follow the Android application market. Malicious-code developers have turned their attention to this market as well, which poses a serious threat to the security of users' phones. Given the large number of malicious applications in Android application markets, detecting them efficiently is an urgent problem. To address it, this thesis studies machine-learning-based methods for detecting malicious Android applications. The main work is as follows: (1) The thesis surveys the state of research and existing results on Android malicious-application detection, studies the Android system architecture, and analyzes both the Linux-kernel-based security mechanisms of Android and the mechanisms specific to Android, such as the sandbox mechanism and the permission mechanism. (2) It analyzes how malicious applications attack and how malicious code is implanted, and on that basis examines the decompiled files of Android applications in detail and analyzes the principles of the machine learning classification algorithms used in the thesis. (3) It designs a machine-learning-based detection scheme for malicious Android applications. Targeting the characteristics of malicious applications, it proposes a detection scheme that trains on N-gram Opcode features; the experimental results show that 3-gram Opcode features, generated with a rule that groups Dalvik instructions into 24 categories and with N = 3, perform best.
The 3-gram Opcode features are then combined with API features and Permission features, and repeated experiments examine how the feature set and the classification algorithm affect classifier performance. These experiments show that a classifier trained with the random forest algorithm on the combined API, Permission, and 3-gram Opcode feature set performs well, reaching a detection accuracy of 94% at a false positive rate of 5.3%, with an average prediction time of 10.06 s. A classifier trained with the random forest algorithm on the combined API and Permission feature set alone reaches a detection accuracy of 94.1% at a false positive rate of 6.5%, with an average prediction time of 7.5 s.
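The 3-gram Opcode feature described in point (3) of the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not spell out the 24-category rule, so `OPCODE_CATEGORY` is a hypothetical, partial mapping covering only a few Dalvik mnemonics, and the function names `categorize` and `opcode_ngram_features` are invented for this example.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical, partial mapping from Dalvik mnemonics to category symbols.
# The thesis groups instructions into 24 categories; its actual rule is not
# given in the abstract, so only a few illustrative categories appear here.
OPCODE_CATEGORY: Dict[str, str] = {
    "move": "M", "move-result": "M", "move-object": "M",
    "const/4": "C", "const-string": "C",
    "invoke-virtual": "V", "invoke-direct": "V", "invoke-static": "V",
    "if-eq": "I", "if-ne": "I", "if-eqz": "I",
    "goto": "G",
    "return": "R", "return-void": "R", "return-object": "R",
    "iget": "T", "iput": "T",
}
UNKNOWN = "X"  # assumed fallback symbol for mnemonics outside the mapping


def categorize(opcodes: Sequence[str]) -> List[str]:
    """Replace each Dalvik mnemonic with its category symbol."""
    return [OPCODE_CATEGORY.get(op, UNKNOWN) for op in opcodes]


def opcode_ngram_features(opcodes: Sequence[str], n: int = 3) -> Counter:
    """Count sliding-window n-grams over the categorized opcode sequence."""
    symbols = categorize(opcodes)
    grams: List[Tuple[str, ...]] = [
        tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)
    ]
    return Counter(grams)


if __name__ == "__main__":
    # Opcode sequence of one made-up method body, as it might be read from
    # a decompiled (smali) file.
    method = [
        "const-string", "invoke-static", "move-result", "if-eqz",
        "const-string", "invoke-static", "move-result", "return-void",
    ]
    for gram, count in opcode_ngram_features(method).items():
        print("".join(gram), count)
```

Grouping mnemonics into categories before forming n-grams keeps the feature space small: with 24 categories there are at most 24^3 = 13,824 distinct 3-grams, whereas n-grams over raw mnemonics would be far sparser. The resulting counts could then be concatenated with API and Permission indicators to form the combined feature vector the abstract describes.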
【Degree-granting institution】:Beijing Jiaotong University
【Degree level】:Master's
【Year degree granted】:2017
【Classification number】:TP181;TP309
Article ID:2280035
Article link:http://www.sikaile.net/kejilunwen/zidonghuakongzhilunwen/2280035.html