
Research and Application of Decision Tree Algorithms Based on Fuzzy Theory

Published: 2018-04-26 16:58

  Topics: data mining + decision tree; Source: China University of Geosciences (Beijing), 2017 master's thesis


[Abstract]: Classification is one of the core research topics in data mining, and the decision tree is a simple, efficient, and widely used classification algorithm. Its model is clear, easy to understand, and highly reusable, and it achieves high classification accuracy. Classical decision tree algorithms, however, handle fuzzy data poorly. As fuzzy theory found applications in machine learning and artificial intelligence, combining fuzzy set theory with decision trees produced fuzzy decision tree algorithms such as FuzzyID3 and Min-Ambiguity. These algorithms extend the reach of classical decision trees to data with uncertainty and have strongly influenced the development of this family of algorithms. The main work of the thesis is as follows:
(1) It reviews the basic concepts of decision trees and fuzzy theory, summarizes how different decision tree algorithms differ in their criteria for selecting the splitting attribute, and analyzes different pruning techniques. It focuses on comparing crisp and fuzzy decision trees in terms of tree construction, data preprocessing, algorithmic complexity, rule matching, and scope of application, and summarizes the strengths and weaknesses of each.
(2) It proposes fuzzifying continuous data by using the K-means algorithm to obtain cluster centers for each continuous attribute and then combining those centers with triangular fuzzy numbers. It also designs and implements a visual fuzzy decision system based on the FuzzyID3 and Min-Ambiguity algorithms. Together with the C4.5 and CART implementations in Weka, the open-source data mining software, experiments compare the four decision tree algorithms by classification accuracy and by the number of rules generated. FuzzyID3 achieves high accuracy on every data set while producing relatively few rules. CART produces the fewest rules, which the thesis attributes to its binary tree structure and its use of the Gini index as the splitting criterion. Of the two fuzzy decision tree algorithms, FuzzyID3 performs better overall than Min-Ambiguity; the experiments also analyze how the truth degree affects both algorithms.
(3) It applies fuzzy decision trees to e-mail classification, designing a classification model that is built around FuzzyID3 and based on behavioral features of e-mail, and proposes a scheme for selecting e-mail feature attributes together with a corresponding fuzzification scheme. Experiments show that the model achieves high recall and accuracy when classifying e-mail and identifies spam fairly efficiently.
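The fuzzification step in point (2) can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption: the function names `kmeans_1d` and `triangular_memberships` are hypothetical, the plain-Python Lloyd's iteration stands in for whatever K-means implementation the thesis used, and the choice to let each triangular term peak at one cluster center and reach zero at the neighboring centers (with flat shoulders at both ends) is one common convention, not necessarily the thesis's.

```python
from typing import List


def kmeans_1d(values: List[float], k: int, iters: int = 100) -> List[float]:
    """Lloyd's algorithm on a single attribute; returns k sorted centers."""
    data = sorted(values)
    # Deterministic start (an assumption): spread the initial centers
    # evenly over the sorted data instead of sampling them at random.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        updated = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def triangular_memberships(x: float, centers: List[float]) -> List[float]:
    """Membership of x in one fuzzy term per (sorted) center.

    Each term has membership 1 at its own center and falls linearly to 0
    at the neighboring centers. Beyond the outermost centers the first
    and last terms stay at 1 (shoulders), which is an assumption here.
    """
    k = len(centers)
    if k == 1 or x <= centers[0]:
        return [1.0] + [0.0] * (k - 1)
    if x >= centers[-1]:
        return [0.0] * (k - 1) + [1.0]
    memberships = [0.0] * k
    for i in range(k - 1):
        left, right = centers[i], centers[i + 1]
        if left <= x <= right:
            weight = (x - left) / (right - left)
            memberships[i] = 1.0 - weight
            memberships[i + 1] = weight
            break
    return memberships


# Made-up values for one continuous attribute, fuzzified into three terms.
attribute = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
centers = kmeans_1d(attribute, k=3)
print(centers)
print(triangular_memberships(3.0, centers))
```

With this convention the memberships of any value sum to 1, so each record is shared between at most two adjacent fuzzy terms before it is passed to a fuzzy tree learner such as FuzzyID3.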
[Degree-granting institution]: China University of Geosciences (Beijing)
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP311.13; O159


Article link: http://www.sikaile.net/kejilunwen/yysx/1806866.html

