
Research on Active Extreme Learning Machine Algorithms for Imbalanced Data Distributions

Published: 2018-11-05 14:03
[Abstract]: In recent years, rapid advances in data acquisition and storage have left nearly every industry with massive volumes of data, and how to analyze these data has become a core problem for researchers in machine learning and data mining. Labeling the class of every sample in order to build a classification model, for example, greatly increases the cost in labor, resources, and time; active learning is an effective tool for this problem. After years of research, many effective active learning algorithms have been proposed, but they all overlook an important question: whether they remain effective when the samples are imbalanced across classes. This thesis therefore studies how to preserve the efficiency and performance of active learning on class-imbalanced data, and in particular how to improve active learning algorithms so that their classification performance is optimal under an imbalanced data distribution. The work covers two aspects. 1) When active learning is run on imbalanced data, the decision boundary tends to become biased, which can cause active learning to fail. To address this, sampling is used as the balance control strategy during learning: after surveying several existing sampling algorithms, a boundary oversampling algorithm is proposed and combined with active learning. Because the extreme learning machine (ELM) generalizes well and trains quickly, it is used as the base classifier to speed up active learning. The active learning algorithms with the balance control strategy are evaluated on 12 benchmark data sets. The results show that active learning is indeed affected in imbalanced scenarios, and that active learning combined with sampling performs better.
2) To achieve faster training, online learning is introduced, and an online weighted extreme learning machine algorithm, OS-W-ELM, is proposed. Cost-sensitive learning is adopted as the balance control strategy during learning and combined with active learning. The extreme learning machine again serves as the base classifier. On the same 12 benchmark data sets, the AL-OS-W-ELM, AL-OS-ELM, and RS-OS-W-ELM algorithms are compared in terms of performance, and AL-OS-W-ELM and AL-OS-ELM are also compared in running time with the sampling-based active learning algorithms. The results show that, in imbalanced scenarios, active learning that uses online learning and cost-sensitive learning performs better.
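The abstract does not give the update rules of OS-W-ELM or the query strategy used for active learning, so the following is only a minimal sketch of the general idea, under stated assumptions: a binary ELM with fixed random hidden weights, a regularized weighted recursive least-squares update applied one sample at a time, fixed per-class costs supplied by the caller, and uncertainty sampling (querying the pool sample closest to the decision boundary). The names `OnlineWeightedELM`, `active_learn`, and `class_weight` are hypothetical and are not taken from the thesis.

```python
import numpy as np


class OnlineWeightedELM:
    """Binary ELM with per-sample weights and recursive least-squares updates.

    Illustrative only; not the thesis's OS-W-ELM formulation.
    """

    def __init__(self, n_features, n_hidden=20, reg=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random input weights and biases stay fixed; only beta is learned.
        self.W = rng.normal(size=(n_features, n_hidden))
        self.b = rng.normal(size=n_hidden)
        # P tracks the inverse of (reg * I + H^T diag(w) H); no data seen yet.
        self.P = np.eye(n_hidden) / reg
        self.beta = np.zeros(n_hidden)

    def hidden(self, X):
        """Sigmoid hidden-layer outputs, one row per sample."""
        return 1.0 / (1.0 + np.exp(-(np.atleast_2d(X) @ self.W + self.b)))

    def partial_fit(self, x, t, weight=1.0):
        """Absorb one labelled sample (t in {-1, +1}) with the given cost."""
        h = self.hidden(x)[0]
        Ph = self.P @ h
        # Sherman-Morrison update of P for a rank-one weighted term.
        self.P -= np.outer(Ph, Ph) / (1.0 / weight + h @ Ph)
        self.beta += weight * (self.P @ h) * (t - h @ self.beta)

    def decision(self, X):
        """Signed score; its magnitude is used as a confidence proxy."""
        return self.hidden(X) @ self.beta


def active_learn(model, X_pool, y_pool, seed_idx, n_queries, class_weight):
    """Uncertainty sampling; y_pool plays the role of the labelling oracle."""
    labelled = list(seed_idx)
    for i in labelled:
        model.partial_fit(X_pool[i], y_pool[i], class_weight[int(y_pool[i])])
    seen = set(labelled)
    unlabelled = [i for i in range(len(X_pool)) if i not in seen]
    for _ in range(n_queries):
        if not unlabelled:
            break
        scores = np.abs(model.decision(X_pool[unlabelled]))
        pick = unlabelled.pop(int(np.argmin(scores)))  # least confident sample
        model.partial_fit(X_pool[pick], y_pool[pick],
                          class_weight[int(y_pool[pick])])
        labelled.append(pick)
    return labelled


# Synthetic 9:1 imbalanced pool (assumed data, for illustration only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (180, 2)), rng.normal(2.5, 1.0, (20, 2))])
y = np.array([-1] * 180 + [1] * 20)
model = OnlineWeightedELM(n_features=2)
queried = active_learn(model, X, y, seed_idx=[0, 1, 180, 181], n_queries=30,
                       class_weight={-1: 1.0, 1: 9.0})
```

Because each update only touches the small matrix `P` and the vector `beta`, a newly labelled sample can be absorbed without retraining on all previously labelled data, which is the kind of speed-up online learning is meant to provide. The minority-class cost of 9.0 simply mirrors the imbalance ratio of the synthetic pool and is an arbitrary choice.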
[Degree-granting institution]: Jiangsu University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP18


Article ID: 2312301


Article link: http://www.sikaile.net/kejilunwen/zidonghuakongzhilunwen/2312301.html

