Research on Key Technologies of Online Machine Learning Based on Selective Ensemble
Published: 2018-11-18 10:01
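[Abstract]: Machine learning has long played an important role in many fields. Its ultimate goal is to analyze and process data and to extract useful information and knowledge that can guide subsequent decisions. As the Internet has spread, the means of acquiring data have multiplied and the volume of available data has grown exponentially, which challenges traditional machine learning techniques. For Internet-based services such as online trading, online advertising, financial analysis and search engines, the ability to learn quickly and effectively from large-scale, long-running, continuous data is of great importance. Online machine learning is an important means of processing large amounts of data in a timely manner, and predictive ability and prediction efficiency have become the most important criteria for evaluating online learning methods. Incremental learning, the most important online learning strategy, can be divided into single-classifier incremental learning and ensemble incremental learning. Single-classifier methods are prone to over-adaptation and have relatively low predictive ability, whereas ensemble methods usually make the target ensemble classifier keep growing as the system runs, so the prediction cost keeps rising.
In batch machine learning, selective ensemble can effectively improve both the predictive ability and the prediction efficiency of an ensemble classifier. Focusing on supervised learning and classification, this thesis proposes applying selective ensemble techniques to ensemble incremental learning in order to improve the predictive ability and prediction efficiency of online learning. It first proposes an online learning model that combines selective ensemble with incremental learning and then studies the key techniques the model involves in depth. The main work and contributions of the thesis are:
1. An online learning model, EPIL, that combines selective ensemble with incremental learning. Motivated by practical needs in many fields and by the shortcomings of current online learning techniques, the thesis proposes EPIL and sets out several technical challenges it raises. For each incremental data set, EPIL learns several local base classifiers, uses local selection to discard the local base classifiers with poor predictive ability, and, when appropriate, uses global selection to discard global base classifiers that have become outdated. The resulting target ensemble classifier is small, predicts well and retains good incremental learning ability.
2. A pattern-mining-based selective ensemble strategy and algorithmic framework. Studying the selective ensemble techniques in EPIL, the thesis proposes a novel selective ensemble strategy based on pattern mining, builds a selective ensemble algorithm framework on this strategy, and analyzes the key techniques in the framework in detail. Under this strategy, selective ensemble is formulated as the problem of mining a pattern from a transaction database, so that transaction processing and pattern mining techniques can be used to select base classifiers. This opens a new direction for research on selective ensemble methods.
3. Two selective ensemble algorithms based on covering pattern mining.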
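Building on the pattern-mining-based selective ensemble strategy, the thesis first introduces the concept of covering pattern mining and then uses it to derive two selective ensemble algorithms, CPM-EP and PMEP. Both use the idea of covering pattern mining and the principle of majority voting to obtain candidate sub-patterns of various lengths, and both then construct the target ensemble classifier with a greedy strategy. PMEP, however, builds an FP-Tree from the original transaction database and obtains the candidate sub-patterns from the FP-Tree, which avoids repeated operations on the transaction database and saves considerable overhead. Experimental results show that both algorithms select base classifiers quickly and yield small target ensembles with strong predictive ability; of the two, PMEP needs less selection time than CPM-EP. The results confirm that pattern mining is a highly effective selective ensemble strategy.
4. A Bagging-based ensemble incremental learning method. Studying how base classifiers are constructed in EPIL, and addressing the poor adaptability of traditional ensemble incremental learning methods to the structure of their base classifiers, the thesis proposes Bagging++, a Bagging-based ensemble incremental learning method, together with a Bagging-based method for constructing heterogeneous base classifiers. Experimental results show that Bagging++ adapts well to different base classifier algorithms, achieves good predictive ability and clearly outperforms traditional algorithms. Constructing heterogeneous base classifiers further improves the predictive performance of ensemble incremental learning.
5. Incremental learning based on selective ensemble. The thesis studies how selective ensemble is applied to incremental learning in EPIL, including when to perform base classifier selection and how to determine the validation sample set. For Bagging++, it then proposes LP-Bagging++, which uses local selection, and MP-Bagging++, which combines local and global selection. Experimental results show that, because global selection removes base classifiers that are no longer effective, it keeps the size of the target ensemble under control and significantly improves the time and space efficiency of prediction while maintaining predictive ability. A hybrid strategy that combines local and global selection is therefore better suited to current online learning needs.
6. The ensemble learning development platform LibEP. Building on the preceding results, the thesis designs and implements LibEP, an open-source ensemble learning development platform. Its algorithms cover all the main aspects of ensemble learning research, including sample manipulation methods, base classifier learning algorithms, ensemble learning algorithms, selective ensemble algorithms, incremental learning algorithms and performance evaluation algorithms. LibEP has a simple, easy-to-use interface and can be conveniently integrated into user programs. It is implemented in standard C++, runs efficiently, is highly portable and is easy to extend.
From the perspectives of models, algorithms and experiments, this thesis explores online learning techniques that combine selective ensemble with incremental learning. As a next step, the author will combine this research with practical applications, aiming to make the technology play an important role in machine learning applications that demand high performance and high efficiency.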
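Contribution 2 recasts base classifier selection as mining a pattern from a transaction database but does not spell out how that database is built. A common encoding, assumed here, treats each trained base classifier as an item and each validation sample as a transaction holding the classifiers that label it correctly. The Python sketch below builds such a database and tests whether a candidate pattern (a subset of classifiers) covers a sample under majority voting, taking "covers" to mean that a strict majority of the subset is correct. That reading of covering pattern mining, and the names `build_transactions` and `covers`, are illustrative assumptions rather than the thesis's definitions.

```python
from typing import Callable, Hashable, Sequence

# Assumed interface: a base classifier is any callable mapping a sample to a label.
Classifier = Callable[[Sequence[float]], Hashable]


def build_transactions(classifiers: Sequence[Classifier],
                       X_val: Sequence[Sequence[float]],
                       y_val: Sequence[Hashable]) -> list[frozenset[int]]:
    """One transaction per validation sample: the indices (items) of the base
    classifiers that predict that sample correctly."""
    return [frozenset(i for i, clf in enumerate(classifiers) if clf(x) == y)
            for x, y in zip(X_val, y_val)]


def covers(pattern: frozenset[int], transaction: frozenset[int]) -> bool:
    """Hypothetical coverage rule: the pattern's majority vote is correct on the
    sample if a strict majority of its classifiers appear in the transaction."""
    return len(pattern & transaction) * 2 > len(pattern)


if __name__ == "__main__":
    # Three toy threshold classifiers on one-feature samples.
    clfs = [lambda x: int(x[0] > 0.5),
            lambda x: int(x[0] > 0.2),
            lambda x: int(x[0] > 0.8)]
    X_val = [[0.1], [0.3], [0.6], [0.9]]
    y_val = [0, 1, 1, 1]
    db = build_transactions(clfs, X_val, y_val)
    full = frozenset({0, 1, 2})
    print(db)
    print(sum(covers(full, t) for t in db), "of", len(db), "samples covered")
```

For two-class problems the strict-majority test is exact (counting ties as errors); with more classes it is only sufficient, since a plurality vote can still be correct when half or fewer of the voters pick the true label.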
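Contribution 3 states that CPM-EP and PMEP both finish by building the target ensemble greedily, but gives no further detail. The sketch below is a generic greedy forward selection, not either thesis algorithm: it skips the mining of candidate sub-patterns entirely and works directly on a transaction database in which each transaction is the set of classifier indices correct on one validation sample (an assumed encoding). At each step it adds the classifier that maximizes the number of transactions covered by a strict majority of the chosen subset, and it returns the smallest subset that reached the best coverage. The function names are hypothetical.

```python
def covered_count(pattern: frozenset[int], db: list[frozenset[int]]) -> int:
    """Number of transactions in which a strict majority of `pattern` is correct."""
    return sum(1 for t in db if len(pattern & t) * 2 > len(pattern))


def greedy_select(n_classifiers: int, db: list[frozenset[int]]) -> frozenset[int]:
    """Greedy forward selection over classifier indices 0..n_classifiers-1.

    Grows the subset one classifier at a time, always adding the one whose
    inclusion covers the most transactions, and returns the smallest subset
    that reached the best coverage seen along the way."""
    chosen: frozenset[int] = frozenset()
    best, best_score = chosen, -1
    remaining = set(range(n_classifiers))
    while remaining:
        # max() returns the first maximal candidate, so ties go to the lowest index.
        cand = max(sorted(remaining), key=lambda i: covered_count(chosen | {i}, db))
        chosen = chosen | {cand}
        remaining.remove(cand)
        score = covered_count(chosen, db)
        if score > best_score:  # strict '>' keeps the earlier, smaller subset on ties
            best, best_score = chosen, score
    return best
```

On the toy database `[{0, 1}, {1, 2}, {0, 2}, {0, 1, 2}]`, every single classifier misses one sample, but the majority vote of all three covers all four, so `greedy_select(3, db)` returns `frozenset({0, 1, 2})`. Scanning every subset size and keeping the best one, rather than stopping at the first drop in coverage, is a design choice of this sketch.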
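PMEP is described as building an FP-Tree from the original transaction database and reading candidate sub-patterns from the tree instead of rescanning the database. The sketch below covers only the standard FP-Tree construction from FP-growth: one pass counts item frequencies, and a second pass inserts every transaction, with its frequent items sorted by descending frequency, into a shared prefix tree. How PMEP then extracts covering sub-patterns is not described in the abstract and is not shown. `FPNode`, `build_fp_tree` and the `min_support` parameter are our own names, and items are assumed to be mutually comparable (for example, classifier indices) so that frequency ties can be broken deterministically.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Hashable, Iterable


@dataclass(eq=False)
class FPNode:
    item: Hashable | None                 # None marks the root
    count: int = 0
    parent: FPNode | None = field(default=None, repr=False)
    children: dict = field(default_factory=dict)


def build_fp_tree(transactions: Iterable[Iterable[Hashable]], min_support: int = 1):
    """Standard FP-Tree construction (the first phase of FP-growth).

    Returns (root, header): `header` maps each frequent item to the list of
    tree nodes holding that item (the node-links used during mining)."""
    transactions = [set(t) for t in transactions]
    # Pass 1: item frequencies; only items with support >= min_support are kept.
    freq = Counter(item for t in transactions for item in t)
    frequent = {item: c for item, c in freq.items() if c >= min_support}
    root = FPNode(item=None)
    header: dict = {item: [] for item in frequent}
    # Pass 2: insert each transaction along a shared prefix path.
    for t in transactions:
        # Most frequent items first; ties broken by the item value itself.
        path = sorted((i for i in t if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item=item, parent=node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header
```

For the transactions `[[0, 1, 2], [1], [0, 1], [0, 1, 2]]` the tree is a single chain 1 → 0 → 2 with counts 4, 3 and 2, so, for instance, the support of the pattern {0, 1} (3) can be read off the tree without touching the original transactions again.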
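Contribution 4 introduces Bagging++ and a Bagging-based way of constructing heterogeneous base classifiers without describing either in detail. Assuming Bagging++ behaves like incremental bagging, the sketch below lets each new data batch add `k` base classifiers trained on bootstrap replicates of that batch, obtains heterogeneity by cycling through different base learners, and predicts by majority vote. The toy learners `train_stump` and `train_centroid`, the class `BaggingPlusPlus` and its methods are hypothetical stand-ins written for illustration; they are not LibEP's C++ interface.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Hashable, Sequence

Sample = Sequence[float]
Model = Callable[[Sample], Hashable]


def train_stump(X: Sequence[Sample], y: Sequence[Hashable]) -> Model:
    """Toy base learner: one-feature threshold rule with the fewest training errors."""
    majority = Counter(y).most_common(1)[0][0]
    best = (sum(lab != majority for lab in y), 0, float("inf"), majority, majority)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[f] <= t]
            right = [lab for x, lab in zip(X, y) if x[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if err < best[0]:
                best = (err, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab


def train_centroid(X: Sequence[Sample], y: Sequence[Hashable]) -> Model:
    """Toy base learner: predict the class whose mean vector is nearest."""
    counts = Counter(y)
    sums = defaultdict(lambda: [0.0] * len(X[0]))
    for x, lab in zip(X, y):
        sums[lab] = [s + v for s, v in zip(sums[lab], x)]
    cents = {c: [s / counts[c] for s in sums[c]] for c in counts}
    return lambda x: min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))


class BaggingPlusPlus:
    """Incremental-bagging sketch: every batch appends `k` bootstrap-trained models."""

    def __init__(self, learners: Sequence[Callable[..., Model]], k: int = 4, seed: int = 0):
        self.learners = list(learners)
        self.k = k
        self.rng = random.Random(seed)
        self.ensemble: list[Model] = []

    def partial_fit(self, X: Sequence[Sample], y: Sequence[Hashable]) -> None:
        n = len(X)
        for j in range(self.k):
            idx = [self.rng.randrange(n) for _ in range(n)]   # bootstrap replicate
            learner = self.learners[j % len(self.learners)]   # cycle learners -> heterogeneous
            self.ensemble.append(learner([X[i] for i in idx], [y[i] for i in idx]))

    def predict(self, x: Sample) -> Hashable:
        if not self.ensemble:
            raise ValueError("call partial_fit before predict")
        # Plain majority vote over all base classifiers learned so far.
        return Counter(m(x) for m in self.ensemble).most_common(1)[0][0]
```

Because old classifiers are left untouched and new ones are only appended, a scheme like this places no structural demands on the base learner, which is one plausible route to the adaptability described above; it is also why the ensemble keeps growing unless some form of selection prunes it.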
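Contribution 5 combines local selection (pruning the classifiers just trained on an increment) with global selection (periodically pruning outdated classifiers from the whole ensemble), and the experiments favour the mixed strategy. The abstract does not say which criterion, schedule or validation set the thesis uses, so the sketch below assumes simple ones: validation accuracy for ranking, a fixed accuracy floor `min_acc` for deciding that a classifier is outdated, and global selection every `global_every` increments. All function names and default values are hypothetical.

```python
from typing import Callable, Hashable, Sequence

# Assumed interface: a base classifier is any callable from a feature vector to a label.
Model = Callable[[Sequence[float]], Hashable]


def accuracy(model: Model, X_val: Sequence[Sequence[float]], y_val: Sequence[Hashable]) -> float:
    """Fraction of validation samples the model labels correctly."""
    return sum(model(x) == y for x, y in zip(X_val, y_val)) / len(y_val)


def local_select(local_models: list[Model], X_val, y_val, keep: int) -> list[Model]:
    """Local selection: keep the `keep` most accurate models from the current increment."""
    ranked = sorted(local_models, key=lambda m: accuracy(m, X_val, y_val), reverse=True)
    return ranked[:keep]


def global_select(ensemble: list[Model], X_val, y_val, min_acc: float) -> list[Model]:
    """Global selection: drop models from any increment whose accuracy on the
    latest validation data has fallen below `min_acc` (treated as outdated)."""
    return [m for m in ensemble if accuracy(m, X_val, y_val) >= min_acc]


def mixed_update(ensemble: list[Model], local_models: list[Model], X_val, y_val,
                 step: int, keep: int = 2, min_acc: float = 0.6,
                 global_every: int = 3) -> list[Model]:
    """One mixed (local + global) update for increment number `step` (1-based):
    local selection every time, global selection every `global_every` increments."""
    ensemble = ensemble + local_select(local_models, X_val, y_val, keep)
    if step % global_every == 0:
        ensemble = global_select(ensemble, X_val, y_val, min_acc)
    return ensemble
```

Under concept drift, a classifier fitted to the old concept falls below `min_acc` on the new validation data and is removed at the next global step, which is how a mixed strategy can keep the ensemble small; with local selection alone, such classifiers would simply accumulate.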
[Degree-Granting Institution]: National University of Defense Technology
[Degree Level]: Doctoral
[Year Conferred]: 2010
[Classification Number]: TP18
Article ID: 2339734
[Citing Literature]
Related doctoral dissertations (first 1 entry)
1 Niu Xiaofei; Research on Key Technologies of Web Spam Detection Based on Genetic Programming and Ensemble Learning [D]; Shandong University; 2012
Link to this article: http://www.sikaile.net/wenyilunwen/guanggaoshejilunwen/2339734.html