Coupled Support Vector Learning Methods and Their Applications

Published: 2018-05-30 02:11

Topics: concept drift + transfer learning; Source: Jiangnan University, 2016 doctoral dissertation

[Abstract]: Traditional machine learning addresses single-learner problems. Multi-learner problems are attracting growing attention, yet no existing work describes them in a unified way from a macroscopic perspective. Multi-task learning simultaneously solves for multiple learners, on related datasets, that are related yet have distinct characteristics. Transfer learning focuses on how data or models from related historical scenarios, which are abundant but cannot be used directly, can benefit modeling of the current scenario. Concept drift studies learning scenarios that change continually. All three study, directly or indirectly, multiple sub-learners and the relationships among them, and this dissertation refers to them collectively as coupled machine learning methods. It proposes a framework of coupled support vector learning, in the expectation that this viewpoint will shift the focus of research on multi-learner problems toward the coupling characteristics between scenarios.

The time-adaptive support vector machine (TA-SVM) performs well on non-stationary datasets, but the correlation information it obtains solely from the similarity of adjacent sub-classifiers is insufficient, which may make the trained model unreliable and limit its applicability. By defining a correlation decay function over the sub-classifier sequence, the dissertation proposes the Evolving Support Vector Machine (ESVM) for non-stationary data classification. ESVM uses the decay function to express the degree of correlation between sub-classifiers, and constrains the weighted differences among all sub-classifiers to obtain a sub-classifier sequence that varies more smoothly, matching the gradual concept change hidden in the data. In comparative experiments on a variety of slowly changing data scenarios, ESVM outperforms previous methods.

TA-SVM solves for multiple sub-classifiers simultaneously, balancing local and global optimization. However, the direct coupling between sub-classifiers introduces a matrix pseudo-inverse into the computation, so it is difficult to guarantee theoretically that the extended kernel function is a Mercer kernel, and for large datasets the high computational cost limits its practicality. To address this, the dissertation proposes the Improved Time Adaptive Support Vector Machine (ITA-SVM), which describes the sub-classifier sequence using a base classifier plus a set of increments, avoiding the pseudo-inverse problem that arises from solving for the sub-classifier sequence directly. Combined with core vector machine (CVM) theory, a fast algorithm for ITA-SVM is given. On non-stationary datasets ITA-SVM achieves classification performance comparable to or better than TA-SVM, while having asymptotically linear time complexity. Experiments verify the effectiveness of the method.

Traditional methods for building regression systems consider only a single scenario during training. An important resulting defect is that when important information is missing from the current scenario, the trained system generalizes poorly. To address this, the dissertation builds on support vector regression to propose a regression system with transfer learning ability, Transfer learning Support Vector Regression (T-SVR). T-SVR not only makes full use of the data in the current scenario but also learns effectively from historical knowledge, compensating for missing information in the current scenario by transferring knowledge from historical scenarios. Specifically, by controlling the similarity between the current model and the historical model in the objective function, the current model can draw useful information from the historical scenario when its own information is missing or insufficient, yielding an enhanced model of the current scenario. Experiments on simulated data and on a Fenjiu liquor spectral dataset confirm that T-SVR adapts better than traditional regression modeling methods when information is missing.

Multi-task learning aims to improve the performance of each sub-learner by using information from related tasks, and has achieved good results both in theory and in practical applications such as gene sequencing and web page classification. However, previous methods focus only on the relationships among tasks and do not fully consider algorithmic complexity. The rapid growth in the volume of information poses new challenges for multi-task learning, and high computational cost limits the practicality of earlier methods. The dissertation proposes Fast regularized Multi Task Learning (FrMTL). FrMTL achieves classification performance comparable to regularized multi-task learning, and, by relying on the core vector machine technique, attains asymptotically linear time complexity, so that it remains fast on large datasets.
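The difference between coupling only adjacent sub-classifiers (as the abstract describes for TA-SVM) and coupling all pairs with a decaying weight (as it describes for ESVM) can be illustrated with a minimal sketch. The abstract does not give the actual decay function or objective, so the exponential decay, the parameter `tau`, and the function names below are assumptions chosen for illustration only, not the dissertation's formulation.

```python
import math

def sq_dist(u, v):
    """Squared Euclidean distance between two weight vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def adjacent_penalty(ws):
    """Adjacent-only coupling: each sub-classifier is tied to its successor."""
    return sum(sq_dist(ws[t], ws[t + 1]) for t in range(len(ws) - 1))

def decayed_penalty(ws, tau=1.0):
    """All-pairs coupling with an assumed exponential decay in the time gap.

    Pairs that are far apart in the sequence still contribute, but with a
    weight exp(-(t - s) / tau) that shrinks as the gap t - s grows.
    """
    total = 0.0
    for s in range(len(ws)):
        for t in range(s + 1, len(ws)):
            total += math.exp(-(t - s) / tau) * sq_dist(ws[s], ws[t])
    return total

# Hypothetical sequence of three 2-D sub-classifier weight vectors.
ws = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
print(adjacent_penalty(ws))
print(decayed_penalty(ws, tau=1.0))
```

In a full method a term like `decayed_penalty` would be added to the sum of the per-period SVM losses, so that a smoother sequence of sub-classifiers is preferred. A larger `tau` makes distant sub-classifiers influence each other more strongly.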
[Degree-granting institution]: Jiangnan University
[Degree level]: Doctoral
[Year degree awarded]: 2016
[Classification number]: TP181

Article ID: 1953446

Article link: http://www.sikaile.net/shoufeilunwen/xxkjbs/1953446.html
