

Research on Data Normalization Methods for Improving SVM Training Efficiency

Published: 2018-08-13 14:06
[Abstract]: Support Vector Machines (SVM) are a machine learning method grounded in statistical learning theory and built on the structural risk minimization principle and VC-dimension theory. Owing to their excellent classification ability they have been widely applied in many fields over recent decades, and SVM remains one of the most active research topics in machine learning; many researchers at home and abroad have worked on improving SVM training efficiency. Data normalization is a necessary preprocessing step for SVM training. Common normalization strategies include scaling to [-1, +1], standardizing to N(0, 1), and so on, but the existing literature offers no study of the scientific basis for these common methods. By examining the operating mechanism of the sequential minimal optimization (SMO) algorithm in SVM, this thesis finds that the Gaussian kernel function is affected by the attribute values of the data samples: attribute values that are too large or too small reduce the kernel's participation. Data normalization confines the data to a given range so that it better matches the Gaussian kernel radius, preventing the optimal separating hyperplane from becoming excessively rugged. Through empirical experiments, the thesis explores the underlying mechanism of data normalization and the effects of normalizing versus not normalizing on training efficiency and model prediction ability. Standard data sets were trained with SVM under several conditions (raw un-normalized data, data normalized by different methods, artificially de-normalized data, and arbitrarily selected attribute columns) while recording the objective function value versus iteration count, training time, model test results, and k-fold cross-validation (k-CV) performance. The main results are as follows. (1) Building on the traditional SMO algorithm, expressions for the objective function value and its change were derived and implemented in C++11, with computation and output of the objective value, its change, training time, and test accuracy. A close analysis of representative studies of SMO with the Gaussian kernel determined the optimal kernel radius λ and the tolerance κ for violating the KKT conditions. Experiments show that the chosen λ and κ achieve the best generalization ability, and analysis of the output curves supports a well-grounded conclusion: SVM training efficiency can be improved through data preprocessing. (2) Data preprocessing methods were studied in depth; in particular, min-max normalization, median normalization, and standard-score normalization were implemented and integrated with the SVM classifier. The results show that data normalization can compensate for shortcomings in the manual choice of the Gaussian kernel radius, letting the Gaussian kernel serve SVM classification more effectively. (3) Standard data sets were preprocessed with the three normalization methods under several experimental designs, and training time and test accuracy were recorded and compared in detail using k-CV validation. Analyzing how SVM training efficiency changes after normalization reveals the underlying mechanism by which normalization improves training efficiency. (4) From the analysis of normalization's effect on training efficiency and the comparison of classification ability, the thesis derives an optimal bounding principle for normalization: keep each attribute's values within a conventional, mutually comparable numeric range, such as [-0.5, +0.5] to [-5, +5] or N(0, 1) to N(0, 5). Extensive experiments verify that data normalization effectively improves SVM training efficiency. This work provides a scientific basis for data normalization in SVM and in machine learning algorithms generally.
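The three normalization methods named in finding (2), and the kernel-participation effect described in the abstract, can be sketched as follows. The thesis implemented its algorithms in C++11; this is an illustrative Python sketch using standard textbook formulas. The function names, the toy data, and the exact formula used for median normalization are assumptions, since the abstract does not give the thesis's precise definitions.

```python
import numpy as np

def minmax_scale(X, lo=-1.0, hi=1.0):
    """Min-max normalization: map each attribute column to [lo, hi]."""
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    span = np.where(xmax > xmin, xmax - xmin, 1.0)  # guard constant columns
    return lo + (X - xmin) * (hi - lo) / span

def median_scale(X):
    """Median normalization -- one common form, (x - median) / (max - min);
    the thesis's exact definition may differ."""
    span = X.max(axis=0) - X.min(axis=0)
    span = np.where(span > 0, span, 1.0)
    return (X - np.median(X, axis=0)) / span

def zscore_scale(X):
    """Standard-score normalization: zero mean, unit variance per column."""
    std = X.std(axis=0)
    std = np.where(std > 0, std, 1.0)
    return (X - X.mean(axis=0)) / std

def gaussian_kernel(x, z, gamma=1.0):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

# Hypothetical toy data whose attribute values are far larger than the
# kernel radius can accommodate:
X = np.array([[1000.0, 2000.0],
              [1010.0, 1990.0],
              [5000.0, 8000.0]])

# Raw attributes: squared distances are ~1e7, so exp(...) underflows to 0.0
# for every distinct pair -- the kernel's "participation" collapses.
print(gaussian_kernel(X[0], X[2]))   # 0.0 (underflow)

# After min-max scaling to [-1, +1], kernel values are informative again.
Xs = minmax_scale(X)
print(gaussian_kernel(Xs[0], Xs[2]))  # a usable value in (0, 1)
```

With raw attributes in the thousands, every off-diagonal kernel entry is effectively zero, so the kernel matrix carries almost no similarity information; this matches the abstract's observation that overly large attribute values reduce the Gaussian kernel's participation, and confining attributes to a range comparable to the kernel radius restores it.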
[Degree-granting institution]: Shandong Normal University
[Degree level]: Master
[Year conferred]: 2017
[Classification number]: TP181



Article ID: 2181215


Link: http://www.sikaile.net/kejilunwen/zidonghuakongzhilunwen/2181215.html


