Research on Data Normalization Methods for Improving SVM Training Efficiency
[Abstract]: The Support Vector Machine (SVM) is a machine learning method built on statistical learning theory, the structural risk minimization principle, and VC-dimension theory. Owing to its excellent classification ability it has been widely applied in many fields over recent decades, and it remains one of the most active research areas in machine learning. Data normalization is a necessary preprocessing step for SVM training; commonly used strategies include scaling attributes to [-1, +1] or standardizing them to N(0, 1), yet the existing literature offers little scientific justification for these common choices. This thesis studies the sequential minimal optimization (SMO) algorithm for SVM and finds that the Gaussian kernel function is sensitive to the attribute values of the data samples: when attribute values are too large or too small, the contribution of the Gaussian kernel is weakened and the resulting separating surface becomes too rugged. Through empirical experiments, the thesis investigates the internal mechanism of data normalization and the effects of normalized versus non-normalized data on training efficiency and model prediction ability. SVMs are trained on the data while the change of the objective function value with the number of iterations, the training time, the test results, and the k-fold cross-validation (k-CV) performance are recorded. The algorithm is implemented in C++11 and computes and outputs the objective function value, its change per iteration, the training time, and the test accuracy. The main work is as follows:
(1) Representative research literature on the sequential minimal optimization algorithm with the Gaussian kernel is analyzed in depth, and the optimal value of the Gaussian kernel radius (lambda) and the tolerance for violating the KKT conditions (kappa) are determined. The results show that the chosen lambda and kappa achieve the best generalization ability, and analysis of the recorded output curves supports the conclusion that data preprocessing can improve SVM training efficiency.
(2) Data preprocessing methods are studied in depth. Three different normalization methods, namely max-value normalization, median-value normalization, and standard-score (z-score) normalization, are applied to the SVM classifier. The experimental results show that data normalization can compensate for the shortcomings of the Gaussian kernel radius and make the Gaussian kernel behave more ideally for SVM classification.
(3) On standard experimental data sets, the three normalization methods are used to preprocess the SVM data, and a variety of experiments are designed; training time and test accuracy are recorded and compared in detail using k-CV validation.
(4) By analyzing the effect of data normalization on SVM training efficiency and comparing the resulting differences in classification ability, an optimal criterion for data normalization that improves SVM training efficiency is proposed: each data attribute should be kept within a conventional, mutually comparable range, e.g., scaled into roughly [-0.5, +0.5] to [-5, +5] or standardized toward N(0, 1).
Extensive experimental analysis and verification show that data normalization can effectively improve the training efficiency of SVM. This thesis thereby provides a scientific basis for data normalization in SVM and in machine learning algorithms more generally.
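As an illustration of the normalization schemes the abstract names (max-value, median-value, and standard-score normalization) and of why attribute scale matters for the Gaussian kernel, here is a minimal C++11 sketch. It is not the thesis's actual implementation: the function names are hypothetical, the reading of "median-value normalization" as centering on the per-attribute median is an assumption, and the kernel is written in the common form K(x, z) = exp(-||x - z||^2 / (2*sigma^2)), which may differ from the thesis's parameterization of the kernel radius.

```cpp
// Illustrative sketch only (not the thesis's code): three column-wise
// normalization schemes and a Gaussian kernel, showing how attribute scale
// drives the kernel value. All function names here are hypothetical.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // rows = samples, columns = attributes

// Max-value normalization: divide each attribute by its maximum absolute value.
void normalizeByMax(Matrix& X) {
    if (X.empty()) return;
    for (std::size_t j = 0; j < X[0].size(); ++j) {
        double maxAbs = 0.0;
        for (const auto& row : X) maxAbs = std::max(maxAbs, std::fabs(row[j]));
        if (maxAbs > 0.0)
            for (auto& row : X) row[j] /= maxAbs;  // values end up in [-1, +1]
    }
}

// One plausible reading of "median-value normalization":
// center each attribute on its median.
void normalizeByMedian(Matrix& X) {
    if (X.empty()) return;
    for (std::size_t j = 0; j < X[0].size(); ++j) {
        std::vector<double> col;
        for (const auto& row : X) col.push_back(row[j]);
        std::nth_element(col.begin(), col.begin() + col.size() / 2, col.end());
        const double median = col[col.size() / 2];
        for (auto& row : X) row[j] -= median;
    }
}

// Standard-score (z-score) normalization: mean 0, standard deviation 1 per attribute.
void normalizeByZScore(Matrix& X) {
    if (X.empty()) return;
    const double n = static_cast<double>(X.size());
    for (std::size_t j = 0; j < X[0].size(); ++j) {
        double mean = 0.0, var = 0.0;
        for (const auto& row : X) mean += row[j];
        mean /= n;
        for (const auto& row : X) var += (row[j] - mean) * (row[j] - mean);
        const double stddev = std::sqrt(var / n);
        if (stddev > 0.0)
            for (auto& row : X) row[j] = (row[j] - mean) / stddev;  // roughly N(0, 1)
    }
}

// Gaussian (RBF) kernel, assuming the common form exp(-||x - z||^2 / (2*sigma^2)).
double gaussianKernel(const std::vector<double>& x, const std::vector<double>& z,
                      double sigma) {
    double sq = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j) sq += (x[j] - z[j]) * (x[j] - z[j]);
    return std::exp(-sq / (2.0 * sigma * sigma));
}

int main() {
    // Two samples with very large raw attribute values: the kernel saturates near 0.
    Matrix X = {{1000.0, 2000.0}, {1200.0, 1800.0}};
    const double sigma = 1.0;
    std::printf("K before normalization: %g\n", gaussianKernel(X[0], X[1], sigma));
    normalizeByZScore(X);
    std::printf("K after z-score normalization: %g\n", gaussianKernel(X[0], X[1], sigma));
    return 0;
}
```

With raw attribute values in the thousands, ||x - z||^2 overwhelms any moderate kernel radius and K collapses toward 0 for almost every pair of samples; after normalization the kernel values spread out over (0, 1). This matches the abstract's observation that overly large or small attribute values weaken the contribution of the Gaussian kernel, and that normalization can compensate for a kernel radius that would otherwise be poorly matched to the data.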
【Degree-granting institution】: Shandong Normal University
【Degree level】: Master's
【Year conferred】: 2017
【CLC classification number】: TP181