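Research on Logistic Regression Classification Learning Algorithms for the Class Imbalance Problem
Published: 2018-04-24 16:39
Topics: logistic regression + class imbalance; Source: Xinyang Normal University, 2017 master's thesis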
[Abstract]: Class imbalance is one of the most actively studied problems in pattern recognition and machine learning. It is characterized by some classes having markedly fewer instances than others. In practical applications, correctly identifying minority-class instances is often more valuable than correctly identifying majority-class instances. In medical diagnosis, for example, only a very small fraction of people are cancer patients, and identifying those patients correctly is of great importance. However, logistic regression, a classical statistical classification method, pursues high overall accuracy under the assumption that the classes in the data set have roughly equal numbers of instances. As a result, the learned model often fails to capture the characteristics of minority-class instances and misclassifies them. To address this problem, this thesis proposes two logistic regression learning algorithms for class-imbalanced classification.

(1) A new logistic regression learning algorithm for class imbalance. Logistic regression estimates its model parameters by maximum likelihood, which makes it hard for the model to capture the characteristics of minority-class instances. To address this, the thesis constructs MLER (Maximum Likelihood Evaluation and Recall), a measure based on the maximum likelihood function and recall. Unlike the maximum likelihood objective, MLER takes both the accuracy and the recall of the model into account, so as to ensure the model's performance on all classes. Based on MLER, the thesis proposes LRIL (Logistic Regression for Imbalanced Learning), a new logistic regression algorithm for class imbalance that learns its parameters with Newton's method, using MLER as the objective. Experimental results show that LRIL, while retaining the high accuracy of logistic regression, effectively improves recall, f-measure and g-mean, and also shows clear advantages over other advanced methods.

(2) Targeting the uneven class distribution that characterizes class-imbalance problems, the thesis proposes ILKLR (Imbalanced Learning based on k-means and Logistic Regression), a class-imbalance learning algorithm built on a hybrid strategy that combines k-means with logistic regression. Unlike conventional logistic regression, ILKLR uses the k-means algorithm to partition the majority-class data into several sub-clusters and assigns each sub-cluster a new class label, with the aim of making the training set linearly separable. Experimental results show that the proposed data preprocessing method outperforms conventional logistic regression, under-sampled logistic regression and over-sampled logistic regression on metrics such as recall, g-mean and f-measure.
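The relabelling idea in (2) can be illustrated with a small sketch. The abstract gives only the outline of ILKLR, so the code below is one possible reading and not the thesis's implementation. The function names, the farthest-first k-means seeding, the batch gradient-descent optimizer, the toy one-dimensional data and the metric formulas (the usual definitions of recall, g-mean and f-measure) are all assumptions made for illustration.

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's algorithm with deterministic farthest-first seeding (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:  # add the point farthest from all chosen centers
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign

def class_scores(W, x):
    """Linear score per class; the last weight in each row is the bias."""
    return [sum(w * v for w, v in zip(row, x + (1.0,))) for row in W]

def fit_softmax(X, y, n_classes, lr=0.1, epochs=2000):
    """Multinomial logistic regression fitted by batch gradient descent."""
    W = [[0.0] * (len(X[0]) + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * len(row) for row in W]
        for x, label in zip(X, y):
            s = class_scores(W, x)
            top = max(s)  # subtract the max score for numerical stability
            exps = [math.exp(v - top) for v in s]
            total = sum(exps)
            for c in range(n_classes):
                err = exps[c] / total - (1.0 if c == label else 0.0)
                for j, v in enumerate(x + (1.0,)):
                    grad[c][j] += err * v
        for c in range(n_classes):
            for j in range(len(W[c])):
                W[c][j] -= lr * grad[c][j] / len(X)
    return W

def fit_ilklr(X, y, k):
    """y: 1 = minority, 0 = majority. The minority keeps label 0; majority
    points are relabelled 1..k according to their k-means sub-cluster."""
    majority = [x for x, label in zip(X, y) if label == 0]
    cluster_of = iter(kmeans(majority, k))
    new_labels = [0 if label == 1 else 1 + next(cluster_of) for label in y]
    return fit_softmax(X, new_labels, k + 1)

def predict(W, x):
    s = class_scores(W, x)
    best = max(range(len(s)), key=s.__getitem__)
    return 1 if best == 0 else 0  # every sub-cluster label maps back to "majority"

def recall_gmean_f1(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, math.sqrt(recall * specificity), f1

# Toy 1-D data (an assumption): the minority sits between two majority groups,
# so no single threshold on x separates the two original classes.
X = [(-3.4,), (-3.2,), (-3.0,), (-2.8,), (-2.6,),
     (2.6,), (2.8,), (3.0,), (3.2,), (3.4,), (-0.2,), (0.2,)]
y = [0] * 10 + [1] * 2

W_ilklr = fit_ilklr(X, y, k=2)  # majority split into two sub-clusters
W_plain = fit_ilklr(X, y, k=1)  # k=1 is equivalent to ordinary two-class logistic regression
ilklr_scores = recall_gmean_f1(y, [predict(W_ilklr, x) for x in X])
plain_scores = recall_gmean_f1(y, [predict(W_plain, x) for x in X])
print("ILKLR sketch (recall, g-mean, f-measure):", ilklr_scores)
print("plain LR     (recall, g-mean, f-measure):", plain_scores)
```

On this toy set a two-class linear model has no single threshold that isolates the minority. After relabelling, the three classes occupy three consecutive intervals on the line, which a linear multinomial model can represent. The scores are computed on the training points only and say nothing about the experiments reported in the thesis.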
[Degree-granting institution]: Xinyang Normal University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP181