Penalized-Likelihood-Based Variable Selection Methods and Their Application to High-Dimensional Data

Published: 2018-09-03 18:13

[Abstract]: With the rapid development of information technology, both the volume of available data and the number of variables keep growing. Choosing the best model from a large set of candidates has therefore become an important research topic in econometrics. A good variable selection method avoids the heavy computation and overfitting of traditional methods; the model it selects predicts accurately, excludes irrelevant variables effectively, and is as parsimonious as possible. Because the penalized likelihood approach is a continuous optimization procedure, it is more stable than traditional discrete methods, and with a suitable algorithm it can be carried out efficiently even when the number of variables is large. For high-dimensional data models, model selection by penalized likelihood is therefore more effective, accurate, and stable.

Based on the penalized likelihood approach, this thesis studies variable selection for several classes of high-dimensional data models. The resulting methods perform model selection and parameter estimation simultaneously. Using probability theory and mathematical statistics, the thesis also proves that the estimators have the Oracle property: they select the correct model with probability tending to 1, and they are asymptotically normally distributed. The methods and main conclusions are as follows.

First, the thesis proposes an adaptive bridge estimation method for high-dimensional data models. Inspired by bridge estimation, it assigns different weights to the penalty terms according to the importance of each variable, and examines whether the adaptive bridge estimator meets the standard of a good estimator, namely the Oracle property. The thesis proves that, under suitable conditions, the adaptive bridge estimator has the Oracle property. Simulations and real data are used to evaluate its numerical and empirical performance.

Second, the thesis studies M-estimation for high-dimensional linear regression models and discusses the properties of the estimator when the penalty term is handled by local linear approximation. M-estimation is a general framework that covers least absolute deviation estimation, quantile regression, least squares estimation, and Huber regression. When the data contain outliers or the error term follows a heavy-tailed distribution, least absolute deviation regression, a special case of M-estimation, is more robust than least squares. The thesis proves theoretically that, under certain conditions, the estimator obtained from an objective function combining M-estimation with local linear approximation has good large-sample properties. In the simulation part, a suitable algorithm is implemented to show that the method is more robust; for ultrahigh-dimensional data models, simulations also show that combining backward regression with the proposed method performs better. In the empirical part, real data show that the proposed method selects variables and estimates parameters well.

Finally, the thesis studies the identification of credit default customers based on the Logistic model in the high-dimensional setting. The Logistic model, which is commonly used in credit scoring, is used to identify the factors that influence credit default and to measure and predict the default risk of credit customers. The simulation results show that the proposed variable selection method is effective. The empirical results also show that the variable selection methods proposed for high-dimensional data models can select models with strong explanatory and predictive power.
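The abstract names its estimators without writing them down. As a rough guide only, the block below sketches the textbook forms that these names usually refer to; the thesis's exact objective functions, weights, and regularity conditions are not given on this page, so every symbol here (the weights w_j, the exponent gamma, the loss rho, the penalty p_lambda, the index set A) is an assumption made for illustration.

```latex
% Adaptive bridge estimator (assumed standard form, with 0 < \gamma < 1):
% each coefficient gets its own weight w_j, e.g. built from an initial estimate.
\hat{\beta}^{\mathrm{AB}}
  = \arg\min_{\beta}\;
    \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^{2}
    + \lambda_n \sum_{j=1}^{p} w_j \,\lvert \beta_j \rvert^{\gamma}

% Penalized M-estimation (assumed form): \rho(u) = |u| gives least absolute
% deviation, \rho(u) = u^2 gives least squares; the check loss and the Huber
% loss give quantile regression and Huber regression.
\hat{\beta}^{\mathrm{M}}
  = \arg\min_{\beta}\;
    \sum_{i=1}^{n} \rho\bigl(y_i - x_i^{\top}\beta\bigr)
    + n \sum_{j=1}^{p} p_{\lambda}\bigl(\lvert \beta_j \rvert\bigr)

% Local linear approximation (LLA) of the penalty around a current iterate
% \beta^{(k)}, which turns each step into a weighted L1 problem:
p_{\lambda}\bigl(\lvert \beta_j \rvert\bigr)
  \approx p_{\lambda}\bigl(\lvert \beta_j^{(k)} \rvert\bigr)
  + p_{\lambda}'\bigl(\lvert \beta_j^{(k)} \rvert\bigr)
    \bigl(\lvert \beta_j \rvert - \lvert \beta_j^{(k)} \rvert\bigr)

% Oracle property, written for a fixed number of truly nonzero coefficients
% indexed by A = \{ j : \beta_j^{0} \neq 0 \}:
\Pr\bigl(\{ j : \hat{\beta}_j \neq 0 \} = A\bigr) \to 1,
\qquad
\sqrt{n}\,\bigl(\hat{\beta}_A - \beta_A^{0}\bigr)
  \xrightarrow{d} N(0, \Sigma_A)
```

When the number of variables grows with the sample size, asymptotic normality is usually stated for fixed linear combinations of the nonzero coefficients instead of the whole vector; the fixed-dimension statement above is only the simplest version.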
[Degree-granting institution]: University of International Business and Economics
[Degree level]: Doctoral
[Year degree granted]: 2017
[Classification number]: F224


Article ID: 2220773


Link to this article: http://www.sikaile.net/shoufeilunwen/jjglss/2220773.html
