Research and Application of Feature Screening Methods for Ultra-High Dimensional Data
[Abstract]: With the advent of the big data era, ultra-high dimensional data are frequently encountered in meteorological prediction, pattern recognition, genetic research, and other fields. In such data, only a small number of covariates are related to the response variable, so the underlying model is sparse even though the dimension is very high. Traditional robust statistical analysis methods and variable selection methods for high-dimensional data are no longer applicable, so the dimension must be reduced before the data can be analyzed properly. In recent years, many scholars have proposed convenient variable screening methods for ultra-high dimensional data. One effective and reasonable strategy proceeds in two steps. First, a fast and efficient variable screening procedure reduces the ultra-high dimension to a moderate size below the sample size while retaining all important variables. Second, well-established variable selection methods for high-dimensional data are applied to the reduced data. This thesis proposes two ultra-high dimensional feature screening methods. For complex ultra-high dimensional data with heteroscedasticity and heavy tails, it proposes a robust feature screening method based on interval conditional quantiles. For incomplete ultra-high dimensional data whose response variable is missing at random, it proposes a feature screening method based on a marginal correlation measure with inverse probability weighting. The main work of the thesis is as follows. Chapter 1 summarizes the history and current state of variable selection for ultra-high dimensional data and systematically reviews quantiles and missing data.
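The first step of the two-step strategy described above can be illustrated with a minimal sketch. It assumes absolute marginal Pearson correlation as the screening utility and a screening size of n / log(n); both are common choices in the screening literature, not necessarily the ones used in the thesis. The function names, the simulated sparse linear model, and all parameter values are hypothetical. The second step (a standard variable selection method applied to the retained columns) is not shown.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen(columns, y, d):
    """Step 1: keep the d features with the largest |marginal correlation|."""
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


def simulate(n, p, active, beta, seed):
    """Hypothetical sparse linear model: only the `active` columns affect y."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    y = [beta * sum(columns[j][i] for j in active) + rng.gauss(0, 1)
         for i in range(n)]
    return columns, y


if __name__ == "__main__":
    n, p, active = 200, 500, [3, 57, 321]
    columns, y = simulate(n, p, active, beta=2.0, seed=0)
    d = int(n / math.log(n))  # assumed screening size, below the sample size
    kept = screen(columns, y, d)
    print("retained", len(kept), "of", p, "features")
    print("all active features retained:", set(active) <= set(kept))
```

Pearson correlation is used here only because it is the simplest marginal utility; it is not robust to heavy tails or outliers, which is the gap the quantile-based method of the thesis is meant to address.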
Chapter 2 proposes a robust feature screening method based on interval conditional quantiles for complex ultra-high dimensional data with heavy tails and outliers. Most existing work on conditional quantile screening is based on a single quantile level, so the selected variables depend on a quantile level fixed in advance, and a small perturbation of that level can make the selection unstable. The thesis introduces the idea of global quantile regression and proposes a conditional quantile screening method defined over an interval of quantile levels, which makes the screening criterion more accurate. Simulation studies and real-data examples show that the improved method is more stable. Chapter 3 proposes a feature screening method for response variables that are missing at random (MAR). Existing work on feature screening mainly addresses complete data, yet responses missing at random are common in market research, social research, and medical research. For such data, a marginal screening procedure based on inverse probability weighting is proposed; its validity is established theoretically and verified by numerical simulation and a real-data example. Chapter 4 summarizes the two feature screening methods and points out directions for further study.
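The inverse probability weighting idea behind the Chapter 3 method can be sketched as follows. This is an illustration under stated assumptions, not the estimator from the thesis: it uses a weighted Pearson correlation as the marginal measure, treats the response probabilities as known (in practice they would have to be estimated from the fully observed covariates), and lets missingness depend on one observed covariate so that the MAR assumption holds. All function names and simulation settings are hypothetical.

```python
import math
import random


def ipw_corr(x, y, observed, prob):
    """Inverse-probability-weighted Pearson correlation.

    Cases with a missing response get weight 0; observed cases get 1 / prob.
    """
    w = [o / q for o, q in zip(observed, prob)]
    yv = [v if o else 0.0 for v, o in zip(y, observed)]  # placeholder, weight 0
    sw = sum(w)
    mx = sum(wi * a for wi, a in zip(w, x)) / sw
    my = sum(wi * b for wi, b in zip(w, yv)) / sw
    sxy = sum(wi * (a - mx) * (b - my) for wi, a, b in zip(w, x, yv))
    sxx = sum(wi * (a - mx) ** 2 for wi, a in zip(w, x))
    syy = sum(wi * (b - my) ** 2 for wi, b in zip(w, yv))
    return sxy / math.sqrt(sxx * syy)


def ipw_screen(columns, y, observed, prob, d):
    """Keep the d features with the largest |weighted marginal correlation|."""
    scores = [abs(ipw_corr(col, y, observed, prob)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


def simulate_mar(n, p, active, beta, seed):
    """Hypothetical data: y is missing with a probability driven by column 0."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    y_full = [beta * sum(columns[j][i] for j in active) + rng.gauss(0, 1)
              for i in range(n)]
    # Assumed logistic response probability, depending on an observed covariate.
    prob = [1.0 / (1.0 + math.exp(-(0.8 + 0.5 * columns[0][i])))
            for i in range(n)]
    observed = [1 if rng.random() < q else 0 for q in prob]
    y = [v if o else None for v, o in zip(y_full, observed)]
    return columns, y, observed, prob


if __name__ == "__main__":
    columns, y, observed, prob = simulate_mar(400, 200, [0, 10, 20], 2.0, 1)
    print("observed responses:", sum(observed), "of", len(y))
    print("top features:", ipw_screen(columns, y, observed, prob, 3))
```

The weights are normalized by their sum, so observed cases that were unlikely to be observed count for more. This is what compensates for the selection induced by the missingness mechanism.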
[Degree-granting institution]: Nanjing University of Information Science and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: O212
Article ID: 2365405
Article link: http://www.sikaile.net/kejilunwen/yysx/2365405.html