Quantile Estimation for Missing Data Based on the Gamma Distribution
Published: 2018-07-20 12:58
[Abstract]: Missing data are common in experimental research and social surveys; they increase the analysis burden and make statistical results less accurate. How to handle missing data effectively, make full use of the available information, accurately reflect the characteristics of the study population, and achieve the intended research goals has become a difficult and much-studied problem in current statistical research. Because missing data often carry important information about the population, the general approach to applying statistical analysis in this setting is first to impute the incomplete data to obtain a complete data set and then to analyze that data set, so that the conclusions accurately reflect the true situation. This thesis addresses missing data under a non-random missingness mechanism. Under certain assumptions, it uses probabilistic and statistical methods to generate supplementary data from the observed data, combines them with moment estimation to draw inferences about the distribution parameters, and then estimates the quantiles of the probability distribution of the complete data. The quantiles of main interest are the median (0.5 quantile), the quartiles (the first quartile, or 0.25 quantile, and the third quartile, or 0.75 quantile), and the 0.05 and 0.95 quantiles. Assuming that the population follows a Gamma distribution and that the success probability follows an exponential distribution (whose parameter λ can be set from the Gamma parameters and the proportion of missing data), we estimate the parameters and quantiles of the Gamma distribution and give iterative formulas for the parameters.
In the simulation study, carried out in R, we simulate missing proportions of 1/10, 1/5, 1/2, and 2/3 and compare the results with one another, obtaining good results. Because a two-parameter model is often inadequate in practice, we extend the model to the three-parameter case, give iterative formulas for its parameters, and discuss how to choose the success-probability parameter. These results should be of some help in handling missing-data problems in later work.
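The abstract does not state the exact form of the missingness model or the iterative formulas, so the following is only a minimal sketch under one possible reading. It assumes that a value x drawn from Gamma(α, scale β) is observed with probability exp(-λx). Under that assumption the overall observed fraction is (1 + λβ)^(-α), which is one way λ could be set from the Gamma parameters and the missing proportion, and the observed values are again Gamma distributed with the same shape α and a smaller scale β / (1 + λβ). The sketch uses Python rather than R, a closed-form moment correction rather than the thesis's iterative formulas, and hypothetical function names throughout.

```python
import math
import random
import statistics

PROBS = (0.05, 0.25, 0.5, 0.75, 0.95)  # quantile levels named in the abstract


def lambda_for_missing_ratio(alpha, beta, missing_ratio):
    """Assumed model: P(observed | x) = exp(-lam * x), X ~ Gamma(alpha, scale=beta).

    Then P(observed) = (1 + lam * beta) ** (-alpha); solve that for lam.
    """
    return ((1.0 - missing_ratio) ** (-1.0 / alpha) - 1.0) / beta


def simulate_observed(alpha, beta, lam, n, rng):
    """Draw n Gamma values and keep each x with probability exp(-lam * x)."""
    sample = [rng.gammavariate(alpha, beta) for _ in range(n)]
    observed = [x for x in sample if rng.random() < math.exp(-lam * x)]
    return sample, observed


def moment_estimates(data):
    """Method-of-moments (shape, scale) estimates for a Gamma sample."""
    mean = statistics.fmean(data)
    var = statistics.pvariance(data, mu=mean)
    return mean * mean / var, var / mean


def corrected_estimates(observed, n_total):
    """Undo the assumed exponential tilt using the known observed fraction.

    Observed data ~ Gamma(alpha, beta / (1 + lam * beta)) and
    observed fraction ~ (1 + lam * beta) ** (-alpha), so
    beta = beta_obs * fraction ** (-1 / alpha).
    """
    alpha_hat, beta_obs = moment_estimates(observed)
    fraction = len(observed) / n_total
    return alpha_hat, beta_obs * fraction ** (-1.0 / alpha_hat)


def gamma_quantiles(alpha, beta, probs, rng, draws=100_000):
    """Approximate Gamma quantiles by Monte Carlo (the stdlib has no inverse CDF)."""
    xs = sorted(rng.gammavariate(alpha, beta) for _ in range(draws))
    return [xs[min(int(p * draws), draws - 1)] for p in probs]


if __name__ == "__main__":
    rng = random.Random(2017)
    true_alpha, true_beta = 2.0, 3.0  # illustrative values, not from the thesis
    for ratio in (1 / 10, 1 / 5, 1 / 2, 2 / 3):
        lam = lambda_for_missing_ratio(true_alpha, true_beta, ratio)
        sample, observed = simulate_observed(true_alpha, true_beta, lam, 20_000, rng)
        a_hat, b_hat = corrected_estimates(observed, len(sample))
        qs = gamma_quantiles(a_hat, b_hat, PROBS, rng)
        print(f"missing={ratio:.3f} alpha={a_hat:.3f} beta={b_hat:.3f}",
              " ".join(f"{q:.2f}" for q in qs))
```

Because large values are more likely to be missing under this assumed mechanism, the mean of the observed values understates the population mean; the correction needs only the observed values and the total sample size, not λ itself.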
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: O212.1
Article ID: 2133593
Article link: http://www.sikaile.net/kejilunwen/yysx/2133593.html