An Exploratory Study of Generalized Linear Mixed Models for Binary Longitudinal Data
Published: 2018-09-03 07:20
[Abstract]: Binary longitudinal data arise widely in medicine, psychology, the social sciences and other fields, and are especially common in clinical research on new drugs. Because such data are not normally distributed and observations at different time points are correlated, they do not satisfy the assumptions of traditional statistical methods. The statistical methods currently available for binary longitudinal data include generalized estimating equations and generalized linear mixed models. Generalized estimating equations are comparatively well studied in the literature, whereas studies of generalized linear mixed models are fewer; research on their parameter estimation methods in particular remains limited and incomplete.

Objective: To compare, by Monte Carlo simulation, the parameter estimation methods for generalized linear mixed models in the analysis of binary longitudinal data, and to study how sample size, covariance structure and missing data affect each method.

Methods: In a Monte Carlo simulation study, the estimation methods were compared on the following criteria: bias, mean squared error, average mean squared error (across time points), maximum mean squared error (across time points), and coverage of the 95% confidence interval. The effects of different sample sizes, covariance structures, missing-data mechanisms and missing proportions on these criteria were examined, and the findings were applied to the analysis of binary longitudinal data from a clinical trial.

Results: With a large sample and no missing data, the numerical integration approximation methods gave estimators with smaller bias, higher 95% confidence interval coverage, and lower mean squared error, average mean squared error and maximum mean squared error, whether the covariance structure was compound symmetric or unstructured. In other words, numerical integration approximation is more accurate and more stable for binary longitudinal data with larger samples. Its advantage is more pronounced when the random-effect variance is large (greater than or equal to 1); when the random-effect variance is less than 1, numerical integration approximation and the linearization methods give very similar bias and roughly the same 95% confidence interval coverage.

The large-sample conclusions do not carry over to small samples. For small samples, the RSPL and MSPL linearization methods are more stable, the RMPL and MMPL methods give higher 95% confidence interval coverage, and the linearization methods estimate the random-effect parameters more accurately. Linearization therefore has the advantage for binary longitudinal data with small samples.

The robustness of the estimation methods under an unstructured covariance structure also depends on sample size. With small samples, RSPL and MSPL produce smaller mean squared error, average mean squared error and maximum mean squared error than the other methods, and they also perform better in terms of the proportion of positive definite G matrices; RSPL and MSPL are therefore the more robust choice for small samples under an unstructured covariance structure. With large samples, numerical integration approximation is better, giving smaller bias and higher 95% confidence interval coverage, and it also has a distinct advantage in terms of convergence; it is therefore the more robust choice for large samples under an unstructured covariance structure.

When data are missing, whether completely at random or at random, and the missing proportion is small, numerical integration approximation gives smaller bias, higher 95% confidence interval coverage and better stability. When the missing proportion is high, numerical integration approximation gives larger bias than the RSPL and MSPL linearization methods, and the linearization methods also give higher 95% confidence interval coverage and more stable estimates. In the real-data example, the sample was large and the proportion of missing data very low, so numerical integration approximation was the appropriate estimation method. The log odds ratios between the two groups, and their 95% confidence intervals, did not differ appreciably across the numerical integration methods.

Conclusion: When generalized linear mixed models are applied to binary longitudinal data, the parameter estimation method should be chosen according to the sample size, the covariance structure and the extent of missing data. When there are no missing data or the missing proportion is low, numerical integration approximation has the advantage for large samples and larger random-effect variances, whereas linearization is better for small samples. When the missing proportion is high, the RSPL and MSPL linearization methods are more accurate and more stable than numerical integration approximation.
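The evaluation criteria named in the abstract (bias, mean squared error, its average and maximum across time points, and 95% confidence interval coverage) can be illustrated with a minimal sketch. The function name `summarize_estimates`, the input layout (one row per simulated replicate, one column per time point) and the use of a Wald interval (estimate ± 1.96 × standard error) are assumptions made for illustration; the thesis abstract does not state how its intervals were constructed.

```python
from statistics import mean


def summarize_estimates(estimates, std_errors, true_values, z=1.96):
    """Monte Carlo summaries per time point (hypothetical helper).

    estimates, std_errors: lists of replicates, each a list with one
    value per time point. true_values: one true parameter per time point.
    Coverage assumes a Wald interval: estimate +/- z * standard error.
    """
    bias, mse, coverage = [], [], []
    for t, truth in enumerate(true_values):
        errors = [rep[t] - truth for rep in estimates]
        bias.append(mean(errors))
        mse.append(mean([e * e for e in errors]))
        # 1.0 if the interval contains the true value, else 0.0
        hits = [
            1.0 if abs(rep[t] - truth) <= z * se[t] else 0.0
            for rep, se in zip(estimates, std_errors)
        ]
        coverage.append(mean(hits))
    return {
        "bias": bias,
        "mse": mse,
        "average_mse": mean(mse),  # averaged over time points
        "max_mse": max(mse),       # worst time point
        "coverage": coverage,
    }


if __name__ == "__main__":
    # Two toy replicates, two time points; the true value is 2.0 at both.
    result = summarize_estimates(
        estimates=[[1.0, 2.0], [3.0, 2.0]],
        std_errors=[[0.5, 0.5], [0.5, 0.5]],
        true_values=[2.0, 2.0],
    )
    print(result)
```

In a real simulation study the replicates would come from fitting the model to each simulated data set with each estimation method, and the summaries would then be compared across methods.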
[Degree-granting institution]: Fudan University
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: R96; O242.2
Article ID: 2219307
Article link: http://www.sikaile.net/yixuelunwen/yiyaoxuelunwen/2219307.html