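Construction of Propensity Score Models for Multi-Group Comparisons and the Study and Application of Matching Methods
Published: 2018-05-08 20:04
Topics: propensity score matching + nearest-neighbor matching; Source: Second Military Medical University, doctoral dissertation, 2014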
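[Summary]
Background: With the continuing development of information technology, observational studies have grown in number and improved in accuracy, and large-sample observational studies play an increasingly important role in medical research. In an observational study, however, subjects are not randomly assigned to groups. Group membership arises naturally, so subjects with certain characteristics are more likely to fall into the treatment group or the control group, which produces confounding bias between groups. The propensity score (PS) method is a common way to address confounding bias in observational studies. It is easy to understand, its procedure is highly standardized, and in recent years it has been widely used in large non-randomized observational studies. Applications of the propensity score mainly include matching, stratification, and regression adjustment; of these, matching has the most advantages and the widest use. Propensity score matching mainly includes nearest-neighbor matching, caliper matching, and Mahalanobis distance matching. Several problems in applying propensity score matching remain unresolved. Which types of covariates should enter the propensity score model is still disputed; which matching method performs best has not been settled; and propensity score matching is currently used mainly for observational data with a binary grouping factor and rarely for data with a multi-category grouping factor.
Objective: To construct propensity score matching methods for an ordered three-category grouping factor. Simulation studies are used to select the covariates that enter the propensity score model, to compare several matching methods when the grouping factor has three ordered categories, to identify by adjusting parameters the best-performing matching approach under different data characteristics, and to compare different ways of applying the propensity score in the same setting. Finally, the optimal propensity score matching method established in the simulation study is applied to real data.
Methods: Data sets were simulated by the Monte Carlo method. The grouping factor was simulated as three ordered categories, and the sample-size ratio between groups was set to 1:1:1, 2:3:5, 1:2:3, and 1:4:5. Covariates of different types were simulated according to their relationship with the grouping factor and the outcome: covariates associated with both the grouping factor and the outcome, covariates associated with the grouping factor only, covariates associated with the outcome only, and covariates associated with neither. By including different types of covariates in the propensity score model, the study determined which types should be included when the grouping factor has three ordered categories. Following the basic idea of propensity score matching for a binary grouping factor, matching methods for an ordered three-category grouping factor were constructed, including nearest-neighbor matching, caliper matching, and Mahalanobis distance matching, and each was implemented as a SAS macro. Different matching parameters, such as the matching ratio and the caliper width, were set within each method, and the best-performing matching approach under different data characteristics was determined by comparing methods and parameter settings. The simulated data were also used to compare different ways of applying the propensity score: matching, stratification, regression adjustment, and regression adjustment after matching.
Ordinal logistic regression was used to calculate the propensity score of each subject under the ordered three-category grouping factor. The covariates in the propensity score model must be checked for balance before and after matching. This study used standardized differences (SD) to evaluate covariate balance between groups. A preliminary experiment showed that, when the grouping factor has three ordered categories, the covariates of the three groups have not reached balance if the maximum absolute standardized difference between groups exceeds 0.1. After propensity score matching, the bias and precision of the model must also be evaluated. Relative bias (RB) was used to evaluate bias: the smaller the absolute value of RB, the smaller the bias. Mean squared error (MSE) was used to evaluate precision: the smaller the MSE, the higher the precision.
Finally, the propensity score matching method established in the simulation study was applied to a worked example. The data came from the "Epidemiological Survey of Gastrointestinal Diseases in Mainland China" undertaken by the Second Military Medical University. The study used the respondents' general information, the physical examination questionnaire, and the SF-36 health survey to evaluate the relationship between abdominal obesity and health-related quality of life (HRQOL). Demographic information included sex, age, height, weight, education level, occupation, and chronic disease status. Abdominal status was classified as "normal waist circumference", "mild abdominal obesity", or "severe abdominal obesity". HRQOL was evaluated with the Chinese version of the SF-36. With abdominal status as the grouping factor and the score on each HRQOL dimension as the outcome, variables selected from the demographic information served as covariates in the propensity score model. The matching method established in the simulation study was used to control the influence of confounders on the outcome and thereby evaluate the effect of abdominal obesity on HRQOL.
Results:
(1) Covariate selection: When the grouping factor has three ordered categories and the propensity score model includes the covariates associated with the outcome, a relatively high proportion of subjects is matched, and the estimated treatment effect has the smallest bias and the highest precision. When covariates are removed from the model one at a time, removing a covariate associated with both the grouping factor and the outcome greatly increases the bias of the treatment effect estimate and reduces its precision, which shows that all such covariates must be included. Additionally including covariates associated with the outcome but not with the grouping factor further reduces the bias and increases the precision. Therefore, when the grouping factor has three ordered categories, the propensity score model should include the covariates associated with the outcome, whether or not they are associated with the grouping factor.
(2) Construction and comparison of matching methods: This study constructed propensity score matching methods for an ordered three-category grouping factor, including nearest-neighbor matching, caliper matching, and Mahalanobis distance matching, and compared them. Caliper matching performed best under every sample-size ratio. With a ratio of 1:1:1, 1:1:1 caliper matching with a caliper of 0.005 performed best; with a ratio of 2:3:5, 1:1:1 caliper matching with a caliper of 0.01 performed best; with a ratio of 1:2:3, 1:1:1 caliper matching with a caliper of 0.01 performed best; with a ratio of 1:4:5, 1:2:2 caliper matching with a caliper of 0.01 performed best.
(3) Comparison of propensity score application methods: All of the propensity score methods greatly reduced the bias of the treatment effect estimate and improved its precision. Regardless of the sample-size ratio, matching and regression adjustment after matching outperformed the other methods. With a ratio of 1:1:1, regression adjustment outperformed stratification; as the group sizes became more unequal, stratification outperformed regression adjustment.
(4) Worked example: After propensity score matching, all covariates associated with the outcome were balanced across the abdominal status groups, so the effect of abdominal obesity on HRQOL could be evaluated directly. On the physical functioning dimension, the severe abdominal obesity group scored significantly lower than the normal waist circumference group, whereas the mild abdominal obesity group scored significantly higher than the normal waist circumference group. On the social functioning dimension, only the severe abdominal obesity group scored significantly lower than the normal waist circumference group; the mild abdominal obesity group did not differ significantly from the normal waist circumference group.
Conclusions: When the grouping factor has three ordered categories, the propensity score model should include the covariates associated with the outcome. For propensity score matching, caliper matching performs best, with the caliper and the matching ratio adjusted to the sample-size ratio between groups. Among the ways of applying the propensity score, matching and regression adjustment after matching perform best. Compared with traditional multivariable statistical methods, the propensity score matching method established here for an ordered three-category grouping factor can quantify differences in a continuous outcome variable between groups by controlling for confounders.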
[Abstract]: Background of study:
With the development of information technology, observational studies have grown in number and improved in accuracy, and large-sample observational studies play an increasingly important role in medical research.
Which matching method performs best has not yet been settled.
In addition, propensity score matching is currently used mainly for observational data in which the grouping factor is binary; few studies have applied it to data in which the grouping factor has multiple categories.
Purpose of study:
This study constructs propensity score matching methods for a grouping factor with three ordered categories. It uses simulation to select the covariates for the propensity score model, compares several matching methods in this setting, identifies the best-performing matching approach under different data characteristics by adjusting the matching parameters, and compares different ways of applying the propensity score. Finally, it applies the optimal propensity score matching method established in the simulation study to real data.
Study method:
In this study, data sets were simulated by the Monte Carlo method. The grouping factor was simulated as three ordered categories, and the sample-size ratio between groups was set to 1:1:1, 2:3:5, 1:2:3, and 1:4:5.
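The simulation design can be illustrated with a minimal Python sketch. It is not the thesis's SAS program: the function name `simulate_dataset`, the coefficients, the latent-score construction, and the four-covariate layout (one covariate related to both group and outcome, one to group only, one to outcome only, one to neither) are all assumptions chosen for illustration. A latent score with logistic noise is ranked and cut so that the three ordered groups follow a requested sample-size ratio such as 1:4:5.

```python
import math
import random


def simulate_dataset(n=3000, ratio=(1, 1, 1), effect=1.0, seed=0):
    """Simulate covariates, an ordered 3-level group, and a continuous outcome.

    Every coefficient below is an illustrative assumption.
    """
    rng = random.Random(seed)
    # x[0]: related to group and outcome; x[1]: group only;
    # x[2]: outcome only; x[3]: related to neither.
    x = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(n)]

    # Latent score with logistic noise (latent-variable view of an
    # ordinal logistic model).
    latent = []
    for row in x:
        u = rng.uniform(1e-12, 1.0 - 1e-12)
        latent.append(0.5 * row[0] + 0.5 * row[1] + math.log(u / (1.0 - u)))

    # Cut the ranked latent scores so group sizes follow `ratio`.
    total = sum(ratio)
    bound1 = round(n * ratio[0] / total)
    bound2 = round(n * (ratio[0] + ratio[1]) / total)
    group = [0] * n
    for rank, i in enumerate(sorted(range(n), key=lambda k: latent[k])):
        group[i] = 0 if rank < bound1 else (1 if rank < bound2 else 2)

    # Continuous outcome: linear group effect plus confounding through x[0].
    y = [effect * g + 0.5 * row[0] + 0.5 * row[2] + rng.gauss(0.0, 1.0)
         for g, row in zip(group, x)]
    return x, group, y
```

Calling `simulate_dataset(n=3000, ratio=(1, 4, 5))` repeatedly with different seeds would give the replicate data sets over which a matching method's bias and precision can be summarized.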
Ordinal logistic regression was used to calculate the propensity score of each subject under the ordered three-category grouping factor. Covariate balance between groups was evaluated with standardized differences (SD). A preliminary experiment showed that, when the grouping factor has three ordered categories, the covariates of the three groups have not reached balance if the maximum absolute standardized difference between groups exceeds 0.1. After propensity score matching, the bias and precision of the model were evaluated. Relative bias (RB) was used to evaluate bias: the smaller the absolute value of RB, the smaller the bias.
Mean squared error (MSE) was used to evaluate precision: the smaller the MSE, the higher the precision.
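The balance and accuracy measures named above can be written down in a few lines. The sketch below uses common textbook definitions, which are an assumption here because the abstract does not give the thesis's exact formulas: the standardized difference of a continuous covariate is the mean difference divided by the square root of the average of the two group variances, relative bias is the mean estimate minus the true effect divided by the true effect, and MSE is the mean squared deviation of the estimates from the true effect. All function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, variance


def standardized_difference(a, b):
    """Standardized difference of one continuous covariate between two groups."""
    pooled_sd = math.sqrt((variance(a) + variance(b)) / 2)
    return (mean(a) - mean(b)) / pooled_sd


def max_abs_standardized_difference(groups):
    """Largest absolute standardized difference over all pairs of groups."""
    return max(abs(standardized_difference(a, b))
               for a, b in combinations(groups, 2))


def is_balanced(groups, threshold=0.1):
    """Balance rule from the abstract: the maximum must not exceed 0.1."""
    return max_abs_standardized_difference(groups) <= threshold


def relative_bias(estimates, true_effect):
    """Mean estimate minus the true effect, relative to the true effect."""
    return (mean(estimates) - true_effect) / true_effect


def mean_squared_error(estimates, true_effect):
    """Mean squared deviation of the estimates from the true effect."""
    return mean((e - true_effect) ** 2 for e in estimates)
```

In a simulation, `estimates` would hold one treatment effect estimate per replicate data set, and `groups` would hold the values of one covariate split by the three groups.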
Finally, the method was applied to data from the general information, physical examination, and SF-36 health survey questionnaires to evaluate the relationship between abdominal obesity and health-related quality of life (HRQOL). The demographic information included sex, age, height, weight, education level, occupation, and chronic disease status. Abdominal status was classified as "normal waist circumference", "mild abdominal obesity", or "severe abdominal obesity".
Results of the study:
(1) Covariate screening: When the grouping factor has three ordered categories and the propensity score model includes the covariates associated with the outcome, a relatively high proportion of subjects is matched, and the treatment effect estimate has the smallest bias and the highest precision. Removing a covariate that is associated with both the grouping factor and the outcome greatly increases the bias of the treatment effect estimate and reduces its precision, so all such covariates must be included. Additionally including covariates associated with the outcome but not with the grouping factor further reduces the bias and increases the precision. Therefore, when the grouping factor has three ordered categories, the covariates associated with the outcome should be included in the propensity score model whether or not they are associated with the grouping factor.
(2) Construction and comparison of matching methods: This study constructed propensity score matching methods for a grouping factor with three ordered categories, including nearest-neighbor matching, caliper matching, and Mahalanobis distance matching, and compared them. Caliper matching performed best under every sample-size ratio. When the sample-size ratio between groups is 1:1:1, 1:1:1 matching by the caliper method (caliper set to 0.005) performs best.
When the sample-size ratio between groups is 2:3:5, 1:1:1 matching by the caliper method (caliper set to 0.01) performs best.
When the sample-size ratio between groups is 1:2:3, 1:1:1 matching by the caliper method (caliper set to 0.01) performs best.
When the sample-size ratio between groups is 1:4:5, 1:2:2 matching by the caliper method (caliper set to 0.01) performs best.
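To make the idea of 1:1:1 caliper matching concrete, here is a minimal greedy sketch in Python. It is one plausible scheme, not the thesis's SAS macro: it assumes each subject has a single scalar propensity score, takes group 0 as the reference group, and keeps a triplet only when the nearest unmatched subject in each of the other two groups lies within the caliper. The function name and the greedy, order-dependent strategy are assumptions.

```python
def caliper_match_triplets(scores, groups, caliper=0.01, reference=0):
    """Greedy 1:1:1 caliper matching on a scalar propensity score.

    scores: one float per subject; groups: labels 0, 1, or 2.
    Returns a list of (reference_index, index_a, index_b) triplets.
    """
    others = [g for g in (0, 1, 2) if g != reference]
    # Unmatched candidates in each non-reference group.
    pools = {g: {i for i, lab in enumerate(groups) if lab == g}
             for g in others}
    triplets = []
    for i, lab in enumerate(groups):
        if lab != reference:
            continue
        picks = []
        for g in others:
            if not pools[g]:
                break
            # Nearest unmatched candidate; ties broken by index.
            j = min(pools[g], key=lambda k: (abs(scores[k] - scores[i]), k))
            if abs(scores[j] - scores[i]) > caliper:
                break
            picks.append(j)
        # Keep the triplet only if both other groups supplied a match.
        if len(picks) == len(others):
            for g, j in zip(others, picks):
                pools[g].remove(j)
            triplets.append((i, picks[0], picks[1]))
    return triplets
```

A narrower caliper such as 0.005 discards more reference subjects but leaves closer matches, which is the trade-off the caliper settings above tune for each sample-size ratio.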
(3) Comparison of propensity score application methods: All of the propensity score methods greatly reduced the bias of the treatment effect estimate and improved its precision. Regardless of the sample-size ratio between groups, matching and regression adjustment after matching outperformed the other methods. When the sample-size ratio is 1:1:1, regression adjustment outperforms stratification.
As the group sizes become more unequal, stratification outperforms regression adjustment.
(4) Case study: After propensity score matching, all covariates associated with the outcome were balanced across the abdominal status groups, so the effect of abdominal obesity on health-related quality of life could be evaluated directly. On the physical functioning dimension, the severe abdominal obesity group scored significantly lower than the normal waist circumference group, whereas the mild abdominal obesity group scored significantly higher. On the social functioning dimension, only the severe abdominal obesity group scored significantly lower than the normal waist circumference group.
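Conclusions of the study:
When the grouping factor has three ordered categories, the covariates associated with the outcome should be included in the propensity score model. Caliper matching performs best, with the caliper and the matching ratio adjusted to the sample-size ratio between groups. Among the ways of applying the propensity score, matching and regression adjustment after matching perform best. Compared with traditional multivariable statistical methods, the propensity score matching method established in this study can quantify differences in a continuous outcome variable between groups by controlling for confounders.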
[Degree-granting institution]: Second Military Medical University
[Degree level]: Doctorate
[Year degree granted]: 2014
[Classification number]: R181.2
Article ID: 1862851
Article link: http://www.sikaile.net/yixuelunwen/yufangyixuelunwen/1862851.html