Prediction of Bacterial Essential Genes and Analysis of Their Evolutionary Characteristics
Topic: essential genes + compositional features; Source: University of Electronic Science and Technology of China, master's thesis, 2016
[Abstract]: Essential genes play a central role in bacterial survival: the proteins they encode sustain normal growth and reproduction. Once the essential genes of a pathogenic bacterium have been identified, they can serve as drug targets for treating disease, and the theoretical study of bacterial essential genes also helps us understand the origin and evolution of life. Predicting bacterial essential genes has therefore become an increasingly important topic in bioinformatics. Experimental methods are undoubtedly the most accurate way to identify essential genes, but they are slow, laborious, and expensive, so essential genes have been determined for only a small number of species so far. For this reason, theoretical methods are receiving more and more attention.

This thesis takes bacterial essential genes as its main research object and predicts them with a theoretical method based on compositional features. We first extracted compositional features from the genome sequence of Escherichia coli according to its annotation file. We then classified genes on these compositional variables with a support vector machine (SVM) and with principal component regression (PCR), and measured classifier performance by the area under the curve (AUC). This is the first time that principal component regression has been applied to the prediction of bacterial essential genes. The AUC was 0.83 for SVM and 0.87 for PCR. We then improved both methods. Before applying the SVM, we ran a feature analysis on the compositional variables (ttSVM) to screen out variables that show no significant difference between essential and non-essential genes. For principal component regression, we added a kernel function (KPCR) to improve its ability to classify nonlinear features. After these improvements, ttSVM reached an AUC of up to 0.87 and KPCR reached 0.84. We next applied the four methods to all other species whose essential genes have been determined experimentally, obtaining AUC values of up to 0.95. Finally, using the species with AUC greater than 0.8, we built prediction models and set up a free web service, IBEG (http://cefg.uestc.edu.cn/ibeg/). With this service, researchers can predict the essentiality of unknown genes using different methods and can also compare the strengths and weaknesses of those methods.

In addition, we compared essential genes, high-codon-usage genes, and highly expressed genes across species from two perspectives: functional genes and horizontally transferred genes. Among functional genes, essential genes account for the largest proportion, which indicates that essential genes include a comparatively large share of genes with known function, and that the more important a gene's function is to the organism, the more conserved its evolution. Among horizontally transferred genes, essential genes also account for the largest proportion, which indicates that essential genes include some housekeeping genes and are therefore prone to horizontal transfer. In summary, this thesis applies new methods to the prediction of bacterial essential genes from compositional features, adds new compositional features, and studies the evolution of these genes. Some problems remain, however, and require further study and refinement.
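The two ends of the pipeline described in the abstract, turning a gene sequence into compositional variables and scoring a classifier by AUC, can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not list the actual compositional features, so single-base frequencies and GC content are used here purely as assumed examples, and the function names `composition_features` and `auc` are hypothetical. The AUC is computed from its standard rank interpretation, the probability that a randomly chosen essential gene receives a higher score than a randomly chosen non-essential gene.

```python
from collections import Counter


def composition_features(seq):
    """Return simple compositional variables for one DNA sequence.

    Assumed example features (not taken from the thesis): the frequency
    of each base A, C, G, T and the combined GC content.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    total = len(seq)
    feats = {base: counts.get(base, 0) / total for base in "ACGT"}
    feats["GC"] = feats["G"] + feats["C"]
    return feats


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for essential, 0 for non-essential.
    Each (positive, negative) pair counts 1 if the positive scores
    higher, 0.5 on a tie, and 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy sequence and toy classifier scores, for illustration only.
    print(composition_features("ATGC"))
    print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

In practice the scores passed to `auc` would come from whichever classifier is being evaluated (SVM, PCR, or their variants), so the same function can be used to compare methods on the same set of genes.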
[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year conferred]: 2016
[Classification number]: Q933