

A Cancer Driver Gene Prediction Model Based on Principal Component Analysis and Neural Networks

Published: 2018-06-07 22:30

  Topics: principal component analysis + neural networks; Source: Beijing Jiaotong University, master's thesis, 2017


[Abstract]: Cancer is one of the main threats to human life and health. It places a heavy psychological and financial burden on individuals and families and seriously affects global economic development and social progress. Research on the mechanisms of cancer and its control has become a priority of global health strategy. Earlier cancer research focused mainly on external causes, and little was known about the internal carcinogenic mechanisms until high-throughput sequencing and related methods made it possible to analyze internal causes at the gene level. By analyzing changes in intracellular gene expression levels during cancer formation, researchers found that some genes exert control over tumors: if the expression of these genes or their pathways is inhibited, the events associated with tumor development can be halted. These genes are called cancer driver genes. Driver genes are the principal internal cause of cancer, and targeted therapy directed at them could make cancer treatment far more effective. At present, driver genes are predicted mainly by analyzing sequence alignment results from large numbers of samples. This biology-based approach is easy to understand, but it usually requires sequencing many cancer samples, which is expensive. With the rapid development of molecular biology, organizations such as TCGA (The Cancer Genome Atlas) have given researchers large and regularly updated data resources, and the emergence of machine learning, data mining, and related techniques provides strong support for analyzing these data. Driver gene prediction is therefore gradually becoming data-driven.

This thesis introduces the research background, significance, and methods of driver gene research, and describes in detail the basic principles of principal component analysis (PCA) and neural networks and how they are applied here. Based on these two methods, we propose a systems biology model for predicting cancer driver genes. Starting from microarray data, the model derives a predicted set of driver genes step by step, reduces systematic and human error in the relevant experimental steps, can effectively reduce cost and experimental time, and provides a basis for targeted cancer therapy.

Glioblastoma multiforme was chosen as the test case. First, the sample data are preprocessed: the tumor expression profiles are normalized, and PCA is then used to filter out expression data carrying no or too little expression information. Second, inspired by module networks, the selected genes are partitioned so that genes with similar mutation rates fall in the same block, and the blocks are ranked. Finally, a restricted Boltzmann machine is trained to obtain the predicted set of driver genes. Comparing the predictions with text-mining results shows that about 80% of the predicted genes agree with the text-mining results, which indicates that the proposed model is feasible and effective to a certain degree.
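The abstract names the first two stages of the pipeline (normalization with PCA-based filtering, then grouping genes by mutation rate) without giving their details. The sketch below is only one plausible reading, not the thesis's actual procedure: the function names, the loading-based gene score, the `keep` fraction, and the equal-size split into `n_blocks` are all assumptions made for illustration. The restricted Boltzmann machine stage is not shown.

```python
import numpy as np


def zscore_genes(expr):
    """Standardize each gene (row) to zero mean and unit variance across samples."""
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant genes become all zeros instead of dividing by zero
    return (expr - mean) / std


def pca_gene_filter(expr, n_components=2, keep=0.5):
    """Return sorted indices of the genes with the largest loadings on the top components.

    `expr` is a (genes x samples) array. The scoring rule is an assumed
    stand-in for the unspecified filtering criterion in the abstract.
    """
    z = zscore_genes(expr)
    # Samples are observations and genes are features, so decompose (samples x genes).
    _, s, vt = np.linalg.svd(z.T, full_matrices=False)
    k = min(n_components, len(s))
    # Weight each gene's loading by the singular value of its component.
    score = np.sqrt(((vt[:k] * s[:k, None]) ** 2).sum(axis=0))
    n_keep = max(1, int(round(keep * expr.shape[0])))
    return np.sort(np.argsort(score)[::-1][:n_keep])


def mutation_blocks(mutation_rate, n_blocks=3):
    """Rank genes by mutation rate (descending) and cut the ranking into blocks.

    Neighbouring genes in the ranking have similar rates, so each block groups
    similar genes, and blocks come out ordered from highest to lowest rate.
    """
    order = np.argsort(mutation_rate)[::-1]
    return [block for block in np.array_split(order, n_blocks) if block.size > 0]
```

Called on a genes-by-samples expression matrix, `pca_gene_filter` returns the indices of retained genes; passing those genes' mutation rates to `mutation_blocks` yields ranked blocks that a downstream learner could consume in order.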
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: R73-3; TP183

